I'm one of those people who are terrible at throwing things away: I have shelves and shelves full of old letters, bank statements, magazines, course notes, and all kinds of other stuff. Most of it never gets consulted, at least in part because it's such a pain to find anything, and you know you'll have to climb on a chair and probably get covered in dust.

It obviously makes sense to throw away the things I never look at and never will need to look at. (Although that raises its own problems: the council only collects waste paper every four weeks, and shredding 30 years of bank statements is going to be a tedious job if they have to be fed into the shredder a page at a time...)

Digital storage is getting a lot cheaper than the living space it would take to store the corresponding amount of paper, and even exotic things like offsite backup that used to be the province of big corporate servers are now part of the everyday consumer experience - I already keep my photo library in the cloud, for instance. Digitising archives also gives you a lot more options for indexing and retrieving documents.

 

Hardware

The weak point is still the interface between the computer and the paper document. As an ordinary consumer, you don't have access to the kind of fast, high-capacity scanning hardware that big companies use for digitising their archives. You soon get bored with laying a page at a time in your flatbed scanner, closing the lid, and pressing the button. That is almost acceptable for dealing with the incoming mail, but it's not practical for large piles of archive material.

A couple of years ago I got myself a Canon MX925 "all-in-one" printer/scanner/fax (do any of their customers still use the fax part?) with a sheet-feeder for the scanner. That works pretty well for paper correspondence, invoices, statements, etc., and you can scan a pile of 30 pages or so in one go, double-sided if necessary. It has a few limitations: you soon discover that some of the organisations who send you a lot of mail are fond of non-standard paper sizes, or ridiculously thick paper, or they staple everything together.

More recently, I've acquired a dedicated document scanner, the Fujitsu ScanSnap iX500. This is a lot faster than the all-in-one, and scans both sides of the paper in one go. Since the paper path is nearly straight, it doesn't suffer much from paper jams, and a bonus is that the page-size detection is done by the software, so non-standard paper sizes (or documents with more than one size of page in them) are scanned cleanly. I suspect that it's another device that will get used intensively a few times a year and not at all for weeks on end in between...

[Image: mock-up bookscanner]

Paper-feed scanners don't help much for bound documents. I'd especially like to scan some parts of my collection of old magazines so that I can read them on my iPad. I had a look into what the DIY bookscanner people were doing: building devices with a V-shaped platen to support the book and a pair of digital cameras to record the image. That's obviously the best technical solution, but it means building a giant, expensive machine, so it is something you're only going to do if you have a serious commitment to scanning. I tried knocking up a V-shaped book-stand out of coroplast and setting up a camera on a tripod next to it, and it was obvious very quickly why you need a strong, rigid structure and a camera with a remote shutter control.

The iPhone has a good camera and there are plenty of scanning apps that will allow you to take a reasonable picture of a document, automating the cropping and distortion correction steps (Evernote's Scannable app seems to be one of the best). I even found a purpose-built stand that will support your phone at the right height over a table so that you can take consistent images. Getting the lighting right is tricky, especially with glossy paper, and the automatic cropping doesn't always work well. It's OK for single sheets, but I haven't had any success with more complicated things like books and magazines. The iPhone approach is good for incidental tasks like scanning handwritten notes into Evernote after a meeting, but it isn't a replacement for a proper scanner.

Someone pointed me to the Fujitsu ScanSnap SV600. This is essentially a cross between a document camera and a line scanner: it sits on your desk and captures documents that you place in its field of view, but it does so by scanning a line of light across them. It has enough depth-of-field to cope with the curvature of bound pages, and the software corrects automatically for the perspective distortion across the field of view. It also comes with tools for automatic cropping and page-flattening. It's reasonably fast - I found I can scan a 40-page magazine in about 5 minutes, including the page-turns, plus a couple of minutes for adjusting the cropping on those pages that weren't detected automatically. The cropping works well if the pages have a single-colour border, but it gets into difficulties with full-bleed illustrations. The quality isn't superb, but the resulting PDFs are perfectly readable, and it comes with OCR software you can use for batch-processing the PDFs to make them searchable.

One difficulty I encountered when using the SV600 for magazines - I've seen other people complaining about this too - is that pages tend to bounce back a bit after you've turned them. If this happens whilst the scanner is working, it will blur the image. So you either need to keep a finger on the margin (and then remove it from the scan using the cleanup tool, which takes time) or use a sheet of transparent material to hold the pages as flat as possible. I do this with a piece of polycarbonate I happened to have, but glass would be better, because the polycarbonate gets scratched quite quickly. Even with this, the page turns are still easier than on a flatbed scanner, since the magazine is face-up on the table, and you can scan a double-page spread in one go.

Books and bound journals work pretty well: I've got better results with the SV600 than with a high-quality digital camera, and almost as quickly - quicker if you count post-processing time.

My SV600 hasn't done very well on colour images up to now, so I wouldn't necessarily recommend it as a photo scanner (although the automatic cropping does mean it's very quick to scan a pile of prints - you can easily do four or five at a time). While it produces very clear monochrome and greyscale images, it seems to make colours rather muddy, irrespective of where you set the sliders for compression level and resolution. The preview images in the cropping tool look fine, so it must be a problem in the PDF- and JPG-rendering software. (I tried installing it on a Windows computer to see if it was any better there, but it wasn't.) But I'm still playing with that: maybe I'm doing something wrong.

Software

Of course, all this just leaves you with directories full of big PDF files that you need to organise and make retrievable! I'm still working on finding the best solution for this. There are a lot of tools available, and none of them seems to have a huge following, suggesting that not many people really do this, outside the big corporate world of tailored solutions.

For the personal paperwork, I've been using a tool called Paperless (Mariner Software) for a number of years. This has the nice feature that it does OCR on the fly when you scan something in, and tries to guess where to file the document automatically. It normally needs a bit of tweaking (e.g. it often takes my account number for the amount of the invoice!), and the search and filing tools have their limitations, but it is simple enough to use that you don't tend to build up a big backlog of incoming mail. You can use it to archive any printable document that you receive as well as scanning things in.

For the scanned magazines, I'm still looking for a good solution. For the moment, I have them all in a tree of subdirectories on my NAS server, which is fine for browsing (e.g. with an app like GoodReader on the iPad) but hopeless for any kind of search.
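
One possible way to get basic search over a directory tree like this is a small word index. The Python sketch below is only an illustration, and it rests on an assumption: that the OCR text of each PDF has been exported to a plain-text .txt file sitting next to it (the sketch does not read PDFs itself). The function names build_index and search are made up for the example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Words are runs of lowercase letters and digits; punctuation is ignored.
WORD = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Map each word to the set of .txt files under root that contain it."""
    index = defaultdict(set)
    for txt in Path(root).rglob("*.txt"):
        text = txt.read_text(encoding="utf-8", errors="ignore").lower()
        for word in set(WORD.findall(text)):
            index[word].add(txt)
    return index


def search(index, query):
    """Return the files that contain every word of the query, sorted by path."""
    terms = WORD.findall(query.lower())
    if not terms:
        return []
    # index.get() avoids creating empty entries for unknown words.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

Something like search(build_index("/path/to/magazines"), "narrow gauge") would then list the text files (and so, by name, the matching PDFs) that mention both words. The path is a placeholder. The index lives in memory and is rebuilt on every run, which is probably fine for a few hundred magazines but would need a proper on-disk index for anything much bigger.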