While accuracy matters in all document preservation, some institutions depend on it more than others. Intelligent Information Management (IIM) and paperless offices are sufficient for most businesses, but if the content is important for historical or informational purposes rather than as a backup, the quick and easy options for digitization don’t always do the trick. Intelligent capture has serious pitfalls, especially when a precise representation of a document’s content is important to a collection, such as legal documents, documents used for research and reference, or a historical collection like a digital library.
In these cases, technology cannot fully replace human factors, because experts bring intelligence, problem-solving skill, and care to the work. Whether the end result is a paperless office or a curated collection of documents, good metadata, organization, and a hands-on human approach can make the resulting digital library much more accurate and efficient.
When handling documents, a human touch is invaluable. While smart scanners can do a great job scanning without the aid of a human, they don’t always know to stop if there’s a problem, or even register that a problem occurred.
Important documents can have rips or rough edges that hamper a smooth feed, causing further damage to the document in the scanning process. Other documents could contain a missed staple or paperclip, or a few pages could slip off the feed due to an unexpected air draft. These are times a human can intervene.
Human touch and monitoring of the scanning process can prevent damage, provide custom solutions on the fly, and identify gaps in a collection long before originals are lost or destroyed. A human-monitored scanning process greatly reduces the chance that a problem goes unnoticed.
Optical character recognition (OCR) is an essential part of digitizing; however, in cases where every word must be accurately digitized, the human eye cannot be replaced by software. Once a document is OCR’d, most optical reading software highlights low-confidence characters that should be examined to confirm accuracy. Often the result is only one or two characters off from the correct word or number, but the computer isn’t able to determine what the text is supposed to be. Common misreads include ri instead of n, rn instead of m, and 8 instead of S. Poor OCR, in turn, hampers search accuracy.
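To make the review step concrete, here is a minimal sketch of how low-confidence words might be queued for a human reviewer. It assumes a hypothetical OCR engine that reports a confidence score between 0 and 1 for each word; the `OcrWord` type, the `flag_for_review` function, the sample page, and the 0.90 threshold are illustrative assumptions, not the output format of any particular product.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def flag_for_review(words, threshold=0.90):
    """Return (position, word) pairs whose confidence is below the threshold."""
    return [(i, w) for i, w in enumerate(words) if w.confidence < threshold]


# Illustrative sample page; the scores are made up for this sketch.
page = [
    OcrWord("The", 0.99),
    OcrWord("modern", 0.62),   # original may read "modem" (m misread as rn)
    OcrWord("archive", 0.97),
    OcrWord("8mith", 0.41),    # original likely "Smith" (S misread as 8)
]

review_queue = flag_for_review(page)
for position, word in review_queue:
    print(f"word {position}: {word.text!r} (confidence {word.confidence:.2f})")
```

The software only narrows down where to look. Deciding whether the page really says "modern" or "modem" still takes a person comparing the text against the original.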
OCR software can also fail to register the correct order of text if there are multiple columns or text boxes, or it can misrepresent handwritten notes as pictures on the page. Even if the words are correctly identified, a wrong reading order can leave the final searchable document vastly different from the original and impede quick research. If text is recognized as images, the search function is lost altogether.
When a collection is used for reference or research, precise OCR is often the difference between finding a search term almost instantly and not finding it at all.
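The reading-order problem can be illustrated with a small sketch. It assumes a hypothetical list of text blocks with page coordinates and a simple two-column layout split at the page midpoint; the `TextBlock` type and `two_column_order` function are illustrative only, and real pages with sidebars, captions, or uneven columns are exactly where a human check is needed.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    x: float  # left edge of the block
    y: float  # top edge of the block (0 is the top of the page)


def two_column_order(blocks, page_width):
    """Read the whole left column top to bottom, then the right column."""
    midpoint = page_width / 2
    return sorted(blocks, key=lambda b: (b.x >= midpoint, b.y))


blocks = [
    TextBlock("left column, first paragraph", 50, 100),
    TextBlock("right column, first paragraph", 350, 100),
    TextBlock("left column, second paragraph", 50, 300),
    TextBlock("right column, second paragraph", 350, 300),
]

# A naive top-to-bottom, left-to-right pass interleaves the two columns.
naive = sorted(blocks, key=lambda b: (b.y, b.x))
# The column-aware pass keeps each column's text together.
ordered = two_column_order(blocks, page_width=600)

print([b.text for b in naive])
print([b.text for b in ordered])
```

Both orderings contain the same correctly recognized words, but only the second matches how a person would read the page.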
The human mind catches patterns and solves puzzles more creatively than current software can. When tagging documents with metadata, familiarity with the content is crucial, and only a human can work with documents at that level. Because of that specialized familiarity, a human is able to match and label works a computer wouldn’t be able to sort without additional input, such as when an author is referred to or published under different names.
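The "additional input" a computer needs can be as simple as an alias table that a knowledgeable person maintains. The sketch below assumes such a table; the `AUTHOR_ALIASES` dictionary and `canonical_author` function are hypothetical names chosen for illustration, and the entries use the well-known case of Samuel Clemens publishing as Mark Twain.

```python
# Hypothetical alias table curated by someone who knows the collection.
# Keys are lowercase name variants; values are the canonical author name.
AUTHOR_ALIASES = {
    "samuel clemens": "Mark Twain",
    "samuel langhorne clemens": "Mark Twain",
    "s. l. clemens": "Mark Twain",
}


def canonical_author(name, aliases=AUTHOR_ALIASES):
    """Map a name variant to its canonical form, or return the name unchanged."""
    key = " ".join(name.lower().split())  # normalize case and spacing
    return aliases.get(key, name.strip())


print(canonical_author("Samuel  Clemens"))   # resolved through the alias table
print(canonical_author("Jane Austen"))       # no alias on file, returned as given
```

The lookup itself is trivial to automate. Knowing which names belong in the table is the part that depends on a person who is familiar with the material.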
When tagging for a paperless office, automated metadata generation may be enough. However, for a more robust search and detailed collection, a human can determine the search categories the institution needs and how the collection will be used, so that relevant metadata is recorded to create a functional and useful collection. The human mind can also notice themes when working intimately with documents and is able to tag them accordingly.
When it comes to digitizing, IIM is the answer for many. But for a quality digital collection, archived for history and enhanced by robust search, the human factor combined with efficient technical processes is essential.