By: John Mancini on April 27th, 2011
What are the Standard Output Image Formats of Capture Software?
While capture vendors typically offer an array of outputs to support different ECM image formats, the industry has mostly settled on three standard formats. Each has pros and cons. Which one you use depends on the volume of documents, what information is being captured, and how you plan to use that information.
1. TIFF
Because bandwidth was limited when the imaging and document management industry began, the industry adopted 200 dots-per-inch, TIFF Group 4 compressed, black-and-white images as its standard (which produced an image of roughly 75 KB for a standard page of text). A secondary advantage of the TIFF Group 4 format is that it is a lossless compression standard; that is, no image data is removed during compression.
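To put the 200 dpi figure in perspective, the arithmetic can be sketched in a few lines of Python. The page size (US Letter, 8.5 x 11 inches) and the reading of the compressed size as about 75 KB are assumptions made for illustration, not figures confirmed by any vendor.

```python
def raw_bitonal_bytes(width_in, height_in, dpi):
    """Size in bytes of an uncompressed 1-bit-per-pixel page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # each row is padded to a whole byte
    return bytes_per_row * height_px


# Assumptions: US Letter page, and a compressed size of roughly 75 KB.
raw = raw_bitonal_bytes(8.5, 11, 200)
compressed = 75 * 1024
ratio = raw / compressed

print(f"uncompressed: {raw} bytes, compression ratio about {ratio:.1f}:1")
```

Under these assumptions the uncompressed page is roughly 458 KB, so a 75 KB file is about one sixth of the raw size. Real ratios vary with how much of the page is white space.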
Each document management vendor then added its own specific headers, which made each format unique. Third-party capture vendors therefore had to create “formatters” or “release scripts” to produce an image that would import seamlessly into those document management systems.
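A release script usually needs to confirm what it is handing over. As a minimal sketch (not any vendor's actual formatter), the following Python reads the baseline TIFF header and the Compression tag (tag 259) from the first image directory. The function name `tiff_compression` and the name table are hypothetical; vendor-specific header extensions are not handled.

```python
import struct

# A few Compression tag values defined for TIFF files.
COMPRESSION_NAMES = {
    1: "uncompressed",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    7: "JPEG",
}


def tiff_compression(data):
    """Return the Compression tag value from the first IFD of a TIFF file."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _field_type, _count = struct.unpack_from(endian + "HHI", data, entry)
        if tag == 259:  # Compression, stored as a SHORT in the value field
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1  # the tag defaults to 1 (uncompressed) when absent
```

A caller might run `COMPRESSION_NAMES.get(tiff_compression(open(path, "rb").read()))` and reject anything other than "CCITT Group 4" before release, where `path` is whatever file the capture step produced.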
2. JPEG
In the photographic and consumer world, a lossy standard named JPEG became ubiquitous, and it is used almost exclusively for images captured with mobile phones. The accuracy of OCR and other recognition technologies on a JPEG-compressed image depends not only on resolution but also on how much image data was discarded during compression. Usually, an image can be compressed by as much as 80% without much loss, but this is very much dependent on the complexity (busyness) of the image, since JPEG compresses areas of similarity in a picture (e.g., the sky) most efficiently.
JPEG 2000 looked interesting because it can be lossless, but it has never become popular. We believe this is partly because the compression time required has often been too great to support high-volume imaging.
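The link between busyness and compression can be illustrated with the transform at the heart of JPEG. The sketch below applies an 8x8 discrete cosine transform to a smooth block and a noisy block, quantizes the result, and counts how many coefficients survive. It is a simplification under stated assumptions: a single uniform quantization step (`step=16`) stands in for JPEG's per-frequency quantization tables, and the two sample blocks are invented for illustration.

```python
import math
import random

N = 8  # JPEG processes the image in 8x8 pixel blocks


def dct_8x8(block):
    """2-D DCT-II of an 8x8 block of level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def nonzero_after_quantization(pixels, step=16):
    """Count coefficients that remain nonzero after uniform quantization."""
    shifted = [[p - 128 for p in row] for row in pixels]  # center on zero
    coeffs = dct_8x8(shifted)
    return sum(1 for row in coeffs for value in row if round(value / step) != 0)


# Hypothetical sample data: a gentle gradient (sky-like) and random noise.
smooth = [[100 + 2 * x for _ in range(N)] for x in range(N)]
rng = random.Random(0)
busy = [[rng.randrange(256) for _ in range(N)] for _ in range(N)]

smooth_count = nonzero_after_quantization(smooth)
busy_count = nonzero_after_quantization(busy)
print("smooth block:", smooth_count, "nonzero coefficients")
print("busy block:  ", busy_count, "nonzero coefficients")
```

The smooth block keeps only a handful of coefficients while the noisy block keeps most of them, which is why a page of sky compresses far better than a page of dense detail at the same quality setting.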
3. PDF
Since the origins of the industry and the adoption of TIFF, PDF has also become a standard. The advantage of PDF is that it has a standardized header, which allows the image to be decompressed and displayed or managed using standard PDF viewing software that Adobe and others have made freely available.
Underneath the PDF header, many different formats can be carried. These include TIFF Group 4, JPEG, and a number of other formats, including the TIFF Group 3 format used in faxes. A PDF can also contain ASCII text complete with font format information, or a mixture of images and text. But all this is transparent to the user, who just sees the image.
PDF/A is used as an output from batch imaging capture software where conversions to archives are being performed. It is a version of PDF designed for preserving documents in archives. It is increasingly important for capture solutions to support it, and it is a requirement in many European countries.
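Although the user never sees it, the encoding carried inside a PDF is declared in each stream's /Filter entry. The following Python is a rough sketch that scans raw PDF bytes for those entries and reports the encodings found. The function name `encodings_in_pdf` is hypothetical, and the approach is deliberately naive: it will miss dictionaries stored inside compressed object streams, which newer PDFs often use.

```python
import re

# Standard PDF stream filter names and the encodings they denote.
FILTER_NAMES = {
    b"CCITTFaxDecode": "CCITT Group 3 or Group 4 (fax) compression",
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"JBIG2Decode": "JBIG2",
    b"FlateDecode": "Flate (zlib/deflate)",
    b"LZWDecode": "LZW",
}


def encodings_in_pdf(pdf_bytes):
    """Return the set of stream encodings declared via /Filter entries."""
    found = set()
    # A /Filter value is either a single name or an array of names.
    pattern = rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)"
    for match in re.finditer(pattern, pdf_bytes):
        for name in re.findall(rb"/([A-Za-z0-9]+)", match.group(1)):
            # Unknown filters are reported by their raw name.
            found.add(FILTER_NAMES.get(name, name.decode("ascii")))
    return found
```

For a CCITTFaxDecode stream, the K value in the stream's decode parameters distinguishes Group 4 (negative K) from Group 3, so a fuller check would read that entry as well.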
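A capture workflow may want a quick check that an output file at least claims to be PDF/A. PDF/A files declare their part number in XMP metadata, conventionally under the `pdfaid` prefix. The sketch below looks for that declaration in the raw bytes. The function name `declared_pdfa_part` is hypothetical, the check assumes the metadata is stored unfiltered and uses the conventional prefix, and a positive result is only a claim of conformance, not proof. Real validation needs a dedicated PDF/A validator.

```python
import re


def declared_pdfa_part(pdf_bytes):
    """Return the PDF/A part number declared in XMP metadata, or None."""
    # XMP may write the value as an element (<pdfaid:part>1</pdfaid:part>)
    # or as an attribute (pdfaid:part="1").
    match = re.search(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d+)", pdf_bytes)
    return int(match.group(1)) if match else None
```

A release step could, for example, route any file for which this returns None to a conversion queue rather than straight into the archive.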
About John Mancini
John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.
Recent keynote topics include:
The Stairway to Digital Transformation
Navigating Disruptive Waters: 4 Things You Need to Know to Build Your Digital Transformation Strategy
Getting Ahead of the Digital Transformation Curve
Viewing Information Management Through a New Lens
Digital Disruption: 6 Strategies to Avoid Being “Blockbustered”
Specialties: Keynote speaker and writer on AI, RPA, intelligent Information Management, Intelligent Automation and Digital Transformation. Consensus-building with Boards to create strategic focus, action, and accountability. Extensive public speaking and public relations work. Conversant and experienced in major technology issues and trends. Expert on inbound and content marketing, particularly in an association environment and on the Hubspot platform.
John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.