While capture vendors typically offer an array of outputs to support different ECM image formats, the industry has mostly settled on three standard formats. Each format has pros and cons. Which one you use depends on the volume, what information is being captured, and how you plan to use that information.
Because bandwidth was limited when the imaging and document management industry began, the industry adopted 200 dots-per-inch (dpi), TIFF Group 4 compressed, black-and-white images as its standard, which produced an image of roughly 75 KB for a standard page of text. A secondary advantage of the TIFF Group 4 format is that it is a lossless compression standard; that is, no image data is removed during compression.
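To put that page size in perspective, the short sketch below estimates how large the same page would be as an uncompressed 1-bit raster. It assumes a US Letter page (8.5 x 11 inches), which the text does not specify, and reads the compressed size quoted above as about 75 KB; the function name is illustrative only.

```python
# Rough size of a bitonal (1 bit per pixel) page before compression.
# Assumption: "a standard page" is US Letter, 8.5 x 11 inches.

def uncompressed_bitonal_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Bytes for a 1-bit-per-pixel raster, each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # round up to a full byte
    return bytes_per_row * height_px

raw = uncompressed_bitonal_bytes(8.5, 11, 200)
compressed = 75 * 1024  # assumption: the quoted size is about 75 KB

print(f"raw: {raw / 1024:.0f} KB, ratio: {raw / compressed:.1f}:1")
# prints "raw: 458 KB, ratio: 6.1:1"
```

Under these assumptions, Group 4 compression shrinks the page by a factor of about six without discarding any pixels, which is why it suited low-bandwidth networks.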
Each vendor then added some specific headers, which made their formats unique. Third-party capture vendors, therefore, had to create “formatters” or “release scripts” in order to create an image that would seamlessly import into the document management systems.
In the photographic and consumer world, a lossy standard named JPEG became ubiquitous and is used almost universally for images captured on mobile phones. The accuracy of OCR and other recognition technologies on a JPEG-compressed image depends not only on resolution but also on how much information was discarded when the image was compressed. Usually, compression can be as much as 80% without much loss of quality, but this depends heavily on the complexity (busyness) of the image, since JPEG compresses areas of similarity in a picture (e.g., the sky) most efficiently.
JPEG 2000 seemed promising because it can be lossless, but it has never become popular. We believe this is partly because the compression time required has often been too great to support high-volume imaging.
Since the origins of the industry and the adoption of TIFF, PDF has also become a standard. The advantage of PDF is that it contains a standardized header that allows the image to be decompressed and displayed or managed using standard PDF viewing software, which Adobe and others have made freely available.
Underneath the PDF header, many different formats can be carried. These include TIFF Group 4, JPEG, and a number of other formats, including the TIFF Group 3 format used in faxes. A PDF can also contain ASCII text complete with font format information, or a mixture of images and text. All of this is transparent to the user, who just sees the image.
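Inside a PDF, each embedded image stream names the filter needed to decode it, so it is possible to see which encodings a file carries. The sketch below is a rough heuristic rather than a real PDF parser: it scans the raw bytes for `/Filter` entries and maps the standard filter names to the formats discussed here. The function name and the sample bytes are made up for illustration.

```python
import re

# Standard PDF stream filter names and the encodings they carry.
FILTER_TO_ENCODING = {
    "CCITTFaxDecode": "CCITT fax (TIFF Group 3 or Group 4)",
    "DCTDecode": "JPEG",
    "JPXDecode": "JPEG 2000",
    "JBIG2Decode": "JBIG2",
    "FlateDecode": "Flate (zlib), lossless",
}

# A /Filter entry is either a single name or an array of names.
_FILTER_ENTRY = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
_NAME = re.compile(rb"/([A-Za-z0-9]+)")

def carried_encodings(pdf_bytes: bytes) -> set:
    """Return labels for the known stream filters found in raw PDF bytes."""
    found = set()
    for entry in _FILTER_ENTRY.finditer(pdf_bytes):
        for name in _NAME.findall(entry.group(1)):
            label = FILTER_TO_ENCODING.get(name.decode("ascii"))
            if label:
                found.add(label)
    return found

# Hypothetical fragment: one Group 4 image (K = -1) and one JPEG image.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj << /Subtype /Image /Filter /CCITTFaxDecode "
    b"/DecodeParms << /K -1 >> >> endobj\n"
    b"2 0 obj << /Subtype /Image /Filter [/DCTDecode] >> endobj\n"
)
print(sorted(carried_encodings(sample)))
# prints ['CCITT fax (TIFF Group 3 or Group 4)', 'JPEG']
```

For production use, a dedicated PDF library is a safer choice than a byte scan, since a scan like this cannot tell image streams from other filtered streams such as fonts or page content.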
PDF/A is used as an output from batch imaging capture software where conversions to archives are being performed. It is a version of PDF designed to protect documents for long-term archiving. Support for PDF/A is increasingly important for capture solutions and is a requirement in many European countries.
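A PDF/A file declares its conformance in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. As a minimal sketch, assuming the metadata is stored as plain UTF-8 text in the file, the following reads that declaration. It only reports what the file claims and does not validate it; the function name is hypothetical.

```python
import re

# XMP properties may appear as elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); both forms are handled.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")

def claimed_pdfa_level(pdf_bytes: bytes):
    """Return a label such as 'PDF/A-1b', or None if no claim is found."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = "PDF/A-" + part.group(1).decode("ascii")
    conf = _CONF.search(pdf_bytes)
    if conf:
        level += conf.group(1).decode("ascii").lower()
    return level

# Hypothetical metadata fragment for illustration.
xmp = b"<pdfaid:part>1</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"
print(claimed_pdfa_level(xmp))  # prints "PDF/A-1b"
```

Confirming that a file actually meets the standard requires a full PDF/A validator; a check like this is only useful as a quick triage step on capture output.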