Document capture software is front-end software that converts unstructured and semi-structured business documents, whether paper or formatted electronic files (e.g., PDF files or faxes), into indexed images. It then automatically applies pattern recognition technologies, supplemented by business rules, to extract accurate data and add pertinent metadata for use in one or more business processes.
These documents and their associated data are validated and then placed into a document and records management workflow, stored in centralized or distributed repositories, or published electronically on the Internet or an intranet, while the data is used to initiate or add to business process transactions.
In its most basic form, capture is simply scanning and indexing the image of a document. At its most complex, it consists of a series of modular components that accurately convert many thousands of pages a day into images and associated validated data, whether the documents arrive as paper, as faxes, or within emails.
The systems can scan documents; analyze and classify them according to known layouts; convert areas and fields of information using pattern recognition (optical character recognition (OCR), intelligent character recognition (ICR), optical mark recognition (OMR), and barcode reading); apply rules to fields and areas of information; interface with back-end databases to retrieve validation and verification information; create relevant metadata; route data that cannot be automatically recognized to operators for manual keying; and verify fields.
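As an illustrative sketch only, and not any vendor's implementation, the classification, extraction, rule-checking, and manual-keying steps described above might be arranged as follows in Python. Plain text stands in for the output of a recognition engine, and the layout table, field names, regular expressions, and business rules are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical known layouts: an identifying keyword plus one regex per field.
LAYOUTS: Dict[str, dict] = {
    "invoice": {
        "keyword": "INVOICE",
        "fields": {
            "invoice_number": r"Invoice\s+No[.:]?\s*(\w+)",
            "total": r"Total:?\s*\$?([\d.,]+)",
        },
    },
}


def _is_positive_amount(value: str) -> bool:
    """Assumed business rule: an amount must parse to a number above zero."""
    try:
        return float(value.replace(",", "")) > 0
    except ValueError:
        return False


# Hypothetical business rules applied to each extracted field.
RULES: Dict[str, Callable[[str], bool]] = {
    "invoice_number": lambda v: v.isalnum() and len(v) >= 3,
    "total": _is_positive_amount,
}


@dataclass
class CapturedDocument:
    doc_type: Optional[str] = None
    fields: Dict[str, Optional[str]] = field(default_factory=dict)
    needs_keying: List[str] = field(default_factory=list)  # fields for an operator


def classify(text: str) -> Optional[str]:
    """Return the first known layout whose keyword appears in the text."""
    upper = text.upper()
    for doc_type, layout in LAYOUTS.items():
        if layout["keyword"] in upper:
            return doc_type
    return None


def extract_fields(text: str, doc_type: str) -> Dict[str, Optional[str]]:
    """Apply each field pattern of the layout; unmatched fields become None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in LAYOUTS[doc_type]["fields"].items():
        match = re.search(pattern, text, re.IGNORECASE)
        result[name] = match.group(1) if match else None
    return result


def capture(text: str) -> CapturedDocument:
    """Classify, extract, and validate; queue anything doubtful for keying."""
    doc = CapturedDocument(doc_type=classify(text))
    if doc.doc_type is None:
        doc.needs_keying.append("doc_type")
        return doc
    doc.fields = extract_fields(text, doc.doc_type)
    for name, value in doc.fields.items():
        rule = RULES.get(name, lambda v: True)
        if value is None or not rule(value):
            doc.needs_keying.append(name)
    return doc


if __name__ == "__main__":
    print(capture("INVOICE\nInvoice No: A1234\nTotal: $1,250.00\n"))
    print(capture("INVOICE\nInvoice No: A1234\nTotal: illegible\n"))
```

Keeping each stage as a separate function mirrors the modular structure of capture systems: a stage can be replaced (for example, swapping the regular expressions for a real recognition engine) without changing the others, and any field that fails a rule is listed in `needs_keying` rather than silently accepted.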
Systems sell for anywhere from a few hundred dollars to several million dollars, depending on the complexity of the solution, with many variations between these extremes. Many vendors’ products overlap these broad categories: ad-hoc capture vendors have added batch scanning capability; batch image capture vendors have added some transactional capability; and batch transaction system vendors have added business process integration.
Vendors typically offer a variety of solutions. Most traditional enterprise content management (ECM) vendors offer some form of proprietary ad-hoc and batch image capture capability as the first step in getting paper into the company’s workflow. A few have started to offer transactional capture capability. Pure capture vendors usually offer a range of outputs (or scripts) to support multiple ECM image formats.