By: John Mancini on April 28th, 2011

What Is Capture?

Document capture software is the front-end software that converts unstructured and semi-structured business documents, whether paper or formatted electronic files (e.g., PDF files or faxes), into indexed images. It then automatically applies pattern recognition technologies, supplemented by business rules, to extract accurate data and add pertinent metadata for use in one or more business processes.

These documents and their associated data are validated and then placed into a document and records management workflow, stored in centralized or distributed repositories, or published electronically on the Internet or an intranet. The extracted data is used to initiate or add to business process transactions.

In its most basic form, capture is simply scanning and indexing the image of the document. At its most complex, it consists of a series of modular components to accurately convert many thousands of pages of documents a day to images and associated validated data, whether received as paper, fax, or within an email.

The systems can scan documents; analyze and classify them according to known layouts; convert areas and fields of information using pattern recognition (OCR, ICR, OMR, and barcode); apply rules to fields and areas of information; interface with back-end databases to extract validation and verification information; create relevant metadata; support manual keying of data that cannot be automatically recognized; and verify fields.
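As a rough illustration of how these modular steps fit together, the sketch below chains classification, field extraction, rule checking, and a fallback to manual keying. It is a simplified sketch and not any vendor's implementation: the `LAYOUTS` table, the field names, the single business rule, and the `capture` function are all hypothetical, and plain regular expressions stand in for real OCR/ICR recognition engines.

```python
import re
from dataclasses import dataclass, field

# Hypothetical "known layouts": a keyword identifies the document type, and
# regex patterns stand in for OCR/ICR field recognition.
LAYOUTS = {
    "invoice": {
        "keyword": "INVOICE",
        "fields": {
            "invoice_number": r"Invoice No[.:]?\s*(\w+)",
            "total": r"Total[.:]?\s*\$?([\d.,]+)",
        },
    },
    "claim": {
        "keyword": "CLAIM FORM",
        "fields": {"policy_number": r"Policy[.:]?\s*(\w+)"},
    },
}


@dataclass
class CapturedDocument:
    text: str
    doc_type: str = "unknown"
    data: dict = field(default_factory=dict)
    needs_keying: list = field(default_factory=list)  # fields for an operator


def classify(text):
    """Match the text against the known layouts by keyword."""
    upper = text.upper()
    for doc_type, layout in LAYOUTS.items():
        if layout["keyword"] in upper:
            return doc_type
    return "unknown"


def extract(text, doc_type):
    """Return (recognized fields, names of fields that were not recognized)."""
    found, missing = {}, []
    patterns = LAYOUTS.get(doc_type, {}).get("fields", {})
    for name, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            found[name] = match.group(1)
        else:
            missing.append(name)
    return found, missing


def apply_rules(doc):
    """Hypothetical business rule: a total must be a positive number."""
    raw = doc.data.get("total")
    if raw is None:
        return
    try:
        value = float(raw.replace(",", ""))
    except ValueError:
        value = -1.0
    if value > 0:
        doc.data["total"] = value  # store the normalized amount
    else:
        del doc.data["total"]
        doc.needs_keying.append("total")  # route to manual keying


def capture(text):
    """Run classification, extraction, and rule checks on scanned text."""
    doc = CapturedDocument(text=text)
    doc.doc_type = classify(text)
    doc.data, doc.needs_keying = extract(text, doc.doc_type)
    apply_rules(doc)
    return doc
```

Calling `capture("INVOICE\nInvoice No: A1234\nTotal: $1,250.00")` classifies the text as an invoice and extracts both fields. If the total cannot be recognized, `total` is listed in `needs_keying` so an operator can enter it by hand.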

Systems sell for anywhere between a few hundred dollars and several million dollars, depending on the complexity of the solution, with many variations between these extremes. Many vendors’ products cut across the broad categories: ad-hoc capture vendors have added batch scanning capability; batch image capture vendors have added some transactional capability; and batch transaction systems vendors have added business process integration.

Typically, vendors offer a variety of solutions. Most traditional ECM vendors offer some form of proprietary ad-hoc and batch image capture capability as the first step in getting paper into the company’s workflow. A few have started to offer transactional capture capability. Pure capture vendors usually offer a range of outputs (or scripts) to support multiple ECM image formats.

About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.

Recent keynote topics include: The Stairway to Digital Transformation; Navigating Disruptive Waters: 4 Things You Need to Know to Build Your Digital Transformation Strategy; Getting Ahead of the Digital Transformation Curve; Viewing Information Management Through a New Lens; and Digital Disruption: 6 Strategies to Avoid Being “Blockbustered”.

Specialties: Keynote speaker and writer on AI, RPA, Intelligent Information Management, Intelligent Automation and Digital Transformation. Consensus-building with Boards to create strategic focus, action, and accountability. Extensive public speaking and public relations work. Conversant and experienced in major technology issues and trends. Expert on inbound and content marketing, particularly in an association environment and on the Hubspot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.