8 Things You Need to Know about Automating Document Indexing

By: John Mancini on January 3rd, 2010




1. Choose Your Battles

Just because you have purchased a great new scanning, capture, or data entry automation application doesn't mean that it makes sense to automate every type of document under the sun. Sure, you may feel empowered to spend the time or money required to automate the indexing of that quarterly report that is generated only four times per year, but that would be like hunting quail with a bazooka. Look at the feasibility and return on investment before jumping into any project. Take on the automation projects with the highest and fastest ROI first, and pass on projects with a low or negative net present value.
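To make the "highest ROI first" rule concrete, here is a minimal sketch that ranks candidate automation projects by net present value and drops the ones that do not pay for themselves. The `Project` fields, the 8% discount rate, and the function names are all assumptions for illustration; plug in your own cost and savings estimates.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical description of one candidate automation project."""
    name: str
    upfront_cost: float    # one-time setup cost
    annual_savings: float  # estimated labor savings per year
    years: int             # how long the savings are expected to last


def npv(project, discount_rate):
    """Net present value: discounted yearly savings minus the upfront cost."""
    discounted = sum(
        project.annual_savings / (1 + discount_rate) ** year
        for year in range(1, project.years + 1)
    )
    return discounted - project.upfront_cost


def rank_projects(projects, discount_rate=0.08):
    """Return only the projects with positive NPV, highest NPV first."""
    scored = [(npv(p, discount_rate), p) for p in projects]
    worthwhile = [pair for pair in scored if pair[0] > 0]
    worthwhile.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in worthwhile]
```

With made-up numbers, a high-volume project such as vendor invoices will typically come out on top, while the four-times-a-year quarterly report ends up with a negative NPV and falls off the list.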

2. Choose the Most Accurate Recognition Technology

Obviously, your choice may be limited by the records that you are trying to automate. However, if you do have a choice, follow this simple rule: barcode/patch code recognition is the most accurate, followed by OCR (machine-printed text recognition), then constrained ICR (handwriting recognition), and lastly, unconstrained ICR.
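The rule is simple enough to write down as a lookup. The sketch below assumes you describe what a given document type supports with plain string labels; the labels and the function name are illustrative, not part of any capture product.

```python
# Recognition technologies, ordered from most to least accurate,
# following the rule of thumb stated above.
ACCURACY_ORDER = ["barcode", "ocr", "constrained_icr", "unconstrained_icr"]


def best_technology(available):
    """Return the most accurate technology the document type supports."""
    for technology in ACCURACY_ORDER:
        if technology in available:
            return technology
    return None  # nothing usable: the field has to be keyed manually
```

For example, a form that carries machine-printed text and free handwriting, but no barcode, would be indexed from the printed text.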


3. Test For Recognition Accuracy Early

Even with barcode recognition or OCR, recognition accuracy will likely fall short of 100% in the long run. Test the accuracy of the recognition component of your capture/automation design early in the process, using a relatively large sample, so that there are no surprises down the road.

Additionally, if you are still evaluating applications, make sure that the supplier performs the demonstration with a large sample of your documents. Avoid demonstrations that use standard documents prepared by the vendor. Why? You want to know that the automated indexing procedure they have developed works with a high degree of accuracy on your documents, not just on the ones they prepared for the demo. A good trick is to give the vendor ten samples of the document type to be automated ahead of the demonstration. Then, at the time of the demo, hand them 100 more documents of the same type that they have never seen before. This will show the true accuracy of the application and the automation process.
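One way to see why the sample size matters is to measure accuracy against hand-keyed values and then ask how low the true accuracy could plausibly be. The sketch below is an illustration, not a vendor tool: `field_accuracy` compares recognized index values with a hand-verified list, and `wilson_lower_bound` applies the standard Wilson score interval (95% confidence when `z=1.96`). Both function names are made up for this example.

```python
import math


def field_accuracy(recognized, truth):
    """Fraction of index values that exactly match the hand-verified ones."""
    if len(recognized) != len(truth):
        raise ValueError("recognized and truth must have the same length")
    if not truth:
        raise ValueError("need at least one sample")
    correct = sum(1 for r, t in zip(recognized, truth) if r == t)
    return correct / len(truth)


def wilson_lower_bound(correct, total, z=1.96):
    """Lower end of the Wilson score interval for the true accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denominator = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denominator
```

A perfect score on ten documents only supports a lower bound of roughly 0.72, while a perfect score on 100 documents supports roughly 0.96. That gap is the reason to insist on the extra 100 unseen documents.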

4. Key on Documents You Control

In many capture applications, the logic used to automate indexing and separate an individual document (set of pages) from a batch is to key off of some identification page. In most cases, it is easier to achieve full automation with a high level of accuracy if your identification page is one that you control.

Assume that you need to scan and index all of your vendor bills into your document management system. Automating the indexing of these documents can be difficult because you have no control over them and vendor invoices come in many different formats. For example, 1,000 different vendors could mean 1,000 or more different invoice formats. Creating an automated indexing process would be very time-consuming in this case. Furthermore, your vendors could change the format of their invoices without any notice, which can force you to rework your data entry automation scheme again and again. Additionally, automating vendor invoices typically requires human quality control, which increases your overall costs.

As an alternative, explore automating the input using records you control as the identification page. Using our vendor invoices example, you can use the checks you cut to pay the invoices as the identification page. Your bank checks, in conjunction with your accounting system's database, can typically provide an automation process that is nearly 100% accurate and fully automated.

The key here is a change in the process. Rather than having each individual vendor invoice in your document management system as the process output, you would have a check packet in your system as the output. The check packet would consist of the check (or check stub) followed by all of the invoices the check paid for. If you ever need to retrieve a specific invoice, you can search your accounting system for the check number that paid it and then pull up the check packet in your document management system.

Sure, this process does add an extra step to retrieval, but it dramatically cuts the cost of input, including the cost of quality assurance, and therefore provides a greater ROI.
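The check-packet idea can be sketched in a few lines. This is an illustration under stated assumptions, not a real capture API: each scanned page is a dict with a `kind` ("check" or "invoice") and a `number`, and `invoice_to_check` stands in for a query against your accounting system.

```python
def build_packets(pages):
    """Group a scanned batch into packets; each check page starts a new one."""
    packets = {}
    current_check = None
    for page in pages:
        if page["kind"] == "check":
            # The check is the identification page you control.
            current_check = page["number"]
            packets[current_check] = [page]
        elif current_check is not None:
            packets[current_check].append(page)
        else:
            raise ValueError("batch does not start with a check page")
    return packets


def find_packet(invoice_number, invoice_to_check, packets):
    """Two-step retrieval: invoice -> check number -> check packet."""
    check_number = invoice_to_check.get(invoice_number)
    if check_number is None:
        return None
    return packets.get(check_number)
```

Only the check page has to be recognized reliably; the invoices behind it ride along in the packet without being indexed individually.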

5. Quality Assurance

Any index automation process is prone to some level of error. Therefore, it is best practice to establish a quality assurance procedure, even if it is a very brief one. Even though today's scanning devices have features that detect multi-feeds and auto-threshold scanned images, you will still want to verify image quality, even if only on a random basis.
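A random spot check is easy to automate. The sketch below picks a fixed share of each batch for manual review; the 5% rate and the function name are assumptions, so choose a rate that matches your own error tolerance.

```python
import random


def pick_qa_sample(document_ids, rate=0.05, seed=None):
    """Pick a random share of a batch for manual quality review."""
    ids = list(document_ids)
    if not ids:
        return []
    # Always review at least one document, even in a tiny batch.
    count = min(len(ids), max(1, round(len(ids) * rate)))
    rng = random.Random(seed)  # pass a seed to make the selection repeatable
    return sorted(rng.sample(ids, count))
```

Passing a seed is optional, but it lets an auditor reproduce exactly which documents were selected for a given batch.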

6. Pre- and Post-Verification

It is important to track what was intended to be processed against what was actually processed. At a minimum, record simple page counts and record counts (the number of individual documents in the batch) before processing and verify them against the output. Even the most thorough index automation process can come across an unexpected file that throws the process off.
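A minimal version of this check compares the counts written down before scanning with what came out the other end. In this sketch, `output_page_counts` is assumed to be a list holding the page count of each document the process produced; the function name is illustrative.

```python
def verify_batch(expected_pages, expected_documents, output_page_counts):
    """Compare pre-scan counts with the processed output; return problems."""
    problems = []
    actual_documents = len(output_page_counts)
    actual_pages = sum(output_page_counts)
    if actual_documents != expected_documents:
        problems.append(
            f"expected {expected_documents} documents, got {actual_documents}"
        )
    if actual_pages != expected_pages:
        problems.append(f"expected {expected_pages} pages, got {actual_pages}")
    return problems  # an empty list means the batch reconciles
```

A document-count mismatch with a matching page count usually points to a missed or misread separator page, whereas a page-count mismatch points to a feed problem such as a double feed.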

7. Documentation

There are thousands of technical writers working for thousands of software companies. Some are definitely better than others. However, regardless of how good these technical writers are, boilerplate documentation is never best for a specific process. Take the time to document the process (with screenshots and videos, if possible) for your scanning/indexing staff. The time spent documenting the process will pay off tenfold down the road.

8. Outsource

Last, and certainly not least, outsource any manual indexing processes that make sense to outsource. All too often, firms spend time and money staffing people to perform tasks that are not part of the firm's distinctive competence. Outsourcing makes sense in many situations, even for small companies and small projects. Keep in mind that you can lower costs through "hybrid" outsourcing, where only part of the process is outsourced. For example, local service bureaus can charge an arm and a leg for scanning and indexing services, while foreign firms cost too much for small projects because of the expense of shipping the records. For processes where the indexing can't be automated, try scanning the files yourself and outsourcing the indexing to any one of the thousands of firms in India with a simple upload of the scanned files.

 

