The venerable template allows structured form data to be extracted accurately. In the document capture industry, a template that specifies the location of each data element is a tried-and-true strategy for structured forms. If the form is standardized, telling the software precisely where to look for data will almost always perform better than alternatives such as rules-based approaches that rely on keywords or patterns. Even for unstructured documents such as invoices, we find that many organizations have opted for a template approach after finding that more flexible, rules-based approaches fall short. The result is a tremendous amount of upfront effort and a lot of ongoing maintenance.
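To make the template idea concrete, here is a minimal sketch in Python. It assumes OCR has already produced words with center coordinates. The `Word` class, the `INVOICE_TEMPLATE` zones, and the `extract_fields` function are hypothetical names chosen for illustration, not part of any particular capture product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word with the center of its bounding box (page units)."""
    text: str
    x: float
    y: float


# Hypothetical template: field name -> zone as (left, top, right, bottom).
INVOICE_TEMPLATE = {
    "invoice_number": (400, 40, 580, 70),
    "total": (400, 700, 580, 730),
}


def extract_fields(words, template):
    """Return the text found inside each template zone, in reading order."""
    fields = {}
    for name, (left, top, right, bottom) in template.items():
        inside = [
            w for w in words
            if left <= w.x <= right and top <= w.y <= bottom
        ]
        # Sort top-to-bottom, then left-to-right.
        inside.sort(key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields


words = [
    Word("INV-1001", 450, 55),
    Word("Total:", 350, 715),     # label sits left of the zone, so it is skipped
    Word("$1,250.00", 480, 715),
]
print(extract_fields(words, INVOICE_TEMPLATE))
```

If the same invoice number is printed 40 units lower on a revised layout, it falls outside the zone and the field comes back empty. That is the kind of change that forces someone to edit or add a template.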
Machine Learning vs Templates
Can machine learning, which uses a more adaptive set of technologies, perform better than even structured templates? The answer is: definitely. Why? Because adaptive algorithms can account for variations not only in data location but also in image quality, such as low resolution or noise.
Recently, we worked with a solution provider that had implemented several thousand templates to process invoices and was achieving decent results. That came only after many hours spent creating each template and curating it to accommodate subtle and not-so-subtle changes in layout.
Smart Learning beats templates because it can intelligently cluster data into similar groups and then identify key features across thousands of examples much more efficiently than any human could. Using this information, it can construct a model that processes a wider variety of documents with improved precision, and it can do this for both structured forms and unstructured documents. The result is much faster payback without the cost of building and maintaining templates.
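The grouping step can be illustrated with a toy example. The sketch below is not Parascript's method: it swaps in a simple greedy clustering over Jaccard similarity of the words found on each document, and the function names, the 0.5 threshold, and the sample data are all assumptions made for illustration. A production system would presumably use richer layout and image features and a trained model.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_documents(docs, threshold=0.5):
    """Greedy clustering: each document joins the first cluster whose
    first member is at least `threshold` similar, else starts a new one."""
    clusters = []
    for doc_id, tokens in docs.items():
        for cluster in clusters:
            if jaccard(tokens, docs[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def shared_features(cluster, docs):
    """Tokens present in every document of a cluster."""
    return set.intersection(*(docs[d] for d in cluster))


# Hypothetical sample: the words found on three scanned invoices.
docs = {
    "a1": {"invoice", "no", "total", "due", "acme"},
    "a2": {"invoice", "no", "total", "due", "acme", "discount"},
    "b1": {"rechnung", "summe", "datum", "globex"},
}

clusters = cluster_documents(docs)
print(clusters)
print(shared_features(clusters[0], docs))
```

Here the two Acme invoices end up in one group even though one carries an extra "discount" line, and the tokens they share become candidate key features for that group. Nobody had to draw a zone by hand for either layout.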
The template is dead, long live Smart Learning.
About the Author: Greg Council is the VP of Marketing and Product Management at Parascript, where he specializes in bringing products to market in the document capture, enterprise content management, and business process management markets.