Since the earliest forays into optical character recognition (OCR) by Ray Kurzweil in the early 1970s, software developers have been on a mission to teach computers how to do the paperwork for us.
What if the computer could replace the interminable number of hours needed each day in offices around the globe for humans to read documents, understand the meaning, and extract the right data for the next step in a work process? What if the computer could also do the data entry?
This is the Holy Grail of knowledge worker productivity. It’s also the mission of intelligent document processing (IDP) software, which for 50 years has progressed very slowly towards the goal. AI has always been at the center of this quest. As the availability of cost-effective AI power has increased, so has the ratio of machine automation to human work.
In 2023 cost-effective AI power has gone to the next level. Foundational Large Language Models (LLMs) are now widely available to developers and – just as importantly – are now economically feasible for business applications. While not designed with IDP in mind, these LLMs have shown great promise to move us even closer to our Holy Grail. This marks the beginning of the 4th Wave of IDP.
For the first time, a machine can reliably classify documents and extract data without the need for training samples or prior knowledge. Our clumsy categorization of documents into structured, semi-structured, and unstructured does not matter to the LLM. Send it invoices, contracts, forms, emails, correspondence, or any other text. In AI terms this capability is known as zero-shot learning. It looks like magic to us.
When ChatGPT first landed, we fed it invoice data and asked it to extract company names, payment dates, discount data, product information from tables, and more. The results were astonishing. We knew then that LLMs could change IDP products forever – once skillfully integrated. This AI could remove much of what IDP requires today: long and expensive AI model training, custom coding, and arcane regular expressions. Users would need no data science knowledge or business intelligence experience to process documents.
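To make the zero-shot idea concrete, here is a minimal sketch of the kind of prompt we mean. It is an illustration under stated assumptions, not how any particular IDP product works: the function names `build_extraction_prompt` and `parse_extraction_reply`, the field names, and the sample invoice text are all hypothetical, and the call to the LLM itself is left out because it depends on the provider you use.

```python
import json


def build_extraction_prompt(document_text, fields):
    """Build a zero-shot prompt: no training samples, only field names and the document."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following fields from the document below.\n"
        "Reply with a single JSON object whose keys are exactly the field names.\n"
        "Use null for any field that is not present.\n\n"
        f"Fields:\n{field_list}\n\n"
        f"Document:\n{document_text}\n"
    )


def parse_extraction_reply(reply_text, fields):
    """Parse the model's reply, keeping only the requested fields.

    Fields the model left out come back as None. Text around the JSON
    object (for example a polite preamble) is ignored.
    """
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply_text[start:end + 1])
    return {name: data.get(name) for name in fields}


# Hypothetical field names and invoice text, for illustration only.
FIELDS = ["company_name", "payment_date", "discount"]
invoice_text = "Acme Corp\nInvoice 1001\nPayment due: 2023-06-30\nTotal: 1,250.00"

prompt = build_extraction_prompt(invoice_text, FIELDS)

# A made-up reply standing in for what an LLM might return.
simulated_reply = 'Here you go: {"company_name": "Acme Corp", "payment_date": "2023-06-30"}'
extracted = parse_extraction_reply(simulated_reply, FIELDS)
```

The point of the sketch is what is missing: there are no sample invoices, no templates, and no regular expressions. The only document-specific input is the list of field names.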
Microsoft demonstrated this near-magical 4th Wave capability on May 24th at its Build conference. An Office 365 user opens Microsoft Copilot (a generative AI assistant) and types a question in natural language. The user asks it about the information contained inside a document in the SharePoint library. Copilot (powered by a Microsoft Syntex plugin invoking GPT-4) reads the document and generates an accurate reply. If the user needed data from the document, Copilot could also extract the data into a formatted file and even create an Office document.
As of May 31, 2023, Copilot is in preview and does not yet have a batch processing capability to automate large groups of documents. However, Microsoft told us this is only the beginning and to expect batch automation functionality in the future. We expect this will be integrated with the Microsoft Power Automate platform. If you cannot wait for this and want to roll your own, Microsoft is providing the developer tools through Azure OpenAI.
Other IDP companies have announced plans to use foundational LLMs from OpenAI and Google. In recent conversations with various IDP vendors, we learned that the immediate value of foundational LLMs is their content summarization and sentiment analysis capabilities. One vendor even called it an easy entry into the unstructured document market for contracts and such.
When it comes to the core IDP grunt work of document classification and data extraction, every vendor told us that foundational LLMs are not as accurate or predictable as their own deep learning models which have been fine-tuned for years with domain-specific business documents. They also stressed the vital importance of the best possible OCR text extraction to feed the LLM. The LLM’s input and output must also be carefully integrated within the IDP workflow of OCR, data validation, quality control, and integrations.
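The validation step the vendors describe can be sketched in a few lines. This is a simplified illustration, not any vendor's implementation: the field names, the three rules, and the `auto_approve` / `human_review` routing labels are assumptions chosen for the example.

```python
from datetime import datetime


def validate_invoice_fields(fields):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []

    # Rule 1 (assumed): a company name must be present and non-empty.
    if not fields.get("company_name"):
        problems.append("company_name is missing")

    # Rule 2 (assumed): the payment date must be an ISO date, YYYY-MM-DD.
    try:
        datetime.strptime(fields.get("payment_date") or "", "%Y-%m-%d")
    except (ValueError, TypeError):
        problems.append("payment_date is not in YYYY-MM-DD format")

    # Rule 3 (assumed): the total must be a non-negative number, not a string.
    total = fields.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
        problems.append("total is not a non-negative number")

    return problems


def route(fields):
    """Send clean records straight through; anything doubtful goes to a person."""
    return "human_review" if validate_invoice_fields(fields) else "auto_approve"


good = {"company_name": "Acme Corp", "payment_date": "2023-06-30", "total": 1250.0}
bad = {"company_name": "", "payment_date": "June 30", "total": "1250"}
```

Checks like these are deliberately dumb and deterministic. They do not make the LLM more accurate, but they keep its less predictable output from flowing unexamined into downstream systems.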
Even with those caveats, the barrier to entry for an IDP product has never been this low. Will the 4th Wave produce another Cambrian explosion of startups like we saw with deep learning in the 2010s? We at Deep Analysis believe it will and that this is only the beginning.
We spoke last month with one of the earliest 4th Wave startups, founded by two very bright graduate students at Carnegie Mellon University in my hometown of Pittsburgh, Pennsylvania. They are using LLMs to create a flurry of automation tools for insurance brokerages and facilities management offices.
Unlike previous IDP startups, this company doesn’t need to raise money to build an expensive data science team or spend years building proprietary LLMs and training AI models. Using the new foundational LLMs, they started on the product journey much closer to the customer’s pain point and quickly put a solution into production. The flexibility of a foundational LLM has also enabled them to rapidly iterate on new features and move into other use cases. Expect to see other startups that focus on solving IDP’s last big challenge: unstructured document understanding.
Will the 4th Wave lift IDP users to automation nirvana? It’s far too early to tell. We have covered the challenges and dangers of foundational LLMs in previous analyst notes.
But make no mistake: as we post, there are thousands of clever and industrious developers and product managers working hard to mitigate those issues and launch 4th Wave IDP products that will move us ever closer to the goal.
[This blog post was adapted, with permission, from Deep Analysis. See original post here.]