As companies deploy automation solutions to digitize their workflows and processes, there are many instances when structured and unstructured data from documents needs to be put into a system or used in a process. Typically, this is done via manual transcription or using an extraction tool that is dependent on creating and maintaining document layout templates that map out the location of key information. These methods are time-consuming and require scores of workers to maintain.
Automation Anywhere's intelligent document processing solution, IQ Bot, uses a combination of artificial intelligence (AI) technologies to obtain data from documents without the need for document layout maps. The latest release of IQ Bot for Enterprise A2019 includes new pre-trained models for invoice extraction. IQ Bot auto-extraction further leverages AI to create models that speed data extraction for invoices, so users can start extracting data without having to train a custom document extraction model.
Consider the visual blocks
Let’s take a moment to examine an invoice and deconstruct it into visual blocks. The invoice will typically contain many different elements such as:
- Order numbers
- Table elements
When looking at an invoice, we can quickly identify these blocks and see various patterns associated with each of the elements. For example, the format of a “date” block differs from the “address” block. These elements need to be handled differently. For example, a different action is required for the table elements vs. number fields vs. graphics.
This identification is necessary because each element type calls for its own implied logic and rules before it can be properly extracted, grouped, and structured. For example, take an instance where a description stretches across multiple lines or a case where serial numbers are embedded into the columns. The first step is understanding what an object is, so that its information can be structured and extracted in a useful manner.
By applying this insight to invoices using computer vision, we can locate and identify each of the different elements. This is done through the training of a multi-layered deep learning model. Each layer of the model feeds into the next, based on the characteristics of the block and its contents, until the block type is identified. The beauty of this technique is that the position of a block is irrelevant: blocks can be moved around and placed anywhere on a document and still be recognized.
Other solutions, especially ones that rely on document coordinate maps, may compensate for slight shifts in an object’s position due to scanning shifts and skews. But once a vendor updates a layout to look more modern and moves field locations around, those maps are immediately obsolete and need to be reworked. Since customers can have hundreds or thousands of vendors, the scale of maintaining those document maps is significant.
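To make the maintenance problem concrete, here is a minimal sketch of coordinate-map extraction. Everything in it is hypothetical and for illustration only: the `Word` type, the `TEMPLATE` box, and the page coordinates are assumptions, not part of any real product or template format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and the page position of its top-left corner."""
    text: str
    x: float
    y: float


# Hypothetical coordinate map: field name -> (x_min, y_min, x_max, y_max).
TEMPLATE = {"invoice_date": (400, 40, 560, 60)}


def extract_with_template(words, template):
    """Return the text found inside each field's fixed bounding box."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        # An empty box yields None: the map has silently stopped working.
        result[field] = " ".join(hits) or None
    return result


# The layout the map was drawn for: date in the top-right corner.
old_layout = [Word("Invoice", 40, 30), Word("03/14/2020", 420, 50)]
# The vendor's redesign: same date, moved under the heading on the left.
new_layout = [Word("Invoice", 40, 30), Word("03/14/2020", 60, 120)]

print(extract_with_template(old_layout, TEMPLATE))
print(extract_with_template(new_layout, TEMPLATE))
```

The date is still on the redesigned page, but the fixed box no longer finds it, so the map has to be redrawn for that vendor.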
Computer vision solves that problem and is document agnostic. Once a computer vision model is adequately trained on an element, that element can be located anywhere on the page. The model still works across different document types as it isn’t connected to any layout. In our case, we utilize industry-proven capabilities from TensorFlow to accomplish this. Let’s say goodbye to those pesky document maps.
The next challenge after element identification is to extract the actual data values with high confidence. The data extraction for Document Automation auto-extraction utilizes additional AI enhancements.
Let’s use a common invoice field—invoice date—to explain the challenges. When the date field is passed to a commodity off-the-shelf OCR engine, we receive basic optical character extraction results and probability scores for the output. Since the element block is a “date” field, the various returned result combinations can then be sent into an AI model that determines the final output.
There are a finite number of ways to construct a date field, and only a subset of alphanumeric characters is valid within one. Through training, the model “learns” this information and can determine the valid options with a high degree of accuracy.
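The idea of combining OCR candidates with knowledge of what a valid date looks like can be sketched as follows. This is a simplified stand-in, not the trained model described here: it replaces the learned model with a fixed list of assumed US-style formats, and the candidate strings and probabilities are made up for illustration.

```python
from datetime import datetime

# Assumed formats for illustration; a trained model would learn these from data.
DATE_FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%b %d, %Y", "%B %d, %Y")


def parse_date(text):
    """Return a date if the text matches any assumed format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def pick_date(candidates):
    """Pick the most probable OCR candidate that is also a valid date.

    candidates: list of (text, ocr_probability) pairs.
    """
    parsed = [(parse_date(text), prob) for text, prob in candidates]
    valid = [(d, prob) for d, prob in parsed if d is not None]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]


# Hypothetical OCR output: the top-scoring reading confuses "0" with "O".
candidates = [("O3/14/2020", 0.48), ("03/14/2020", 0.45), ("03/I4/2020", 0.07)]
best = pick_date(candidates)
print(best)
```

Here the highest-probability reading is rejected because it cannot be a date, and the second candidate is chosen instead. Knowing the block type is what makes that correction possible.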
Employing a machine learning (ML) model to perform this type of analysis is more appropriate than using a series of simple rules or applying regex patterns and validation checks. If we were using rules and patterns to build similar capabilities, there could easily be hundreds of checks to ensure proper coverage for American date conventions.
When we expand the problem set to include additional languages, countries, and conventions, the number of rules and patterns quickly becomes quite large, which makes a rule-based approach poorly suited to the problem. In contrast, with AI, all that is required to support other languages and countries is to retrain the model with additional samples from those languages and countries.
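A small sketch shows how quickly the rule-based route grows. The pattern lists below are illustrative assumptions covering only a handful of numeric conventions; they are not a complete rule set for any locale.

```python
import re
from datetime import date

# Month-first numeric patterns (assumed subset of American conventions).
US_PATTERNS = [
    re.compile(r"^(?P<m>\d{1,2})/(?P<d>\d{1,2})/(?P<y>\d{4})$"),
    re.compile(r"^(?P<m>\d{1,2})-(?P<d>\d{1,2})-(?P<y>\d{4})$"),
    re.compile(r"^(?P<m>\d{1,2})\.(?P<d>\d{1,2})\.(?P<y>\d{4})$"),
]

# Day-first numeric patterns (assumed subset of conventions used elsewhere).
DAY_FIRST_PATTERNS = [
    re.compile(r"^(?P<d>\d{1,2})/(?P<m>\d{1,2})/(?P<y>\d{4})$"),
    re.compile(r"^(?P<d>\d{1,2})\.(?P<m>\d{1,2})\.(?P<y>\d{4})$"),
]


def match_date(text, patterns):
    """Return the first calendar-valid date produced by any pattern, else None."""
    for pattern in patterns:
        m = pattern.match(text)
        if m:
            try:
                return date(int(m["y"]), int(m["m"]), int(m["d"]))
            except ValueError:
                continue  # e.g. month 14: matched the shape, not the calendar
    return None


us = match_date("03/04/2020", US_PATTERNS)
day_first = match_date("03/04/2020", DAY_FIRST_PATTERNS)
written = match_date("March 4, 2020", US_PATTERNS)
print(us, day_first, written)
```

The same string yields two different dates depending on which rule set is applied, and a written-out month matches nothing until yet another pattern is added. Each new language or convention repeats this exercise.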
When considering AI model training, the larger and more comprehensive the data set, the better the results or outcomes will typically be. This work usually is highly technical and detailed and would require a data scientist or data analyst to ensure that the data set is accurate, impartial, balanced, and labeled correctly. The models used in the IQ Bot auto-extraction capability are trained using more than a million data points to ensure solid results.
By having pre-trained models packaged for users, obstacles are removed, speeding data and business flow.