As OCR technologies become increasingly important for digitizing business processes, we are looking for ways to perform text recognition more efficiently in terms of time, accuracy, and computational resources. Abto computer vision engineers have proposed a novel approach to data extraction that focuses only on the meaningful information within your files and retrieves key-value pairs from structured documents.
Businesses work with documents that present information under specific keywords inherent to that type of document. For example, an insurance claim form, apart from a main body that varies with the insurance company, type of incident, and so on, most probably contains the words “name”, “address”, and “Social Security number”, each followed by the corresponding text. Such a piece of text together with its label is called a key-value pair.
The Abto prototype for structured OCR extracts information in key-value format and transforms documents into business-ready data that is better prepared for processing, analysis, and storage. You choose which keys to look for, and the text recognition algorithm extracts data from all of the documents that contain the indicated keys, no matter where they are positioned within the document.
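To illustrate what position-independent key lookup can mean in practice, here is a minimal Python sketch. It is not the Abto implementation: it assumes an OCR engine has already returned words with coordinates, and the `Word` class, `group_into_lines`, `find_value`, and the `tolerance` parameter are hypothetical names introduced only for this example. The sketch groups words into lines by their vertical position and returns whatever text sits to the right of the requested key.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR-ed word with a simplified position (hypothetical structure)."""
    text: str
    x: float  # left edge of the word
    y: float  # vertical center of the word


def group_into_lines(words, tolerance=5.0):
    """Cluster words whose vertical centers lie within `tolerance` of a line's first word."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        if lines and abs(lines[-1][0].y - word.y) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Put each line into left-to-right reading order.
    return [sorted(line, key=lambda w: w.x) for line in lines]


def find_value(words, key, tolerance=5.0):
    """Return the text to the right of `key` on the same line, or None if absent."""
    key_tokens = key.lower().split()
    n = len(key_tokens)
    for line in group_into_lines(words, tolerance):
        # Compare case-insensitively and ignore a trailing colon on the label.
        tokens = [w.text.lower().rstrip(":") for w in line]
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == key_tokens:
                value = " ".join(w.text for w in line[i + n:])
                return value or None
    return None


# Illustrative input: the same keys would be found wherever they sit on the page.
sample_words = [
    Word("Address:", 300, 10), Word("12", 380, 10),
    Word("Oak", 400, 11), Word("St", 440, 10),
    Word("Name:", 10, 50), Word("Jane", 70, 51), Word("Doe", 120, 50),
]
```

Because the lookup searches every line for the key rather than reading a fixed region, no per-document template is needed. A real system would also have to handle values placed below the key, several keys on one line, and OCR misspellings of the key, none of which this sketch attempts.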
The structured data extraction technology can be implemented across a variety of industries for different use cases. Since it works without preliminary template setup and places few restrictions on the document structure, Abto data extraction technology can be used for:
Our structured data extraction prototype captures information from images and PDF files. First, the algorithm preprocesses the input documents to facilitate the text recognition that follows. After finding the user-indicated keys within the OCR-ed document, it extracts the corresponding values and saves this information in key-value format. Finally, a post-processing stage refines the results to improve the accuracy of the entire data extraction process.
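The last two stages described above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the prototype's actual code: preprocessing and OCR are assumed to have already produced plain text, and the function names `extract_pairs`, `postprocess`, and `extract_data` are hypothetical. The matching here uses a simple regular expression as a stand-in for whatever matching logic the real system applies.

```python
import re


def extract_pairs(ocr_text, keys):
    """Find each requested key anywhere in the text and capture the rest of its line."""
    pairs = {}
    for key in keys:
        # Word boundaries keep "name" from matching inside "surname";
        # an optional colon or hyphen may separate the key from its value.
        pattern = re.compile(
            rf"\b{re.escape(key)}\b\s*[:\-]?\s*(.+)", re.IGNORECASE
        )
        match = pattern.search(ocr_text)
        pairs[key] = match.group(1) if match else None
    return pairs


def postprocess(pairs):
    """Collapse repeated whitespace and trim stray punctuation from each value."""
    cleaned = {}
    for key, value in pairs.items():
        if value is None:
            cleaned[key] = None
        else:
            cleaned[key] = re.sub(r"\s+", " ", value).strip(" .;,")
    return cleaned


def extract_data(ocr_text, keys):
    """Run key-value extraction followed by post-processing (hypothetical helper)."""
    return postprocess(extract_pairs(ocr_text, keys))


# Placeholder text standing in for OCR output; the values are made up.
sample_text = (
    "CLAIM FORM\n"
    "Name:   Jane  Doe \n"
    "Address - 12 Oak St.;\n"
    "Social Security number: 000-00-0000\n"
)
result = extract_data(
    sample_text, ["name", "address", "social security number", "phone"]
)
```

Keys that are not found are kept in the result with a value of `None`, so downstream code can tell a missing field apart from an empty one.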
To try out our data extraction demo, follow these steps:
You can also choose one of the four sample images to test our data extraction prototype – just click on one of the images below and press the “Extract data” button. The language and the keys will be filled in automatically.
Click on one of the sample images or upload your own file, making sure it follows the requirements described above, and press the “Extract data” button.