As OCR technologies become increasingly important for digitizing business processes, we are looking for ways to perform text recognition more efficiently in terms of time, accuracy, and computational resources. Abto computer vision engineers have proposed a novel approach to data extraction that focuses only on the meaningful information in your files and retrieves key-value pairs from semi-structured and unstructured documents.

Automating the Processing of Unstructured Documents

More often than not, businesses work with documents that present information under specific keywords typical of that document type. For example, an insurance claim form, apart from a main body that varies with the insurance company, the type of incident, and so on, will most likely contain the words “name”, “address”, and “Social Security number”, each followed by the corresponding text. Such pieces of text, together with their labels, are called key-value pairs.
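To make the idea of a key-value pair concrete, here is a minimal Python sketch. It assumes the OCR step has already produced plain text lines and that each value sits on the same line as its key, separated by a colon or whitespace. The function `extract_pairs` and the sample claim form are hypothetical illustrations, not part of Abto's tool.

```python
import re


def extract_pairs(lines: list[str], keys: list[str]) -> dict[str, str]:
    """Return {key: value} for each key found at the start of a line (hypothetical helper)."""
    result = {}
    for key in keys:
        # Key label, then either a colon (with optional spaces) or whitespace, then the value.
        pattern = re.compile(rf"^\s*{re.escape(key)}(?:\s*:\s*|\s+)(.+?)\s*$", re.IGNORECASE)
        for line in lines:
            match = pattern.match(line)
            if match:
                result[key] = match.group(1)
                break  # keep the first occurrence of each key
    return result


claim = [
    "ACME Insurance - Claim Form",
    "Name: Jane Doe",
    "Address: 12 Elm Street, Springfield",
    "Social Security number: 123-45-6789",
]
print(extract_pairs(claim, ["name", "address", "Social Security number"]))
# → {'name': 'Jane Doe', 'address': '12 Elm Street, Springfield', 'Social Security number': '123-45-6789'}
```

Line-based matching like this only works when the form keeps each key and its value on one line, which is exactly the assumption real documents tend to break.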

In practice, real-life business documents lack a consistent structure, and the key-value pairs in them can appear anywhere on the page. Such unstructured documents are tricky to process digitally because no hard-coded rules can be used for data extraction. With this challenge in mind, we have built a data extraction tool that helps automate the processing of unstructured documents.
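One way to find values without hard-coded positions is to search relative to the key itself. The Python sketch below assumes a hypothetical OCR output format in which each text fragment comes with a bounding box; for every user-specified key it takes the nearest non-key fragment to the right on the same row, or failing that, the nearest one below. This is a common layout heuristic shown for illustration, not a description of Abto's algorithm, and `Box`, `find_value`, and `extract` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """One OCR text fragment and its bounding box (hypothetical OCR output format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def normalize(text: str) -> str:
    # "Name:" and "name" should both match the key "name".
    return text.strip().rstrip(":").strip().lower()


def find_value(key: Box, boxes: list[Box], keys: set[str]) -> Optional[Box]:
    """Return the nearest non-key box to the right on the same row, else the nearest one below."""
    candidates = [b for b in boxes if b is not key and normalize(b.text) not in keys]
    key_right, key_bottom = key.x + key.w, key.y + key.h
    centre_y = key.y + key.h / 2

    # Same row: starts right of the key, vertical centre within half a key height.
    right = [b for b in candidates
             if b.x >= key_right and abs(b.y + b.h / 2 - centre_y) <= key.h / 2]
    if right:
        return min(right, key=lambda b: b.x - key_right)

    # Below: starts under the key and overlaps it horizontally.
    below = [b for b in candidates
             if b.y >= key_bottom and b.x < key_right and b.x + b.w > key.x]
    if below:
        return min(below, key=lambda b: b.y - key_bottom)
    return None


def extract(boxes: list[Box], wanted: list[str]) -> dict[str, str]:
    keys = {k.lower() for k in wanted}
    result = {}
    for box in boxes:
        label = normalize(box.text)
        if label in keys and label not in result:
            value = find_value(box, boxes, keys)
            if value is not None:
                result[label] = value.text
    return result


# Same keys, different layouts: values to the right in one form, below in the other.
form_a = [Box("Name:", 10, 10, 40, 12), Box("Jane Doe", 60, 11, 60, 12),
          Box("Total:", 10, 40, 40, 12), Box("42.00", 200, 40, 40, 12)]
form_b = [Box("Total", 100, 10, 40, 12), Box("42.00", 102, 30, 40, 12),
          Box("Name", 10, 10, 40, 12), Box("Jane Doe", 10, 30, 60, 12)]
print(extract(form_a, ["name", "total"]))  # {'name': 'Jane Doe', 'total': '42.00'}
print(extract(form_b, ["name", "total"]))  # {'total': '42.00', 'name': 'Jane Doe'}
```

Excluding other keys from the candidates matters: in `form_b` the label "Total" sits on the same row as "Name" and would otherwise be returned as its value.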

The Abto data extraction prototype retrieves information in key-value format and turns documents into business-ready data that is easier to process, analyze, and store. You choose which keys to look for, and the text recognition algorithm extracts data from every document that contains those keys, wherever they appear in the document.
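Because user-chosen keys are matched against OCR output, a recognized label may contain small misreads (for example, "rn" read in place of "m"). A minimal sketch of tolerant key matching, assuming Python's standard `difflib` fuzzy matcher is an acceptable stand-in for whatever matching the tool actually uses; `match_key` and the 0.6 cutoff are illustrative assumptions.

```python
import difflib
from typing import Optional


def match_key(label: str, wanted: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map an OCR-ed label to one of the user's keys, tolerating small misreads (hypothetical helper)."""
    cleaned = label.strip().rstrip(":").lower()
    lowered = {k.lower(): k for k in wanted}  # compare case-insensitively, return the user's spelling
    hits = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


print(match_key("Narne:", ["Name", "Address"]))     # Name
print(match_key("Adress", ["Name", "Address"]))     # Address
print(match_key("Total due", ["Name", "Address"]))  # None
```

A cutoff of 0.6 is permissive; with short keys that look alike, a stricter threshold would reduce false matches at the cost of missing more misreads.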


How Our Data Extraction Technology Works

Our structured data extraction prototype captures information from images and PDF files. First, the algorithm preprocesses the input documents to prepare them for text recognition. After finding the user-specified keys in the OCR-ed document, it extracts the corresponding values and saves them in key-value format. A final post-processing stage improves the accuracy of the extracted data.
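One common form of post-processing, shown here as an illustrative assumption rather than a description of Abto's pipeline, is to repair typical OCR character confusions in fields with a known format and then validate the result. The field names `ssn` and `amount`, their patterns, and the substitution table are hypothetical.

```python
import re

# Letters OCR engines often confuse with digits (illustrative, not exhaustive).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})

# Hypothetical per-key validation patterns; keys without a pattern are free text.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "amount": re.compile(r"\d+(?:\.\d{2})?"),
}


def post_process(key: str, raw: str) -> tuple[str, bool]:
    """Clean one extracted value and return (value, is_valid)."""
    value = " ".join(raw.split())  # collapse stray whitespace
    pattern = PATTERNS.get(key)
    if pattern is None:
        return value, True  # free-text field: nothing to validate
    if not pattern.fullmatch(value):
        # Only touch formatted fields that fail validation: swap look-alike letters
        # for digits and drop spaces OCR inserted inside the number.
        value = value.translate(DIGIT_FIXES).replace(" ", "")
    return value, bool(pattern.fullmatch(value))


print(post_process("ssn", "l23-45-67B9"))   # ('123-45-6789', True)
print(post_process("amount", "4O.5O"))      # ('40.50', True)
print(post_process("ssn", "123-45-678"))    # ('123-45-678', False)
```

Returning a validity flag instead of silently discarding bad values lets a downstream system route doubtful fields to a human reviewer.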

Features of Abto Data Extraction Technology

  • Extracts valuable information from your documents
    Retrieve key-value pairs from your financial and insurance forms, bank statements, receipts, invoices, bills, and other semi-structured and unstructured documents.
  • Fast data entry automation
    The OCR algorithm processes only relevant information, extracts the data, and presents it in a format ready for integration into your business processes.
  • Accurate text recognition
    Abto Software has extensive experience with OCR technologies, which allows us to tackle a wide range of text recognition challenges.
  • Flexible solution for document-intensive processes
    Our structured data extraction prototype supports six languages, lets users set up custom keys, and does not require documents in one batch to share the same template.

Read how we developed a Handwritten Numbers Recognition Model that is 2.5 times more accurate than Google Cloud Vision.

Data Extraction Solution for a Variety of Industries

Automated data extraction technology can be applied across many industries and use cases. Since it works without preliminary template setup and places few restrictions on document structure, Abto data extraction technology can be used for:

  • Financial documents: analyze tax forms, balance sheets, and bank statements to extract taxpayer information, organization names, and financial figures.
  • Invoices & receipts: analyze restaurant bills, grocery receipts, utility bills, and purchase orders to extract issue and due dates, account and invoice numbers, and product quantities and amounts.
  • Insurance documents: analyze insurance policies, contracts, claims, and agreements to extract the insured person’s personal data and information from insurance claims and related documents.
  • Medical documents: analyze medical records, operative reports, and prescriptions to extract patient personal data, the doctor’s name, lab results, and prescribed drug names.
  • Judicial documents: analyze court records, case files, and victim and police reports to extract identification details, addresses, dates, and incident types.
  • Shipping documents: analyze packing lists, certificates, and shipping labels to extract origin and destination addresses, package weight, shipping class, and contents description.

The extracted text is presented in a structured format, ready for integration into the document management system of your choice. You can automatically import the extracted data into your ERP, EHR, CRM, or accounting system for further analysis and processing.
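As an illustration of what a structured, integration-ready format can look like, here is a minimal sketch that serializes extracted pairs to JSON. The record layout (`document_id`, `fields`, `needs_review`) is a hypothetical schema; in practice the receiving ERP, EHR, CRM, or accounting system dictates the field names.

```python
import json


def to_json_record(doc_id: str, pairs: dict[str, str], needs_review: set[str]) -> str:
    """Serialize extracted key-value pairs into a JSON document (hypothetical schema)."""
    record = {
        "document_id": doc_id,
        "fields": [
            # needs_review lets the receiving system route doubtful values to a human.
            {"key": key, "value": value, "needs_review": key in needs_review}
            for key, value in pairs.items()
        ],
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


payload = to_json_record(
    "claim-0001",
    {"name": "Jane Doe", "ssn": "123-45-678"},
    needs_review={"ssn"},
)
print(payload)
```

`ensure_ascii=False` keeps non-English values (names, addresses) readable in the output instead of escaping them.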

The Abto data extraction tool helps your business automate document processing by digitizing legacy documents, eliminating error-prone manual data entry, and cutting costs.

What’s Next

The automated data extraction tool is the first component of a comprehensive document management solution that we plan to extend with the following functionality:

  • document classification & indexing;
  • database verification of the extracted information;
  • duplicate document detection;
  • sensitive data detection;
  • role-based data redaction & visibility.

Interested? Fill in the contact form below to talk with our experts about your use case.

Contact Us

Get in touch to find out more about Abto Software’s expertise, request a quote, or get a demo of your custom solution.
