Data Extraction for Structured Documents


Retrieve structured data from your documents with our automated Optical Character Recognition tool.


Features of Abto Structured Data Extraction Technology

Extracts valuable information from your structured documents
Retrieve key-value pairs from your financial and insurance forms, bank statements, receipts, invoices, bills, and other structured documents
Fast data entry automation
The OCR algorithm processes only the relevant information, extracts the data, and presents it in a format ready for further integration into your business processes
Accurate text recognition
Abto Software has extensive experience with OCR technologies, which allows us to tackle a wide range of text recognition challenges
Flexible solution for document-intensive processes
Our structured data extraction prototype supports six languages, allows users to set up custom keys, and does not require all documents to follow the same template

As OCR technologies become increasingly important for digitizing business processes, we are looking for ways to perform text recognition more efficiently in terms of time, accuracy, and computational resources. Abto computer vision engineers have proposed a novel approach to data extraction that focuses only on the meaningful information within your files and retrieves key-value pairs from structured documents.

Businesses work with documents that present information under specific keywords typical of each document type. For example, an insurance claim form, apart from its main body, which varies depending on the insurance company, the type of incident, and so on, most likely contains the words “name”, “address”, and “Social Security number”, each followed by the corresponding text. Such a piece of text together with its label is called a key-value pair.

The Abto prototype for structured OCR extracts information in key-value format and transforms documents into business-ready data that is easier to process, analyze, and store. You choose which keys to look for, and the text recognition algorithm extracts the data from every document that contains those keys, no matter where they are positioned within the document.
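To illustrate how user-chosen keys can be found regardless of where they sit on the page, here is a minimal Python sketch. It is not the prototype's actual code: it assumes the OCR step has already produced plain text lines, that each key is followed by a colon with its value on the same line, and that the first match for a key wins. The function name extract_key_values is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional


def extract_key_values(lines: Iterable[str], keys: List[str]) -> Dict[str, Optional[str]]:
    """Map each requested key to the text that follows 'key:' on the same line.

    Assumptions (not taken from the source): OCR has already produced plain
    text lines, each key is followed by a colon, and the first match wins.
    """
    results: Dict[str, Optional[str]] = {key: None for key in keys}
    # One case-insensitive pattern per key, searched anywhere in the line,
    # so the key's position on the page does not matter.
    patterns = {
        key: re.compile(rf"{re.escape(key)}\s*:\s*(.*)", re.IGNORECASE)
        for key in keys
    }
    for line in lines:
        for key, pattern in patterns.items():
            if results[key] is not None:
                continue  # keep the first occurrence of each key
            match = pattern.search(line)
            if match:
                results[key] = match.group(1).strip()
    return results


if __name__ == "__main__":
    page = [
        "CLAIM FORM 2023",
        "Policy holder NAME: Jane Doe",
        "Address:  12 Main Street",
    ]
    print(extract_key_values(page, ["name", "address", "phone"]))
```

Run as a script, it prints {'name': 'Jane Doe', 'address': '12 Main Street', 'phone': None}. Because each key is searched for anywhere in every line rather than at fixed coordinates, the same key list works across differently laid-out forms, and a key that does not occur simply maps to None.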



Data Extraction Solution for a Variety of Industries

The structured data extraction technology can be implemented across a variety of industries for different use cases. Since it works without preliminary template setup and places few restrictions on document structure, Abto data extraction technology can be used for:

  • Financial documents: tax forms, balance sheets, and bank statements, to extract taxpayer information and financial figures;
  • Invoices & receipts: restaurant bills, grocery receipts, utility bills, and purchase orders, to extract issue and due dates, account and invoice numbers, and product quantities and amounts;
  • Insurance documents: insurance policies, contracts, claims, and agreements, to extract the insured person's personal data and information from insurance claims and related documents;
  • Medical documents: medical records, lab results, operative reports, and prescriptions, to extract patient and doctor personal data and other necessary medical information;
  • Judicial documents: court records, case files, and victim and police reports;
  • Shipping documents: packing lists, certificates, and shipping labels, to extract origin and destination addresses, package weight, shipping class, and contents.

How to Use Our Data Extraction Technology

Our structured data extraction prototype can capture information from images and PDF files. First, the algorithm preprocesses the input document to facilitate the text recognition step that follows. After finding the user-specified keys in the OCR-ed document, it extracts the corresponding values and saves them in key-value format. Finally, a post-processing stage improves the accuracy of the whole data extraction process.
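The four stages described above can be outlined as a small Python pipeline. This is a sketch under stated assumptions, not Abto's implementation: the page is modeled as a grid of grayscale pixel values, preprocessing is simple binarization, the OCR engine is any function passed in that returns text lines, and post-processing only normalizes whitespace and trims stray punctuation. All function and type names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types: a page as rows of grayscale pixels (0 = black, 255 = white),
# and an OCR engine as any function that turns such a page into text lines.
Image = List[List[int]]
OcrEngine = Callable[[Image], List[str]]


def preprocess(image: Image, threshold: int = 128) -> Image:
    """Binarize the page so text becomes pure black on pure white."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def extract(lines: List[str], keys: List[str]) -> Dict[str, Optional[str]]:
    """Split each line at its first colon and match the label against the keys."""
    found: Dict[str, Optional[str]] = {key: None for key in keys}
    for line in lines:
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so no key-value pair on this line
        label = label.strip().lower()
        for key in keys:
            if found[key] is None and label.endswith(key.lower()):
                found[key] = value
    return found


def postprocess(values: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Collapse repeated whitespace and trim stray punctuation left by OCR."""
    cleaned: Dict[str, Optional[str]] = {}
    for key, value in values.items():
        if value is not None:
            value = " ".join(value.split()).strip(" .,;|") or None
        cleaned[key] = value
    return cleaned


def run_pipeline(image: Image, keys: List[str], ocr: OcrEngine) -> Dict[str, Optional[str]]:
    binary = preprocess(image)      # 1. preprocessing
    lines = ocr(binary)             # 2. text recognition
    raw = extract(lines, keys)      # 3. key search and value extraction
    return postprocess(raw)         # 4. post-processing


if __name__ == "__main__":
    def fake_ocr(image: Image) -> List[str]:
        # Stand-in for a real OCR engine: ignores the pixels and returns fixed text.
        return ["Invoice No:  INV-0042 .", "Due date: 2024-05-01"]

    page: Image = [[250, 30], [200, 90]]
    print(run_pipeline(page, ["invoice no", "due date", "total"], fake_ocr))
```

Run as a script with the stand-in OCR function, it prints {'invoice no': 'INV-0042', 'due date': '2024-05-01', 'total': None}. Passing the OCR engine in as a parameter keeps the surrounding stages independent of any particular recognition library.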

To try out our data extraction demo, follow these steps:

  1. Upload a PDF file or an image (BMP, PNG, JPEG, and JPG formats are supported) from your computer. Make sure the document is not rotated or upside down and that the text is black on white. Note that only the first page of a PDF document will be processed.
  2. Select the language of the document.
  3. Type in up to five case-insensitive keywords (3 to 50 characters each) that you want to find in the file, pressing Enter after each one. Note that the prototype requires that:
    • there is no more than one keyword per line in the uploaded document;
    • each keyword in the document is followed by a colon (:);
    • the value is in the same line as its key.
  4. Press the “Extract data” button to get your results. You will see both the original and the preprocessed file with the key-value pairs highlighted, as well as a table with the OCR results.
  5. Press the “Try again” button to run the data extraction demo for another file.
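The input rules from the steps above are easy to check before any processing starts. The following Python sketch validates the file extension, enforces the keyword limits (up to five keywords of 3 to 50 characters, matched case-insensitively), and flags document lines that contain more than one keyword label. The names are hypothetical, and requiring at least one keyword is an assumption not stated in the demo description.

```python
from pathlib import Path
from typing import List

# Limits as stated for the demo; the constant and function names are hypothetical.
SUPPORTED_EXTENSIONS = {".pdf", ".bmp", ".png", ".jpeg", ".jpg"}
MAX_KEYWORDS = 5
MIN_KEYWORD_LENGTH = 3
MAX_KEYWORD_LENGTH = 50


def validate_upload(filename: str) -> None:
    """Reject files whose extension the demo does not accept."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")


def normalize_keywords(raw_keywords: List[str]) -> List[str]:
    """Check the keyword limits and return lower-cased keywords for case-insensitive matching."""
    keywords = [kw.strip() for kw in raw_keywords if kw.strip()]
    # Assumption: at least one keyword is required.
    if not keywords:
        raise ValueError("enter at least one keyword")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")
    for kw in keywords:
        if not MIN_KEYWORD_LENGTH <= len(kw) <= MAX_KEYWORD_LENGTH:
            raise ValueError(
                f"keyword {kw!r} must be {MIN_KEYWORD_LENGTH}-{MAX_KEYWORD_LENGTH} characters long"
            )
    return [kw.lower() for kw in keywords]


def lines_with_several_keys(lines: List[str], keywords: List[str]) -> List[int]:
    """Return indexes of lines containing more than one 'keyword:' label."""
    flagged = []
    for index, line in enumerate(lines):
        lowered = line.lower()
        hits = sum(1 for kw in keywords if f"{kw}:" in lowered)
        if hits > 1:
            flagged.append(index)
    return flagged
```

Lower-casing the keywords once, up front, is one simple way to make the later matching case-insensitive.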

You can also choose one of the four sample images to test our data extraction prototype – just click on one of the images below and press the “Extract data” button. The language and the keys will be filled in automatically.

Try the Data Extraction Demo Yourself

Click one of the sample images, or upload your own file that meets the requirements described above, and press the “Extract data” button.

Contact Us

To find out more about Abto Software's expertise, request a quote, or get a demo of your custom solution.
