We recently completed a project for a Belgian pharmaceutical company. We needed to develop intelligent character recognition software that verifies tests by checking whether the handwritten ID (a set of letters and numbers) on a test container matches the ID in the dataset. Every container has a label with the ID and a QR code, which serves as a link to the dataset.
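The verification step can be sketched as follows. This is a minimal illustration, not the project's code. It assumes three things: the QR code decodes to a string key into the dataset (modeled here as a plain dict), IDs are compared after uppercasing and removing whitespace, and the names `normalize_id` and `verify_container` and the sample records are hypothetical.

```python
from typing import Dict


def normalize_id(raw: str) -> str:
    """Uppercase and drop all whitespace (assumed rule) so 'ab 12c' equals 'AB12C'."""
    return "".join(raw.split()).upper()


def verify_container(qr_payload: str, recognized_id: str, dataset: Dict[str, str]) -> bool:
    """Return True if the recognized handwritten ID matches the dataset record.

    qr_payload:    string decoded from the container's QR code (assumed dataset key)
    recognized_id: text produced by the recognition step for the handwritten ID
    dataset:       hypothetical mapping from QR payload to the expected ID
    """
    expected = dataset.get(qr_payload)
    if expected is None:
        # Unknown QR code: there is no record to verify against.
        return False
    return normalize_id(recognized_id) == normalize_id(expected)


# Hypothetical sample records.
records = {"QR-0001": "AB12C", "QR-0002": "XY98Z"}
print(verify_container("QR-0001", "ab 12c", records))  # prints True
print(verify_container("QR-0002", "XY98", records))    # prints False (last character missing)
print(verify_container("QR-9999", "AB12C", records))   # prints False (unknown QR code)
```

Treating an unknown QR code as a failed check is a design choice here. A production system might instead route such containers to manual review.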
Intelligent Character Recognition (ICR) is an advanced form of Optical Character Recognition (OCR). Like OCR, it electronically converts scanned or photographed images of handwritten characters into computer-readable text. We ran into the following problems while developing the handwritten character recognition application:
- poor-quality text
- a wide variety of individual handwriting styles
- separating characters from the background
- unique symbol shapes
- nonlinear character placement
- touching (connected) neighboring symbols.
Intelligent Character Recognition Software
We put a lot of work into overcoming the limitations of OCR by applying preprocessing algorithms (image binarization, noise removal, text line detection, character detection). Overall, handwritten character recognition can be divided into the following steps:
- create a relevant training database
- run preprocessing algorithms
- correctly segment areas with numbers (our region of interest)
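The preprocessing and segmentation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's pipeline:

- Binarization uses Otsu's global threshold. The original only says "image binarization".
- Noise removal drops connected blobs smaller than a hypothetical `min_pixels`.
- Character detection is approximated by 8-connected component labeling.
- The input is a grayscale image given as a list of rows, with dark ink on a light background.

All function names and the sample image are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, values 0..255
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def otsu_threshold(image: Image) -> int:
    """Return the gray level t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_bg, weight_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]            # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def connected_components(binary: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """Group ink pixels (value 1) into 8-connected components using BFS."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def segment_characters(image: Image, min_pixels: int = 3) -> List[Box]:
    """Binarize, drop tiny blobs as noise, and return character boxes left to right."""
    t = otsu_threshold(image)
    binary = [[1 if v <= t else 0 for v in row] for row in image]  # dark pixels = ink
    boxes = []
    for comp in connected_components(binary):
        if len(comp) < min_pixels:
            continue  # too small to be a character: treat as noise
        ys = [y for y, _ in comp]
        xs = [x for _, x in comp]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return sorted(boxes)  # tuples sort by x_min first


W, B = 220, 30  # hypothetical background and ink gray levels
image = [
    [W, B, W, W, W, B, B, W, W, W],
    [W, B, W, W, W, W, B, W, W, W],
    [W, B, W, W, W, B, B, W, W, B],  # the lone B at the right edge is noise
    [W, B, W, W, W, B, W, W, W, W],
    [W, B, W, W, W, B, B, W, W, W],
    [W, W, W, W, W, W, W, W, W, W],
]
print(segment_characters(image))  # prints [(1, 0, 1, 4), (5, 0, 6, 4)]
```

Connected-component labeling cannot separate touching neighbors, which is one of the problems listed above. Real pipelines usually need an extra splitting step for that. For real images, OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag and `cv2.connectedComponentsWithStats` cover the same two steps.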
Our ICR platform is a self-learning system, which means that recognition accuracy increases each time the application is used. Because handwriting styles vary, capture rates can vary dramatically. However, the OCR software developed by our image processing team maintains an accuracy rate of 98% when source documents are in good condition. For lower-quality input files, we achieved an accuracy rate of 80%.
Technologies: MATLAB, Python, OpenCV, C++, Hough transform algorithms, Android, iOS, image processing, machine learning, k-nearest neighbors algorithm (k-NN).
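k-NN fits a self-learning setup well because new verified samples can simply be added to the reference set without a separate retraining step. The sketch below illustrates that idea under stated assumptions. It is not necessarily how the team's system learns. Each segmented character is assumed to be reduced to a fixed-length numeric feature vector; the four-value vectors here (imagined as ink density per image quadrant) are hypothetical, as is the class name `KNNCharClassifier`.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class KNNCharClassifier:
    """Minimal k-NN character classifier whose training set can grow over time."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.samples: List[Tuple[Tuple[float, ...], str]] = []

    def add_sample(self, features: Sequence[float], label: str) -> None:
        # "Self-learning" in this sketch just means appending verified samples.
        self.samples.append((tuple(features), label))

    def predict(self, features: Sequence[float]) -> str:
        if not self.samples:
            raise ValueError("classifier has no training samples")
        query = tuple(features)
        # Take the k samples closest to the query by Euclidean distance.
        neighbors = sorted(self.samples, key=lambda s: math.dist(s[0], query))[: self.k]
        votes = Counter(label for _, label in neighbors)
        # Majority vote; ties go to the label seen first among the sorted neighbors.
        return votes.most_common(1)[0][0]


# Hypothetical 4-value feature vectors (e.g. ink density per quadrant).
clf = KNNCharClassifier(k=3)
clf.add_sample([0.9, 0.1, 0.9, 0.1], "1")
clf.add_sample([0.8, 0.2, 0.8, 0.2], "1")
clf.add_sample([0.5, 0.5, 0.5, 0.5], "8")
clf.add_sample([0.6, 0.6, 0.4, 0.4], "8")
clf.add_sample([0.1, 0.9, 0.1, 0.9], "7")

print(clf.predict([0.85, 0.15, 0.85, 0.15]))  # prints 1

query = [0.3, 0.7, 0.3, 0.7]
print(clf.predict(query))  # prints 8 (two of the three nearest samples are "8")
clf.add_sample(query, "7")  # an operator-verified label is fed back
print(clf.predict(query))  # prints 7
```

A brute-force scan over all samples is fine for small character sets. Larger reference sets would typically use a spatial index instead.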
Applications and Benefits
Today's digital document libraries need to be searchable, and office workers need to be able to index and extract data from these documents. Traditionally, an office worker keys in the contents of each document by hand, which is slow and expensive compared with having a computer do the same task. OCR of handwritten characters is a difficult problem. Even so, the handwritten text recognition software we developed achieves relatively high accuracy, which makes it worth considering for a wide range of business applications. It can be used for fixed-form processing, such as surveys, applications, questionnaires, tests, and fill-in-the-blank forms.