Image processing for business card layout
Image processing refers to the various methods of using computer power to extract data from pictures. It’s a hot topic among developers.
Over the past few decades, image processing has significantly impacted fields like medical imaging, space exploration, geology, and oceanography.
Today, with powerful digital cameras in every phone, its applications are more accessible than ever. Optical character recognition, text recognition, and word recognition have become everyday tools. Image and video processing, along with digital signal processing, are now within reach for both scientists and the general public. And that’s thanks to advanced image processing techniques.
Having accomplished a number of projects using image processing techniques (here are just some examples):
- Super-resolution Image Reconstruction
- Deconvolution
- Object Matcher by means of Image Processing
- Super-Resolution for Identification Purposes
We can confidently state that image and video processing has become the most interesting area in digital signal processing. This article is dedicated to an interesting Image Processing technique used to automatically recognize and extract information from business cards.
Business cards recognition systems accuracy
Several automatic recognition systems and business card reader apps already exist, offering 80% recognition accuracy with a recall of 80% and a precision of 70%. However, we believe that the algorithm our team suggests can achieve 90% accuracy.
Stages of business cards recognition
The text recognition process requires significant CPU power. To optimize this, we recommend using a standard client-server approach. The client, such as a mobile device, captures a photo of a business card and identifies the text areas on the image.
It then sends these areas to a more powerful server for optical recognition. Once the text is recognized, the server can send it back to the phone, store it in a database, or pass it on to a third-party app. This approach leverages optical character recognition and word recognition. It makes the process efficient while reducing the load on the mobile device.
The task thus has two main stages:
- Business card layout recognition
- Optical character recognition (OCR)
Note: As an OCR engine we propose to use Tesseract — probably the most accurate open source OCR engine available. Tesseract’s OCR accuracy is near 98% for character recognition and 95-97% for word recognition.
How the system works: step-by-step explanation
We’re going to describe a simple algorithm implemented in MATLAB to recognize a business card layout. The algorithm works with a grayscale image, so the process begins by converting a color image into grayscale.
To detect text areas, we use a special filter — a modified method of standard deviation on sliding window calculation. The filter’s output is then converted into a binary image using the Otsu Thresholding algorithm.
Next, we identify blobs that meet specific criteria for length, width, and direction. For each qualifying blob, we create a bounding box and generate a mask to locate text areas.
The pictures below show how effective this approach is. Keep in mind that this is just a prototype, and all control parameters are hardcoded for this type of image.
Here are a few examples of the algorithm at work:
With the algorithm described above we can efficiently find text areas on business cards building reasonable guesses on the purpose of each text area. And then, using Tesseract for Optical character recognition (the 2nd stage of our task), we can reliably achieve 90% precision of business card layout and text recognition.