Using Image Processing for Business Card Layout and Text Recognition

Image processing is a general term for methods that use computing power to extract data from pictures, and it is a hot topic among developers. Over the last few decades it has greatly influenced medicine, space exploration, geology, and oceanography. Today, with a powerful digital camera in every phone, its applications are reaching not only scientists but everyone. We have completed a number of projects using image processing techniques. Here are just some examples:
- Super-resolution Image Reconstruction
- Deconvolution
- Object Matcher by means of Image Processing
- Super-Resolution for Identification Purposes
Based on this experience, we can confidently state that image and video processing has become one of the most interesting areas in digital signal processing. This article is dedicated to an image processing technique for automatically recognizing and extracting information from business cards.
Accuracy of business card recognition systems
Several automatic recognition systems and business card reader apps already exist, showing 80% recognition accuracy with a recall of 80% and a precision of 70%. (We also recommend reading the article about the Best Business Card Reader App for the iPhone.) Even so, we dare to say that the algorithm our team proposes can reach 90% accuracy.
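For readers less familiar with these metrics, here is a minimal Python sketch of how precision and recall are computed, together with the F1 score that combines them. The function name and the counts are hypothetical, chosen only so that the result matches the 70% precision and 80% recall quoted above. They are not measurements from any real system.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw detection counts."""
    # Precision: share of extracted fields that are correct.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of fields present on the cards that were extracted correctly.
    recall = true_positives / (true_positives + false_negatives)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (an assumption for illustration only):
# 56 fields extracted correctly, 24 extracted wrongly, 14 missed.
p, r, f1 = precision_recall_f1(56, 24, 14)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")
```

The harmonic mean stays closer to the lower of the two values, so a system cannot hide poor precision behind high recall.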
Stages of business card recognition
Since text recognition takes up quite a bit of CPU power, we suggest a standard client-server approach: a client (e.g., a mobile device) takes a photo of a business card, finds the text areas in the image, and sends them to a powerful server for optical recognition. Once recognized, the text can be sent back to the phone, stored in a database on the server, passed on to a third-party app, etc. The task thus has two main stages:
- Business card layout recognition
- Optical character recognition (OCR)
Note: As the OCR engine we propose Tesseract, probably the most accurate open source OCR engine available. Tesseract’s OCR accuracy is near 98% for character recognition and 95-97% for word recognition.
How the system works: step-by-step explanation
We now describe a simple algorithm, implemented in MATLAB, that recognizes a business card layout. The algorithm works on a grayscale image, so the first step is to convert the color image to grayscale. To detect text areas we apply a special filter: a modified sliding-window standard deviation. The filter output is converted into a binary image using Otsu's thresholding algorithm. Next, we pick the blobs that satisfy certain criteria for length, width, and direction, and build a bounding box for each of them. The resulting set of bounding boxes gives us a mask for finding text areas. The pictures below illustrate the effectiveness of the suggested approach. Note that this is only a prototype, and all control parameters are hardcoded for this type of image.
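The first three steps (grayscale conversion, sliding-window standard deviation, Otsu thresholding) can be sketched in plain Python as follows. This is not our MATLAB prototype. It uses an unmodified local standard deviation, since the modification is not detailed here, and the function names, the window radius, and the tiny synthetic "card" are all assumptions made for illustration.

```python
import math


def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def local_std(gray, radius=1):
    """Standard deviation in a (2*radius+1)^2 window, clipped at the image borders."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            out[y][x] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out


def otsu_threshold(img, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: nothing to separate
        return lo
    scale = (bins - 1) / (hi - lo)
    hist = [0] * bins
    for v in flat:
        hist[min(bins - 1, int((v - lo) * scale))] += 1
    total = len(flat)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_bin, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(bins):
        w0 += hist[t]            # pixels in bins 0..t (class 0)
        sum0 += t * hist[t]
        w1 = total - w0          # pixels in bins t+1.. (class 1)
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_bin = var, t
    # Lower edge of the first bin that belongs to class 1.
    return lo + (best_bin + 1) / scale


def binarize(img, threshold):
    """Mark pixels at or above the threshold as foreground (1)."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


# Synthetic 8x8 "card" (an assumption): flat light-gray background with a
# checkerboard patch standing in for text (rows 2-4, columns 2-5).
card = [[(200, 200, 200)] * 8 for _ in range(8)]
for y in range(2, 5):
    for x in range(2, 6):
        v = 255 if (x + y) % 2 else 0
        card[y][x] = (v, v, v)

filtered = local_std(to_grayscale(card))
mask = binarize(filtered, otsu_threshold(filtered))
for row in mask:
    print("".join(str(v) for v in row))
```

The local standard deviation is high wherever intensity changes rapidly, as it does around printed characters, and close to zero on a flat background. This is why the same filter can respond to dark text on a light card and to light text on a dark one.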

1.1. Original color image

1.2. Grayscale image

1.3. Filtered image

1.4. Thresholded image

1.5. Filtered blobs

1.6. Bounding boxes for found blobs

1.7. Found text areas
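The last three steps (filtering blobs, building bounding boxes, and producing the text-area mask) can be sketched in the same way. Again, this is an illustrative Python sketch, not the prototype itself. The 4-connectivity, the function names, and the size and aspect-ratio limits are assumptions that stand in for the hardcoded length, width, and direction criteria.

```python
from collections import deque


def find_blobs(mask):
    """Find 4-connected foreground regions and return one bounding box per blob
    as (top, left, bottom, right), all inclusive."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def keep_text_like(boxes, min_width=3, min_aspect=1.5):
    """Keep boxes that are wide enough and clearly wider than tall.
    Both limits are hypothetical and suit horizontal text lines only."""
    kept = []
    for top, left, bottom, right in boxes:
        width, height = right - left + 1, bottom - top + 1
        if width >= min_width and width / height >= min_aspect:
            kept.append((top, left, bottom, right))
    return kept


def boxes_to_mask(boxes, height, width):
    """Fill every bounding box with ones to obtain the text-area mask."""
    out = [[0] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                out[y][x] = 1
    return out


# Hypothetical thresholded image: one text-like blob, one vertical bar, one speck.
binary = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]

boxes = find_blobs(binary)
text_boxes = keep_text_like(boxes)
text_mask = boxes_to_mask(text_boxes, len(binary), len(binary[0]))
print(text_boxes)
```

A vertical card layout, such as Card 3 below, would need the aspect-ratio test inverted or replaced with a direction-aware criterion.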
Here are a few examples of the algorithm at work:

Card 1 – horizontal layout, dark text on light background

Card 2 – horizontal layout, light text on dark background

Card 3 – vertical layout, combination of dark/light text and backgrounds
With the algorithm described above we can efficiently find text areas on business cards and make reasonable guesses about the purpose of each area. Then, using Tesseract for optical character recognition (the second stage of our task), we can reliably achieve 90% precision in business card layout and text recognition.