Gesture recognition: from ADAS to sign language translation

To show how gesture recognition technologies are changing the way we interact with gadgets and even with each other, we highlight the most popular use cases, look at the industries that will be transformed first, and share how we built our own hand gesture recognition system.

Demand for gesture recognition is on the rise

Gesture recognition is a computer vision field that encompasses a set of image and video processing algorithms that capture, analyze, and interpret human bodily motion, mainly hand movements and facial expressions. Since we have already covered some concepts of facial expression analysis, such as face recognition, head tilt detection, and eye tracking, in our article on drowsiness detection, we will focus on hand gesture recognition from here on.

How we built an AI-based hand gesture recognition system

Our internal R&D team developed a camera-based hand gesture recognition system. Limited to 10 static hand gestures, it uses a custom CNN to analyze infrared images and video. Read on to find out how we achieved 97%+ recognition accuracy.

Step 1. Dataset creation

We created a dataset of infrared images of 10 distinct static hand gestures. The dataset contains 6 single-handed and 4 two-handed gestures performed by different people under different conditions, e.g., at various angles and distances from the IR camera. We gathered around 1,000 samples per gesture and then applied data augmentation to the training set.
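To give a concrete idea of what such augmentation can look like, here is a minimal sketch using torchvision. The specific transforms and parameter values are illustrative assumptions, not the exact pipeline we used:

```python
# Hypothetical augmentation pipeline for single-channel IR images;
# the transforms and parameter values are illustrative only.
from torchvision import transforms

ir_augmentation = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),         # IR frames are single-channel
    transforms.RandomRotation(degrees=15),               # vary the hand angle
    transforms.RandomResizedCrop(96, scale=(0.8, 1.0)),  # vary the distance to the camera
    transforms.ColorJitter(brightness=0.3),              # vary IR illumination intensity
    transforms.ToTensor(),
])
```

Applying such transforms only to the training set keeps the validation data representative of real camera output.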

These are the hand gestures we selected for our AI model to recognize:

OK or ring gesture
ILY sign
V sign
Vulcan salutation
Finger gun
High five
Fist bump
Thumbs up
Pray sign
Hand heart

Step 2. AI model training

We built an appearance-based AI model that classifies input visual information into 11 classes, treating the absence of any hand gesture as the 11th class. Only the left-frame images captured by the IR camera were taken into account; these so-called depth maps were used to train a custom-built CNN.
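The article does not reproduce our exact architecture, but the following PyTorch sketch illustrates the general shape of such a classifier. The layer sizes, channel counts, and 96x96 input resolution are assumptions:

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Minimal CNN sketch: 1-channel 96x96 depth map -> 11 classes.
    All layer sizes are illustrative, not our production architecture."""

    def __init__(self, num_classes: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 96 -> 48
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, num_classes),  # logits for 10 gestures + "no gesture"
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = GestureCNN()
logits = model(torch.randn(1, 1, 96, 96))  # one dummy depth map
```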

Step 3. Real-life implementation

The resulting hand gesture recognition model can analyze both images and video. The only difference in the algorithm is an additional post-processing step applied to the video stream to smooth out recognition results. We achieve this with a temporal filter, which in our case is a weighted average of the recognition results obtained for successive frames of the video sequence. This common technique is often used to remove noise from video or audio signals and reduce errors. See for yourself how we reached an average of 97% hand gesture recognition accuracy on real-world data.
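A weighted averaging filter of this kind is straightforward to implement. Below is a minimal sketch over per-frame class probabilities; the window length and the linearly increasing weights are illustrative assumptions, not the exact parameters of our filter:

```python
import numpy as np
from collections import deque

class TemporalFilter:
    """Weighted average of per-frame class probabilities over a sliding window.
    Window length and weighting scheme are illustrative assumptions."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)
        # Linearly increasing weights: recent frames count more.
        self.weights = np.arange(1, window + 1, dtype=np.float64)

    def update(self, frame_probs: np.ndarray) -> int:
        """Add one frame's class probabilities, return the smoothed class index."""
        self.history.append(frame_probs)
        w = self.weights[-len(self.history):]
        smoothed = np.average(np.stack(self.history), axis=0, weights=w)
        return int(np.argmax(smoothed))
```

Because recent frames receive larger weights, a momentary misclassification on a single frame rarely flips the smoothed prediction.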

Exploring approaches to gesture recognition applications

We have mentioned that our hand gesture recognition model is appearance-based. But what does that mean? Let's take a closer look at the existing approaches to gesture recognition and the differences between them.

Shape-based model

Shape-based methods represent a hand by its contour or mask (inverted silhouette) and perform gesture recognition based solely on this information. Shape-based approaches lack the accuracy needed for effective gesture recognition, as they can derive only low-level features such as contour/area ratios, finger and palm length and thickness, and finger count.
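For illustration, a few such low-level features can be computed from a binary hand mask with OpenCV; the particular feature set here is a hypothetical example:

```python
import cv2
import numpy as np

def shape_features(mask: np.ndarray) -> dict:
    """Derive low-level contour features from a binary hand mask (illustrative set)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return {}
    hand = max(contours, key=cv2.contourArea)  # assume the largest blob is the hand
    area = cv2.contourArea(hand)
    if area == 0:
        return {}
    perimeter = cv2.arcLength(hand, True)
    hull = cv2.convexHull(hand)
    return {
        "compactness": perimeter ** 2 / area,      # a contour/area ratio
        "solidity": area / cv2.contourArea(hull),  # spread fingers lower the solidity
    }
```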

Appearance-based model

Appearance-based models use only the general appearance of a hand. They extract the required features and compare them with previously learned training images, matching the unknown input image to the most similar known one. Appearance-based approaches utilize machine learning models of varying complexity; the final gesture recognition accuracy depends mostly on the quality of the training dataset and the robustness of the ML model.
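In its simplest form, this matching can be a nearest-neighbor lookup over image vectors. The toy scikit-learn sketch below uses random stand-in data to illustrate the idea; real appearance-based systems, including ours, rely on learned features and deeper models instead:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins: 100 training images of 64x64 pixels, flattened to vectors.
train_images = np.random.rand(100, 64 * 64)
train_labels = np.random.randint(0, 10, size=100)  # 10 gesture classes

# Match an unknown image to the most similar known training image.
matcher = KNeighborsClassifier(n_neighbors=1)
matcher.fit(train_images, train_labels)

unknown = np.random.rand(1, 64 * 64)
predicted_gesture = matcher.predict(unknown)[0]
```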

Volumetric model

Model-based, or volumetric, gesture recognition algorithms approximate a hand with a 3D model. This way, they can analyze its position and movements in three-dimensional space.

This method requires capturing information about the shape of the hand, often by using stereo cameras or 3D sensors. Alternatively, it can apply the structure-from-motion photogrammetric technique to several 2D photos of a hand captured from different viewpoints. The 3D hand model is computed from the resulting point cloud and used for further analysis.

This approach is computationally expensive and thus ill-suited to real-time gesture recognition. More commonly, it is applied in modern computer animation.

Skeletal-based model

Skeletal-based approaches detect the key points of a hand and compute its virtual skeleton. Around 20 key points can be detected on a hand, which is enough to accurately recognize both static and dynamic hand gestures. Skeletal representation is becoming the most common way to perform hand gesture recognition.
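As an example of the skeletal approach, the open-source MediaPipe Hands library predicts 21 key points per detected hand. The sketch below is illustrative only and is not the tool behind our system; "hand.jpg" is a placeholder input image:

```python
import cv2
import mediapipe as mp

# Detect hand key points on a single image with MediaPipe Hands,
# which predicts 21 landmarks per hand. "hand.jpg" is a placeholder.
image = cv2.cvtColor(cv2.imread("hand.jpg"), cv2.COLOR_BGR2RGB)

with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    results = hands.process(image)

if results.multi_hand_landmarks:
    for landmark in results.multi_hand_landmarks[0].landmark:
        print(landmark.x, landmark.y, landmark.z)  # normalized coordinates
```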

