Camera-based driver activity recognition for driver monitoring system

Improving driving safety through Advanced Driver Assistance Systems (ADAS)

Industry:

Project summary

We developed a custom Driver Monitoring System (DMS) based on Computer Vision (CV) that detects distracting behaviors in real time using only a camera. The flexible, scalable, AI-backed DMS can identify drivers who are smoking, talking, texting, or eating with over 90% accuracy, providing a basis for smarter ADAS solutions.

Driver distraction is the key factor in 8% of all fatal crashes in the United States, which highlights the demand for intelligent monitoring technologies.

Services:

AI development

Technical consulting

Device integration

Customer overview

Our client is a $50 billion public corporation that provides cutting-edge technology solutions across the globe. One of its fastest-growing divisions is the automotive solutions division, which aims to create new standards for vehicles with the vision of improving driving safety through Advanced Driver Assistance Systems (ADAS).

The company approached us in 2019 with the request to develop a Computer Vision-powered Driver Monitoring System (DMS) based on driver activity recognition. Abto Software was chosen among other candidates due to our extensive expertise in AI-driven Computer Vision technologies including face recognition, driver fatigue detection, eye & gaze tracking, pose estimation, and gesture recognition.

Project overview & goals

The project entrusted to Abto Software included three key tasks:

1. To analyze state-of-the-art Deep Learning and Computer Vision approaches to vision-based activity recognition;

2. To determine optimal in-cabin camera placement for the task at hand;

3. To develop a Driver Monitoring System capable of real-time driver activity recognition and its temporal localization with 90%+ accuracy.

The major goal of the project was defined as ensuring the driver’s focus on the road and increasing road safety for all traffic participants.

The development of the driver activity recognition algorithm also brings our customer one step closer to building a comprehensive Automated Driving System (ADS). Starting from Level 3 Vehicle Automation (called Limited Self-Driving Automation by the National Highway Traffic Safety Administration, NHTSA), carmakers are bound to monitor drivers to be sure they are paying attention when they need to take control of an autonomous vehicle and re-engage in the driving task.

Due to the complex nature of the project, we split its execution into six phases:

Planning
Development of the video capturing and annotation tool
Data collection
Analysis of activity recognition approaches
Development of the AI-driven driver activity recognition model
Development of the toolkit for adding recognized activities

In this case study we will look closer at each phase – its core objectives and main deliverables.

Driver monitoring system development process

Phase 1. Planning

We began the development of the Driver Monitoring System by collecting requirements and defining the activities that the real-time recognition algorithm would support. The algorithm recognizes 6 types of activities, and every other activity is classified as ‘other’:

eating
reading or messaging on the phone
chatting with the passenger
smoking a cigarette
applying cosmetics
making a phone call

Next, we established a training and development setup. After brief research, we selected the optimal type of FullHD IR cameras and suggested two possible ways of their positioning in the car cabin (marked blue and red on the scheme below).

Note: only one camera in any of the positions is required to monitor the driver.

Figure 1. Two options for camera setup in the car cabin:

Phase 2. Video Capturing & Video Annotation Toolkits

In preparation for data collection, we developed custom cross-platform video capturing and video annotation toolkits. Because the proposed Driver Monitoring System had to support two camera positioning options, we needed training and test videos from both viewpoints. The toolkits let us record videos from both cameras simultaneously and then annotate the captured data in the same manner, two videos at once. This way we doubled the number of collected videos and cut the annotation time in half.
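The idea of annotating two videos at once can be sketched as a small data structure in which one labeled time interval applies to both synchronized recordings. This is a minimal illustration in Python, not the C++/Qt toolkit itself; the class names, field names, and file names are hypothetical, and it assumes both recordings start at the same instant.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One labeled interval, in seconds from the start of the recording."""
    start_s: float
    end_s: float
    activity: str


@dataclass
class PairedRecording:
    """Two videos captured simultaneously from the two camera positions.

    Assumption: both files share one timeline, so a single set of labels
    is valid for both views.
    """
    video_a: str  # hypothetical file name, first camera position
    video_b: str  # hypothetical file name, second camera position
    segments: List[Segment] = field(default_factory=list)

    def annotate(self, start_s: float, end_s: float, activity: str) -> None:
        """Store one label that covers both views."""
        if end_s <= start_s:
            raise ValueError("segment must have positive duration")
        self.segments.append(Segment(start_s, end_s, activity))

    def per_video_labels(self) -> List[Tuple[str, Segment]]:
        """Expand the shared annotations into one labeled clip per view."""
        return [(video, seg)
                for seg in self.segments
                for video in (self.video_a, self.video_b)]


rec = PairedRecording("session01_cam_a.mp4", "session01_cam_b.mp4")
rec.annotate(3.0, 12.5, "eating")
rec.annotate(20.0, 31.0, "making a phone call")
labels = rec.per_video_labels()
print(len(rec.segments), "annotations ->", len(labels), "labeled clips")
```

Each annotation pass yields a labeled clip for both viewpoints, which is where the doubling of data per unit of annotation effort comes from.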

Tools & Technologies used for the development of video capturing & video annotation toolkits: C++, Qt.

Phase 3. Data Collection

The dataset collected and annotated by Abto consists of around 1900 video clips that cover:

6 driver activities;
8 cars;
19 people;
daytime and nighttime conditions.

Figure 2. Examples of frames from the video clips that can be used for the driver activity recognition model training:

Additional augmentation was applied during the training and testing of the real-time driver activity recognition model to enrich the dataset and ensure the robustness of the model. The augmentation techniques we used include resizing the input video, converting it to grayscale, applying random horizontal flipping, and adjusting brightness and contrast levels.
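As a rough illustration of these augmentation steps, the sketch below applies them to a short clip stored as a NumPy array. It is not the project's training code: the function names, target size, and parameter ranges are assumptions, and nearest-neighbour resizing stands in for whatever interpolation was actually used. One set of random parameters is drawn per clip so that all frames of a clip stay consistent with each other.

```python
import numpy as np


def to_grayscale(clip):
    """(T, H, W, 3) RGB clip -> (T, H, W) luma using ITU-R BT.601 weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return clip.astype(np.float32) @ weights


def resize_nearest(clip, out_h, out_w):
    """Nearest-neighbour resize of every frame of a (T, H, W) clip."""
    _, h, w = clip.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return clip[:, rows[:, None], cols[None, :]]


def augment_clip(clip, rng, size=(112, 112),
                 max_brightness=0.2, contrast_range=(0.8, 1.2)):
    """Apply one randomly drawn set of parameters to all frames of a clip.

    The default size and ranges are illustrative assumptions.
    """
    gray = to_grayscale(clip) / 255.0           # grayscale, scaled to [0, 1]
    gray = resize_nearest(gray, *size)          # resize
    if rng.random() < 0.5:                      # random horizontal flip
        gray = gray[:, :, ::-1]
    contrast = rng.uniform(*contrast_range)     # contrast factor
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = gray.mean()
    out = (gray - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0).astype(np.float32)


rng = np.random.default_rng(0)
# Random pixels standing in for an 8-frame clip (downscaled to save memory).
clip = rng.integers(0, 256, size=(8, 216, 384, 3), dtype=np.uint8)
augmented = augment_clip(clip, rng)
print(augmented.shape, augmented.dtype)
```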

Phase 4. Approaches to Activity Recognition

While developing the necessary toolkits and collecting training and test data, we performed a thorough analysis of the state-of-the-art Deep Learning approaches to camera-based human activity recognition (HAR). We considered methods such as two-stream inflated 3D CNNs, body and face landmark recognition, pose estimation, and spatiotemporal models. We also researched Horovod, a distributed Deep Learning training framework that makes it faster and easier to train AI models.

Investigated Machine Learning frameworks: TensorFlow, PyTorch, Keras.
Investigated Deep Learning architectures: Two-stream CNNs, ResNet, Inception v4, Inception ResNet.

Phase 5. AI-powered Driver Activity Recognition

Our rigorous review of the existing human activity recognition techniques gave us a clear idea of the optimal approach to building a driver activity recognition model for the proposed Driver Monitoring System. We devised a custom DNN architecture that uses two input streams, spatial and temporal, and fuses them after the final layer to predict the driver’s activity type in real time.
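Fusing two streams after the final layer (late fusion) can be sketched as follows. The exact fusion rule used in the project is not described here, so this example assumes a weighted average of each stream's softmax probabilities, a common choice for two-stream networks. The function names, the weight, and the logit values are made up for illustration.

```python
import numpy as np

# Six activities from the requirements plus the catch-all 'other' class.
ACTIVITIES = [
    "eating",
    "reading or messaging on the phone",
    "chatting with the passenger",
    "smoking a cigarette",
    "applying cosmetics",
    "making a phone call",
    "other",
]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: averaging softmax scores; the real model may fuse differently.
    """
    p_spatial = softmax(np.asarray(spatial_logits, dtype=np.float64))
    p_temporal = softmax(np.asarray(temporal_logits, dtype=np.float64))
    return spatial_weight * p_spatial + (1.0 - spatial_weight) * p_temporal


# Made-up logits for one short video window (illustrative values only).
spatial = [0.2, 2.1, 0.1, 0.3, 0.0, 1.9, 0.5]   # appearance stream
temporal = [0.1, 2.4, 0.2, 0.1, 0.0, 0.6, 0.4]  # motion stream
fused = late_fusion(spatial, temporal)
predicted = ACTIVITIES[int(np.argmax(fused))]
print(predicted)
```

In this toy case the appearance stream alone is nearly undecided between texting and a phone call, and the motion stream tips the fused prediction toward texting, which is the kind of ambiguity a second stream is meant to resolve.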

After training and validating the proposed real-time driver activity recognition model, we compared its results with those obtained during the International Challenge on Activity Recognition (ActivityNet) and performed a detailed error analysis to suggest improvements for the next stages of the project. We delivered our findings to the customer in the form of a comprehensive report that also included an overview of the studied state-of-the-art approaches to human activity recognition.
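One simple form of such an error analysis is to count which activities get mistaken for which. The sketch below is a generic illustration with hypothetical function names; the labels are toy data invented for the example, not results from the project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    return Counter(zip(y_true, y_pred))


def per_class_recall(counts):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter()
    for (true, _pred), n in counts.items():
        totals[true] += n
    return {label: counts[(label, label)] / total
            for label, total in totals.items()}


def top_confusions(counts, k=3):
    """Most frequent off-diagonal pairs: what gets mistaken for what."""
    errors = Counter({pair: n for pair, n in counts.items()
                      if pair[0] != pair[1]})
    return errors.most_common(k)


# Toy labels, invented for illustration only.
y_true = ["eating", "eating", "smoking", "smoking", "smoking",
          "phone call", "phone call", "other"]
y_pred = ["eating", "smoking", "smoking", "smoking", "eating",
          "phone call", "phone call", "other"]

counts = confusion_counts(y_true, y_pred)
recall = per_class_recall(counts)
accuracy = sum(n for (t, p), n in counts.items() if t == p) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
for pair, n in top_confusions(counts):
    print(pair, n)
```

Overall accuracy alone can hide a weak class; per-class recall and the most frequent confusions point to the activities that need more data or a different camera angle.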

Phase 6. Toolkit for Adding Recognized Activities

As the developed Driver Monitoring System included recognition of only 6 activities, we extended it with a custom toolkit that allows adding new types of activities to the driver activity recognition model.

Technology stack:

TensorFlow
Two-stream CNN
Inception v4
OpenCV
optical flow

Project duration:

3 months

Team:

1 project manager/subject-matter expert
3 Deep Learning specialists
1 R&D Computer Vision specialist
2 software developers
1 QA
1 data annotator

Business value delivered

Abto Software has delivered a Driver Monitoring System capable of real-time classification of the driver’s activity type and its temporal localization with 90%+ accuracy. The software application recognizes 6 types of driver actions from a single video source and supports two possible camera positions in the vehicle cabin.
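Temporal localization means reporting when an activity starts and ends, not only which activity it is. A minimal sketch, assuming the classifier emits one label per fixed-length, non-overlapping window: consecutive identical labels are merged into segments, and very short runs are dropped as likely noise. The function name, window length, and minimum duration are hypothetical.

```python
from itertools import groupby


def localize(window_labels, window_s=1.0, min_duration_s=2.0,
             background="other"):
    """Merge runs of identical window labels into (activity, start_s, end_s).

    Assumptions: windows are non-overlapping and window_s seconds long;
    runs shorter than min_duration_s are discarded.
    """
    segments = []
    index = 0
    for label, run in groupby(window_labels):
        length = len(list(run))
        start_s = index * window_s
        end_s = (index + length) * window_s
        index += length
        if label != background and end_s - start_s >= min_duration_s:
            segments.append((label, start_s, end_s))
    return segments


# Made-up per-window predictions for a 10-second recording.
predictions = ["other", "other", "eating", "eating", "eating", "other",
               "smoking a cigarette", "other",
               "making a phone call", "making a phone call"]
segments = localize(predictions)
for activity, start_s, end_s in segments:
    print(f"{activity}: {start_s:.0f}s to {end_s:.0f}s")
```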

In combination with driver fatigue detection and driver health tracking technologies, the developed Driver Monitoring System enables comprehensive identification of in-cabin situations and ensures safe driving in automated and semi-automated vehicles.