# Numbers matter: Moving Object Counting Using Image/Video Processing Part 1

Our world today is all about numbers. Numbers matter.

Abto Software experts in mathematical modelling and image processing came up with the idea of an algorithmic approach to such calculations based on slit imaging. The innovative concept with the operating name Object Counter makes it possible to count all the typical objects fast and safe and promises a significant real-world implication.

There often arises a need to know the number of objects that pass through a spotor point. Say, the statistics on people entering a building, the number of cars crossing a location, the number of items leaving the production line, etc.

One of our primary goals was to develop an automated counting system algorithm that would assist in tracking, separation and counting of objects and required minimal resources or additional expenses

After a thorough evaluation of some of the widely used approaches to image processing, we saw a lack of efficiency in the majority of today’s systems.

In this paper, we propose a novel image counting method with highly accurate results using morphological operations to count objects in a video stream.

It is easily described with the example of the moving cars on the street.

The task is to calculate all the vehicles crossing the camera view having a video file or live video stream of a street.

The method comprises two basic steps:

1. Slit image generation
2. Objects count

## 1. Slit image generation

What is slit image?  An imaginary line is drawn on the road in the video stream. From every frame of the video only pixels coinciding with that line are taken. The pixels from a frame are arranged in a single horizontal line. The same is done with every frame, thus there is one line per frame. So, with 1000 frames in the video there is an image consisting of 1000 rows of pixels. This image is known as slit image.

Every moving object crossing the line is to be reflected in the slit image in order to be counted.

The example below is a 1-minute video file of street traffic.

Here is the first shot in the video sequence. The so-called “region of interest” is defined and drawn on it:

Starting from this one, every frame is taken and pixels, marked with the region of interest, are extracted. Following N frames will give us a single slit image with the height of N rows.

The input video clip demonstrates how a slit image is being formed (region of interest is marked red):

Here is the resulting slit image:

So, a single slit image substitutes the whole video sequence, which means the video is no longer needed.

If the video file is too long or a live video stream is used, the source is spitted into equal parts and one part gives one slit image. Those images are successively processed and the results are constantly accumulated. When dealing with live stream, accumulation of 1-5 minutes video in a slit image proved to be the most efficient. Seamless slitting is gained when slit images are processed in a separate thread.

## 2. Object counting

The background of the slit image is quite homogeneous and all vehicles are clear and visible. The next step is to separate the cars and the background. The background of every column consists of the same pixel taken from different frames. Their distinctiveness in rows is determined by 2 additional and inconsiderable compared to foreground pixels factors: luminosity changes and camera noise.

The y-gradient threshold algorithm method was chosen in order to subtract the background from the foreground.

Firstly, we converted RGB into grayscale, after that we calculated the y-gradients (that is vertical differences in the value of neighbouring pixels), and finally, we defined foreground pixels as those for which differences are bigger than some little threshold. As the result, a binary mask of foreground objects is obtained.

We observed the biggest value of vertical gradients on the sides of every moving object. The centre is typically homogeneous which is why the gradient value does not exceed the threshold. So we did the following procedures:

1. Morphological dilation

2. Filling holes

3. Morphological erosion

4. Filter blobs by area

We used a lozenge-shaped structuring element slightly compressed horizontally for morphological dilation and erosion.

0          0          0          1          0          0          0

0          0          0          1         0          0          0

0          0          1          1          1          0          0

0          1          1          1          1          1        0

1          1          1          1          1          1          1

0          1          1          1          1          1        0

0          0          1          1          1         0          0

0          0          0          1         0          0         0

0          0          0          1         0          0          0

The reason for choosing this particular structural element is that we have performed filtering only by vertical gradients, so when executing morphological dilation it is crucial to band the vertically broken objects. Besides, the chosen form aids the splitting of the neighboring blobs with the next erosion that might have merged with the previous dilation.

When filtering blobs by size, those smaller than one-third of the average blob-size are omitted. This prevents the mistakes in identification of the splits-offs after morphological erosion or effect of camera noise.

The Figure below shows how the mask changes after applying 4 stages of morphological processing:

The result of morphological processing will look like this:

There are totally of 15 blobs (white solid regions on the black background). That is 15 objects crossed the line within the time frame of one minute. So the objects are counted and the task is fulfilled.

## Summary:

Object count is an important functional component in many vision-based systems performing a large number of social, corporate and commercial activities.

There are numerous existing methods of object count nowadays but they involve complex algorithms and solutions, a lot of hardware components, sensors and detectors

The most vivid advantage of our approach over other similar methods is its accuracy level of 95%.

It is non-intrusive as well as affordable which is bound to find valuable application in the enterprise, mass production and social spheres.

Summary
Title
Moving Object Counting Using Image/Video Processing
Description

Automated object counting and tracking system using image processing of video frames that requires minimal resources or additional expenses.