Video object tracking

Our world today is all about numbers. Numbers matter.

Abto Software experts in mathematical modeling and image processing came up with an algorithmic approach to counting objects based on slit imaging. The concept, with the working name Object Counter, makes it possible to count typical objects quickly and reliably, and it promises significant real-world applications.

There is often a need to know the number of objects that pass through a given spot or point: for example, the number of people entering a building, the number of cars crossing a location, or the number of items leaving a production line.

One of our primary goals was to develop an algorithm for an automated counting system that would track, separate, and count objects while requiring minimal resources and no significant additional expense.

After a thorough evaluation of several widely used image processing approaches, we found most of today's systems to be inefficient.

In this article we propose a novel method that uses morphological operations to count objects in a video stream with highly accurate results.

The method is easiest to explain with the example of cars moving along a street.

Given a video file or a live video stream of a street, the task is to count all the vehicles that cross the camera's field of view.

The method comprises two basic steps:

  1. Slit image generation
  2. Object counting

1. Slit image generation

What is a slit image? An imaginary line is drawn across the road in the video. From every frame, only the pixels lying on that line are taken and arranged in a single horizontal row. Doing the same for every frame gives one row per frame, so a video of 1000 frames produces an image of 1000 rows of pixels. This image is called a slit image.

Every moving object that crosses the line must appear in the slit image in order to be counted.
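The construction described above can be sketched in a few lines. This is a minimal illustration, not Abto Software's implementation: it assumes frames are already decoded into nested lists indexed as `frame[y][x]`, with RGB tuples as pixels, and it represents the imaginary line as a list of `(x, y)` coordinates. The names `build_slit_image` and `horizontal_line` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B); 8-bit channels assumed
Frame = List[List[Pixel]]      # frame[y][x]


def horizontal_line(y: int, x0: int, x1: int) -> List[Tuple[int, int]]:
    """(x, y) coordinates of a horizontal line at row y, columns x0..x1 inclusive."""
    return [(x, y) for x in range(x0, x1 + 1)]


def build_slit_image(frames: Sequence[Frame],
                     line: Sequence[Tuple[int, int]]) -> List[List[Pixel]]:
    """Build a slit image: one row per frame, one column per point on the line."""
    slit = []
    for frame in frames:
        # Keep only the pixels that lie on the imaginary line,
        # laid out left to right as a single row.
        slit.append([frame[y][x] for (x, y) in line])
    return slit
```

For 1000 input frames the result has 1000 rows, each as long as the line. Any line shape works, since the line is just a list of coordinates.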

The example below uses a 1-minute video of street traffic.

Here is the first frame of the video sequence, with the so-called "region of interest" defined and drawn on it:

[Figure: first frame of the video with the region of interest drawn on it]


Starting from this frame, the pixels covered by the region of interest are extracted from every frame. N consecutive frames therefore give a single slit image N rows high.

The input video clip demonstrates how the slit image is formed (the region of interest is marked in red):

Here is the resulting slit image:

[Figure: the resulting slit image]

Thus a single slit image replaces the whole video sequence, and the video itself is no longer needed.

If the video file is too long, or a live video stream is used, the source is split into equal parts and each part produces one slit image. These images are processed one after another and the results are accumulated. For live streams, accumulating 1 to 5 minutes of video in each slit image proved to be the most efficient. Slitting continues without interruption when the slit images are processed in a separate thread.
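A minimal sketch of this chunked, two-thread arrangement, using only the Python standard library. It is an illustration under stated assumptions, not the authors' code: the main thread cuts the frame source into slit images of `rows_per_image` rows each, and a worker thread runs `count_in_slit`, a hypothetical callback standing in for the counting step described in section 2, and accumulates the results. At an assumed 25 frames per second, 1 to 5 minutes of video corresponds to 1500 to 7500 rows per slit image.

```python
import queue
import threading
from typing import Callable, Iterable, List, Sequence, Tuple


def count_stream(frames: Iterable,
                 line: Sequence[Tuple[int, int]],
                 rows_per_image: int,
                 count_in_slit: Callable[[List[list]], int]) -> int:
    """Split a frame source into equal parts (one slit image per part) and
    accumulate per-image counts computed in a separate worker thread."""
    slit_images = queue.Queue()
    total = 0

    def worker() -> None:
        nonlocal total
        while True:
            image = slit_images.get()
            if image is None:          # sentinel: the source is exhausted
                break
            total += count_in_slit(image)

    thread = threading.Thread(target=worker)
    thread.start()

    current: List[list] = []
    for frame in frames:
        # Slitting: one row of line pixels per frame (frame indexed as frame[y][x]).
        current.append([frame[y][x] for (x, y) in line])
        if len(current) == rows_per_image:
            slit_images.put(current)   # hand off; slitting continues immediately
            current = []
    if current:                        # trailing, shorter part
        slit_images.put(current)

    slit_images.put(None)
    thread.join()                      # only the worker writes `total`
    return total
```

For a live stream the `for` loop never ends, so a real deployment would publish the running total from the worker rather than wait for the return value.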

2. Object counting

The background of the slit image is quite homogeneous, and all the vehicles are clearly visible. The next step is to separate the cars from the background. Each column of the background consists of the same pixel position sampled in different frames, so background values vary from row to row only because of two factors whose effect is small compared with foreground pixels: changes in luminosity and camera noise.

We chose a y-gradient thresholding method to separate the foreground from the background.

First, we converted the RGB image to grayscale; then we calculated the y-gradients (that is, the vertical differences between the values of neighboring pixels); finally, we defined foreground pixels as those whose difference exceeds a small threshold. The result is a binary mask of the foreground objects.
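The three steps can be sketched as follows. This is a minimal pure-Python illustration, not the original implementation. Two details are assumptions, because the article does not state them: the grayscale conversion uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), and the default threshold of 10 gray levels is only a placeholder for "some small threshold". The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(rgb: Sequence[Sequence[RGB]]) -> List[List[float]]:
    # Assumed BT.601 luma weights; the article does not specify the conversion.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def foreground_mask(rgb: Sequence[Sequence[RGB]], threshold: float = 10.0) -> List[List[int]]:
    """Binary mask: 1 where |I(y, x) - I(y-1, x)| exceeds the threshold."""
    gray = to_grayscale(rgb)
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(1, h):              # row 0 has no upper neighbour; left as background
        for x in range(w):
            # y-gradient: difference between vertically neighbouring pixels,
            # i.e. the same line pixel in two consecutive frames.
            if abs(gray[y][x] - gray[y - 1][x]) > threshold:
                mask[y][x] = 1
    return mask
```

Because each column of the slit image is a single pixel position over time, this gradient fires where the pixel's value changes abruptly between frames, which is typically at the front and rear edges of a passing vehicle.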

[Figure: binary mask of the foreground objects]

We observed the largest vertical gradients at the edges of every moving object. The interior is typically homogeneous, which is why its gradient values do not exceed the threshold and the object's mask is left broken and hollow. So we applied the following procedures:

1. Morphological dilation

2. Filling holes

3. Morphological erosion

4. Filter blobs by area
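Of these steps, hole filling (step 2) does not depend on the structuring element and can be sketched on its own. This minimal pure-Python version, with the hypothetical name `fill_holes`, is an assumption about how the step might be done rather than the authors' code: it flood-fills the background from the image border (4-connectivity) and treats every background pixel that cannot be reached as a hole.

```python
from collections import deque
from typing import List


def fill_holes(mask: List[List[int]]) -> List[List[int]]:
    """Return a copy of a 0/1 mask with enclosed background regions set to 1."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    q = deque()
    # Seed the flood fill with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                outside[y][x] = True
                q.append((y, x))
    # Spread through background reachable from the border (4-connectivity).
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not outside[ny][nx] and mask[ny][nx] == 0:
                outside[ny][nx] = True
                q.append((ny, nx))
    # Foreground stays foreground; unreachable background is a hole and is filled.
    return [[1 if mask[y][x] or not outside[y][x] else 0 for x in range(w)]
            for y in range(h)]
```

A closed outline becomes a solid blob, while a shape that is still open to the border stays unchanged. That is one reason dilation comes first: it closes gaps in an object's outline so that hole filling can work.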

For both morphological dilation and erosion we used a lozenge-shaped structuring element, slightly compressed horizontally (9 rows by 7 columns):

    0 0 0 1 0 0 0
    0 0 0 1 0 0 0
    0 0 1 1 1 0 0
    0 1 1 1 1 1 0
    1 1 1 1 1 1 1
    0 1 1 1 1 1 0
    0 0 1 1 1 0 0
    0 0 0 1 0 0 0
    0 0 0 1 0 0 0

We chose this particular structuring element because the foreground was detected from vertical gradients only, so during dilation it is crucial to join the parts of objects that are broken vertically. In addition, its shape helps the subsequent erosion split neighboring blobs that may have merged during dilation.

When filtering blobs by area, blobs smaller than one third of the average blob area are discarded. This prevents small fragments split off by erosion, or specks caused by camera noise, from being mistakenly identified as objects.
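Dilation and erosion with this structuring element can be sketched in pure Python. This is an illustrative implementation, not the authors' code. The origin is taken at the element's centre, and pixels outside the image are treated as background during erosion (a common convention, assumed here). The names `SE`, `dilate`, and `erode` are hypothetical.

```python
from typing import List, Tuple

# The 9 x 7 lozenge-shaped structuring element from the article (origin at the centre).
SE = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]


def _offsets(se: List[List[int]]) -> List[Tuple[int, int]]:
    """(dy, dx) offsets of the element's 1-cells relative to its centre."""
    cy, cx = len(se) // 2, len(se[0]) // 2
    return [(y - cy, x - cx) for y, row in enumerate(se) for x, v in enumerate(row) if v]


def dilate(mask: List[List[int]], se: List[List[int]] = SE) -> List[List[int]]:
    """Stamp the (symmetric) element onto every foreground pixel."""
    h, w = len(mask), len(mask[0])
    offs = _offsets(se)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offs:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out


def erode(mask: List[List[int]], se: List[List[int]] = SE) -> List[List[int]]:
    """Keep a pixel only if the element centred on it fits entirely inside the foreground."""
    h, w = len(mask), len(mask[0])
    offs = _offsets(se)
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in offs) else 0
             for x in range(w)]
            for y in range(h)]
```

The element reaches 4 pixels up and down but only 3 sideways, so dilation bridges vertical gaps in an object more readily than it merges horizontally adjacent objects. Dilating a single pixel reproduces the element itself, and eroding that result recovers the original pixel.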
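The final step, filtering blobs by area and counting what remains, can be sketched as follows. Assumptions: blobs are found as 8-connected components (the article does not state the connectivity), and "average blob size" is taken as the mean pixel area of all blobs in the current slit image. The names `label_blobs` and `count_objects` are hypothetical.

```python
from collections import deque
from typing import List, Tuple


def label_blobs(mask: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """Return the pixel coordinates of each 8-connected blob in a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                q = deque([(y, x)])
                pixels = []
                while q:                      # breadth-first flood fill of one blob
                    cy, cx = q.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                q.append((ny, nx))
                blobs.append(pixels)
    return blobs


def count_objects(mask: List[List[int]]) -> int:
    """Count blobs whose area is at least one third of the mean blob area."""
    blobs = label_blobs(mask)
    if not blobs:
        return 0
    mean_area = sum(len(b) for b in blobs) / len(blobs)
    return sum(1 for b in blobs if len(b) >= mean_area / 3)
```

For example, a mask with two 3 x 3 blobs and one isolated pixel has a mean area of 19 / 3, so the cut-off is about 2.1 pixels. The isolated pixel is discarded and the count is 2.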

The figure below shows how the mask changes after each of the 4 stages of morphological processing:

[Figure: the mask after each of the four morphological processing stages]

The result of morphological processing will look like this:

[Figure: the result of morphological processing]

There are 15 blobs in total (solid white regions on a black background), meaning that 15 objects crossed the line within one minute. The objects are counted and the task is complete.


Object counting is an important functional component of many vision-based systems that support a wide range of social, corporate, and commercial activities.

Numerous object counting methods exist today, but they involve complex algorithms and solutions and many hardware components, sensors, and detectors.

The most notable advantage of our approach over similar methods is its accuracy of 95%.

The method is also non-intrusive and affordable, which makes it well suited to applications in enterprise, mass production, and the social sphere.

