In August 2018, 12 students completed the annual Computer Vision & AI internship at Abto Software. The interns, selected from more than 120 candidates, attended a series of nine advanced lectures delivered by our R&D experts. Each lecture was followed by a lab session in which the interns discussed the material, asked questions, worked on problems, and received additional assistance.

Internship Description

The internship course was divided into three modules and covered computer vision, artificial intelligence, and machine learning. In the first module, the interns became familiar with the main building blocks of the OpenCV library: they learned how to manipulate images at the pixel level, explored image processing and image segmentation techniques, and analyzed a variety of feature detection methods. The second module focused on machine learning: the interns worked with neural networks and developed an understanding of image classification techniques and regression models. In the third and final module, the interns worked in groups of two or three on five computer vision projects, the topics for which were provided by Abto R&D engineers.

As part of the internship, each group had to develop a computer vision solution for one of the outlined tasks and present it to the R&D department. We are very proud of the progress our interns made during the course and want to share what they achieved in their projects.

Graduation Projects Overview

Checkers Detection & Type Recognition

Task: to detect the positions of checkers and recognize their type from a real-time video stream of the checkerboard.

Two of our interns took on this project and settled on the following workflow for their solution:

  1. Pre-processing of the input video frame. The checkers are placed on the light squares of the checkerboard, because dark pieces are otherwise almost invisible and are detected less accurately.
  2. Detecting checkerboard contours.
  3. Homography – rendering of the checkerboard with the correct perspective.
  4. Checkers detection with Hough Circle Transform (Figure 1a).
  5. Distinguishing dark pieces from the light pieces.
  6. Rendering of the resulting virtual checkerboard (Figure 1b). The radius of the virtual checkers equals the average radius of the pieces detected in step 4.
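
Steps 4 to 6 can be illustrated with a minimal sketch. It assumes the circles have already been detected on the rectified (top-down) board image and that a mean brightness value has been measured inside each circle. The function name, the input tuple layout, and the brightness threshold of 128 are illustrative assumptions, not details of the interns' program.

```python
from statistics import mean


def place_checkers(circles, board_size_px, squares=8, brightness_threshold=128):
    """Map detected circles to board squares (hypothetical helper).

    circles: iterable of (x, y, radius, mean_brightness) measured on the
             rectified board image, in pixels.
    Returns (pieces, virtual_radius), where pieces maps (row, col) to
    'light' or 'dark'.
    """
    square_px = board_size_px / squares
    pieces = {}
    radii = []
    for x, y, radius, brightness in circles:
        # Integer division by the square size gives the board cell;
        # clamp so a circle on the far border stays on the board.
        row = min(int(y // square_px), squares - 1)
        col = min(int(x // square_px), squares - 1)
        # Assumed rule: bright interior means a light piece.
        pieces[(row, col)] = "light" if brightness >= brightness_threshold else "dark"
        radii.append(radius)
    # The virtual checkers are drawn with the average detected radius.
    virtual_radius = mean(radii) if radii else 0.0
    return pieces, virtual_radius


detected = [(50, 50, 40, 200), (150, 250, 44, 30)]
pieces, virtual_radius = place_checkers(detected, board_size_px=800)
```

In a real pipeline the threshold would likely need tuning to the lighting conditions, or could be replaced by clustering the brightness values into two groups.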

  

Figure 1. Checkers Recognition: a) processed frame of the input video stream; b) program output.

Tools & Technologies: OpenCV, image and video processing, camera calibration techniques, Canny edge detector, affine transformations, Hough Circle Transform.

Telling Time From Clock Photos

Task: to tell time from a clock photo.

The two interns who worked on this task solved it for both analog and digital clock photos. The processing of analog clock photos is shown in Figure 2: after initial pre-processing, the algorithm applies the Canny edge detector and the Hough Line Transform to find the clock hands. The hour hand (marked green) and the minute hand (marked red) are told apart by their length. The final step translates the angles of the hands into time and shows it to the user in the international standard notation.
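
The last step, turning hand angles into a time, can be sketched as follows. This is a minimal illustration, assuming the clock center and the tip of each hand are already known in image coordinates (y axis pointing down) and that the photo is upright; the function names are hypothetical, and the source does not describe the interns' exact formula.

```python
import math


def hand_angle(center, tip):
    """Angle of a hand in degrees, clockwise from the 12 o'clock direction."""
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    # Image y grows downward, so "up" is -dy.
    return math.degrees(math.atan2(dx, -dy)) % 360


def hands_to_time(hour_angle, minute_angle):
    """Convert hand angles (degrees, clockwise from 12) to an hh:mm string."""
    # The minute hand moves 6 degrees per minute.
    minute = round((minute_angle % 360) / 6) % 60
    # The hour hand moves 30 degrees per hour plus 0.5 degrees per minute,
    # so the minute contribution is subtracted before rounding.
    hour = round((hour_angle - 0.5 * minute) / 30) % 12
    # A photo cannot tell AM from PM, so 12 o'clock is printed as 00.
    return f"{hour:02d}:{minute:02d}"


print(hands_to_time(105, 180))
```

Using the minute reading to correct the hour hand makes the result less sensitive to small errors in the detected hour-hand angle.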

Figure 2. Processing of the analog clock photos.

To process digital clock photos, the program first detects the display of the clock (Figure 3). It then isolates each digit and analyzes its segments to determine the time shown.
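
Segment analysis of this kind is commonly done with a lookup table for seven-segment digits. The sketch below assumes each digit has already been reduced to seven on/off flags in the conventional order a to g (top, top-right, bottom-right, bottom, bottom-left, top-left, middle) and that the display shows four digits; how the interns measured the segments is not described in the source.

```python
# Lit segments in the order a, b, c, d, e, f, g for each digit.
SEGMENT_PATTERNS = {
    "1111110": 0,
    "0110000": 1,
    "1101101": 2,
    "1111001": 3,
    "0110011": 4,
    "1011011": 5,
    "1011111": 6,
    "1110000": 7,
    "1111111": 8,
    "1111011": 9,
}


def decode_digit(segments):
    """Return the digit for seven on/off flags, or None if unrecognized."""
    key = "".join("1" if lit else "0" for lit in segments)
    return SEGMENT_PATTERNS.get(key)


def decode_display(digit_segments):
    """Decode four digits into an hh:mm string, or None on any failure."""
    digits = [decode_digit(segments) for segments in digit_segments]
    if len(digits) != 4 or None in digits:
        return None
    return f"{digits[0]}{digits[1]}:{digits[2]}{digits[3]}"


display = [
    (0, 1, 1, 0, 0, 0, 0),  # 1
    (1, 1, 0, 1, 1, 0, 1),  # 2
    (1, 1, 1, 1, 0, 0, 1),  # 3
    (1, 1, 1, 1, 1, 1, 0),  # 0
]
print(decode_display(display))
```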

       

Figure 3. Processing of the digital clock photo: a) input image; b) display detection; c) numbers recognition; d) resulting visualization.

As a result, the program converts the digital reading into an analog clock face and draws it on the screen, as shown in Figure 3d.

Tools & Technologies: OpenCV, image processing, Canny edge detector, Hough Line Transform.

Game Bot Development

Task: to develop a Flappy Bird game bot.

This was a complex task, and the two interns who took it on had to learn and apply advanced approaches. They split the project into two subtasks:

  1. Computer Vision part for extracting the info from the game scene;
  2. Artificial Intelligence part for teaching the bot how to behave.

The main technique used to train the bot is Q-learning: for each game state, the bot uses the information extracted from the game scene to choose the best action, which in this case means deciding whether or not to jump. The results are stored in a Q-table, which holds historical data about the transitions between game states and is recalculated each time the bot loses, so the bot plays better with each iteration.

The following video shows how the bot handles the game after being trained with the outlined approach.
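
The Q-table update can be sketched with the standard tabular Q-learning rule, replayed over the recorded transitions after each lost game. The state labels, reward values, learning rate `alpha`, and discount factor `gamma` below are illustrative assumptions; the source does not give the interns' actual settings.

```python
from collections import defaultdict

ACTIONS = ("wait", "jump")


def make_q_table():
    """Q-table mapping state -> {action: estimated value}, zero-initialized."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def choose_action(q_table, state):
    """Greedy policy: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda action: q_table[state][action])


def update_after_episode(q_table, history, alpha=0.7, gamma=0.95):
    """Replay one game's transitions, newest first, after the bot loses.

    history: list of (state, action, reward, next_state); next_state is
             None for the final (losing) transition.
    """
    for state, action, reward, next_state in reversed(history):
        best_next = max(q_table[next_state].values()) if next_state is not None else 0.0
        # Standard Q-learning update toward reward + discounted future value.
        target = reward + gamma * best_next
        q_table[state][action] += alpha * (target - q_table[state][action])


q_table = make_q_table()
# Hypothetical episode: a survived frame, then a crash after waiting.
episode = [("far_below_gap", "jump", 1.0, "near_pipe"),
           ("near_pipe", "wait", -100.0, None)]
update_after_episode(q_table, episode)
print(choose_action(q_table, "near_pipe"))
```

Replaying the episode backwards lets the penalty for the crash propagate to earlier states within a single update pass.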

Tools & Technologies: OpenCV, computer vision, reinforcement learning (RL): Q-learning, Neuroevolution.

Finding Way Through Maze

Task: to find a way between two marked points through a maze.

This task was solved by three of our interns. Their program processes a real-time video stream (Figure 4a) and extracts a mask of the maze, which is then enclosed by its convex hull (marked green in Figure 4b) to prevent the final route from leaving the maze borders. After that, the program locates the marked start and finish points and finds a path between them using the A* search algorithm.
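
The path-finding step can be illustrated with a compact A* search on a grid. The sketch assumes the maze mask has already been converted to a 2D list in which 0 is a free cell and 1 is a wall, and that movement is restricted to four directions with unit cost; these are simplifying assumptions rather than details of the interns' program.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; returns a list of (row, col) or None."""

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost.get(current, float("inf")):
            continue  # Stale heap entry; a cheaper route was already found.
        row, col = current
        for nr, nc in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)))
    return None


maze = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
print(astar(maze, start=(0, 0), goal=(0, 3)))
```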

Figure 4. Finding a way through the maze: a) a frame from the input video stream; b) visualization of the path.

Tools & Technologies: OpenCV, image and video processing, A* Search Algorithm.

Poker Cards Recognition

Task: to recognize a five-card poker combination from a photo.

The three interns who worked on this project developed a robust solution that comprises the following steps:

  1. Thresholding and morphology of the input image (Figure 5a).
  2. Edge detection (Figure 5b) and approximation of each card contour with four points for corner detection (Figure 5c).
  3. Affine transformation of the cards (Figure 6a).
  4. Card rank recognition (Figure 6b).
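
Once the five ranks are recognized, naming the combination is a matter of counting. The sketch below is an illustration only: the source describes rank recognition but not how the combination is derived, so the function, the one-character rank labels (with "T" for ten), and the decision to ignore suits are all assumptions. Because suits are ignored, flushes cannot be detected here.

```python
from collections import Counter

# Rank labels are an assumption: '2'..'9', 'T', 'J', 'Q', 'K', 'A'.
RANK_VALUES = {rank: value for value, rank in enumerate("23456789TJQKA", start=2)}


def name_combination(ranks):
    """Name a five-card combination from ranks alone (suits are ignored)."""
    values = sorted(RANK_VALUES[rank] for rank in ranks)
    counts = sorted(Counter(values).values(), reverse=True)
    # Five distinct consecutive values, or the ace-low straight A-2-3-4-5.
    is_straight = (
        (len(set(values)) == 5 and values[-1] - values[0] == 4)
        or values == [2, 3, 4, 5, 14]
    )
    if counts == [4, 1]:
        return "four of a kind"
    if counts == [3, 2]:
        return "full house"
    if is_straight:
        return "straight"
    if counts == [3, 1, 1]:
        return "three of a kind"
    if counts == [2, 2, 1]:
        return "two pair"
    if counts == [2, 1, 1, 1]:
        return "one pair"
    return "high card"


print(name_combination(["K", "K", "K", "2", "2"]))
```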

   

Figure 5. Image processing: a) source image; b) edge detection; c) corner detection.

   

Figure 6. Card rank recognition: a) affine transformation; b) result of the recognition.

Tools & Technologies: OpenCV, PyTorch, image processing, artificial neural networks (ANNs) – a multilayer perceptron (MLP) and convolutional neural networks (CNNs), deep learning, k-nearest neighbors algorithm (k-NN).

Future Plans

Abto Software hopes that the graduates of its summer 2018 Computer Vision & AI internship will join the company in the near future.

For more employment opportunities, visit our careers website and apply for the 2019 Artificial Intelligence & Computer Vision Summer Camp!
