Real-time hockey player tracking: Getting the right trajectory
Since ice hockey was first played in Canada in the 19th century, players and coaches have driven the development of the game. The introduction of advanced equipment marked the first turning point in this fast, fluid sport. The next revolution is happening in computer vision, which opens new horizons for one of the most spectacular sports in the world. Extracting reliable information from video of hockey matches is fundamental to building computer systems that give players, coaches, scouts and doctors the insights they need to gain a competitive edge and perform at a high level.
Our R&D engineers experimented with a hockey game video clip, aiming to develop real-time hockey player tracking technology, that is, a multi-object tracking system. They focused on acquiring the historical positions of the players: extracting each player’s motion and converting it into a trajectory. Let’s see what they have achieved.
For the experiment we used a publicly available recording of the Gladiators vs Generic Hockey match in the ‘C’ League of SFAHL, held on November 7, 2013. The demonstrations cover a one-second excerpt (1:42-1:43) from that recording. The 25 frames we selected represent the typical hockey player tracking problems we aim to solve:
- Merging silhouettes of the hockey players
- High speed of the hockey game
- Low video resolution
- Improper camera position that leads to an unfavorable field of view
The system we are after should provide:
- Automatic player selection and detection
- Real-time reconstruction of the players’ historical positions
- Precise tracking of overlapping and intersecting trajectories
Therefore, the outlined problem splits into two subtasks:
- Player detection, to be solved by foreground/background separation, which Abto computer vision engineers have already tackled
- Definition of the hockey players’ trajectories, which is complicated both by the unsuitable shooting angle and by physical contact between the players
The state-of-the-art approach applies Kalman filtering and similar techniques that work on the following principle: for each new position of a moving object, they find the closest trajectory according to some criterion and append the position to it. Ordinarily, such algorithms use only the X/Y coordinates of the players. Restricting the input this way discards much of the information available in the video stream. The frame sequence below shows how the MATLAB motion-based multiple object tracking algorithm fails to track hockey players.
Right from the first frames we face both false positive (1st and 5th players) and false negative (goalkeeper and leftmost player) detections. The next issue appears when the 3rd player merges with the 2nd as they pass each other. The merged player becomes a referee upon leaving the field and the ‘lost’ player becomes a new one later on. This mess results in detecting 9 players instead of the 6 present on the field (including a referee) and in completely distorted trajectories, as the tracks are assigned to the wrong detections.
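The closest-trajectory principle described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB algorithm: a constant-velocity extrapolation stands in for the Kalman prediction step, the assignment is greedy, and the names `predict`, `assign_nearest` and the `max_dist` gate are hypothetical.

```python
import math

def predict(track):
    """Constant-velocity guess of the next (x, y).

    A simplified stand-in for the predict step of a Kalman filter.
    """
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def assign_nearest(tracks, detections, max_dist=30.0):
    """Greedily append each detection to the closest predicted track.

    tracks: list of lists of (x, y); detections: list of (x, y).
    A detection that is not within max_dist of any free track starts a
    new track. max_dist is an assumed gating threshold in pixels.
    """
    predictions = [predict(t) for t in tracks]
    free = set(range(len(predictions)))  # tracks not yet matched this frame
    for det in detections:
        best, best_d = None, max_dist
        for i in sorted(free):
            d = math.dist(predictions[i], det)
            if d < best_d:
                best, best_d = i, d
        if best is None:
            tracks.append([det])
        else:
            tracks[best].append(det)
            free.remove(best)
    return tracks

# Two players skating towards each other, plus one far-away detection.
tracks = [[(0, 0), (10, 0)], [(100, 0), (90, 0)]]
tracks = assign_nearest(tracks, [(80, -1), (20, 1), (500, 500)])
```

Because the only criterion is distance in the X/Y plane, two players who pass close to each other produce nearly identical costs, which is exactly the situation in which tracks get swapped or merged.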
Our own implementation of the state-of-the-art approaches
Abto R&D engineers tried to adapt the above algorithm to the particular task of tracking hockey players in order to increase its accuracy. However, as you can see from the frame sequence below, this approach proved to be flawed by design: our enhancements didn’t eliminate the outlined problems. We corrected the number of detected players, but the trajectories are still skewed.
The lesson we learned from this experiment is that relying solely on a geometrical feature space dooms advanced object tracking to failure.
Our R&D team worked out a way around this bottleneck. During the first stage, player detection, we applied a machine learning approach based on R-CNN (region-based convolutional neural network). For the second stage, the tracking itself, we extended the feature space, which had previously consisted only of X/Y coordinates. Adding extra dimensions substantially increased the tracking accuracy, as it cut off erroneous detections without affecting the processing time. As you can see from the detailed description of the frames below, we solved all the problems we were facing. Moreover, the flexibility of the developed algorithm allows merging several video streams shot by non-fixed cameras.
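To illustrate why extra dimensions help, the sketch below compares a position-only match with a match in an extended feature space. The article does not say which features were added, so the appearance vector used here (a mean jersey colour in RGB), the weights, and the function names are all assumptions made for illustration only.

```python
import math

def feature_distance(a, b, w_pos=1.0, w_app=1.0):
    """Weighted distance between two observations.

    Each observation is a dict with 'pos' (x, y) and 'app', an appearance
    vector. The appearance vector is a hypothetical extra feature; the
    weights w_pos and w_app are assumed tuning parameters.
    """
    d_pos = math.dist(a["pos"], b["pos"])
    d_app = math.dist(a["app"], b["app"])
    return w_pos * d_pos + w_app * d_app

def closest_track(track_states, detection, w_pos=1.0, w_app=1.0):
    """Return the index of the track whose last state is closest to the detection."""
    costs = [feature_distance(s, detection, w_pos, w_app) for s in track_states]
    return min(range(len(costs)), key=costs.__getitem__)

# Two players from opposing teams whose silhouettes almost overlap.
track_states = [
    {"pos": (50.0, 50.0), "app": (200, 30, 30)},  # track 0: red jersey
    {"pos": (52.0, 50.0), "app": (30, 30, 200)},  # track 1: blue jersey
]
# A blue-jersey detection that happens to lie nearer to track 0's position.
detection = {"pos": (50.5, 50.0), "app": (35, 28, 195)}

by_position = closest_track(track_states, detection, w_app=0.0)  # X/Y only
by_features = closest_track(track_states, detection)             # X/Y + appearance
```

With the appearance weight set to zero the detection goes to track 0, the nearer but wrongly coloured player; with appearance included it goes to track 1. The extra cost is a handful of arithmetic operations per pair, which is in line with the observation that the richer feature space did not affect processing time.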