Multi-Object Tracking Evaluation

Given an input video sequence, multi-object tracking aims to recover the trajectories of the objects in the video. In the VisDrone2021 Challenge, the multi-object tracking evaluation considers only five object categories: car, bus, truck, pedestrian, and van. An evaluated algorithm must recover the object trajectories in each video sequence, either with or without taking object detection results as input.

We use the protocol in [1] to evaluate tracking performance. Specifically, each algorithm is required to output a list of bounding boxes, each with a confidence score and a corresponding identity. We sort the tracklets (each formed by the bounding box detections that share the same identity) according to the average confidence of their bounding box detections. A tracklet is considered correct if its intersection over union (IoU) overlap with a ground-truth tracklet is larger than a threshold. Similar to [1], we use three thresholds in evaluation: 0.25, 0.50, and 0.75. The performance of an algorithm is evaluated by averaging the mean average precision (mAP) across object classes over these thresholds. The evaluation code is available on the VisDrone GitHub.
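
The ranking and averaging steps described above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation code: the function names, the (track_id, score) input format, and the use of non-interpolated average precision are all assumptions made for this example. The sketch also assumes that the decision of whether each ranked tracklet is correct at a given IoU threshold has already been made elsewhere, since the tracklet matching itself follows the protocol in [1].

```python
from collections import defaultdict

# IoU thresholds named in the protocol description.
THRESHOLDS = (0.25, 0.50, 0.75)


def rank_tracklets(detections):
    """Rank track identities by the mean confidence of their detections.

    `detections` is an iterable of (track_id, score) pairs (assumed format).
    Returns track ids sorted from highest to lowest mean confidence.
    """
    scores = defaultdict(list)
    for track_id, score in detections:
        scores[track_id].append(score)
    mean_scores = {tid: sum(s) / len(s) for tid, s in scores.items()}
    return sorted(mean_scores, key=mean_scores.get, reverse=True)


def average_precision(correct_flags, num_gt):
    """Non-interpolated AP for one class at one threshold (assumed variant).

    `correct_flags` lists, in ranked order, whether each tracklet was judged
    correct; `num_gt` is the number of ground-truth tracklets.
    """
    if num_gt == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_correct in enumerate(correct_flags, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_gt


def mean_ap(ap_table):
    """Average AP over all classes and thresholds.

    `ap_table` maps class name -> {threshold: AP}.
    """
    values = [ap for per_thr in ap_table.values() for ap in per_thr.values()]
    return sum(values) / len(values)
```

For example, rank_tracklets would be called once per object class on that class's detections, average_precision once per class and threshold in THRESHOLDS, and mean_ap on the resulting table to obtain the final score.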

References:

[1] E. Park, W. Liu, O. Russakovsky, J. Deng, F.-F. Li, and A. Berg, “Large Scale Visual Recognition Challenge 2017,” http://image-net.org/challenges/LSVRC/2017.