We require participants to submit their results as a single .zip file. Each .txt file in the archive contains the results for the corresponding image or video clip. Note that the result files for all images/video clips must be stored in the archive's root folder.
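
As an illustration of the packaging requirement, the following is a minimal sketch that collects every .txt file from a local folder and writes it into the root of a zip archive. The function name pack_results and the folder and archive names in the usage note are hypothetical and are not part of the official toolkit.

```python
import zipfile
from pathlib import Path


def pack_results(results_dir, zip_path):
    """Write every .txt file in results_dir into the root of a new zip archive.

    Hypothetical helper for illustration; returns the archived file names.
    """
    results_dir = Path(results_dir)
    txt_files = sorted(results_dir.glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_file in txt_files:
            # arcname=txt_file.name drops the directory part, so each entry
            # sits in the archive's root folder rather than in a subfolder.
            archive.write(txt_file, arcname=txt_file.name)
    return [f.name for f in txt_files]
```

A call such as pack_results("results", "submission.zip") would then produce an archive whose entries have no folder prefix.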

The results file for each task must be stored in the SAME format as the provided ground-truth file, i.e., a CSV (Comma-Separated Values) text file containing one object instance per line. If there is no detection/tracking result for an image or video clip, please provide an empty file. We suggest that participants review the ground-truth format before proceeding. The content of each line differs between tasks; the text-file format for each task is described in detail below.

Object Detection in Images

Both the ground-truth annotations and the submitted results on test data use the same format for object detection in images. That is, each text file stores the detection results of the corresponding image, with each line describing one object instance in the image. The format of each line is as follows:

<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>


An example submission for object detection in images is available here (BaiduYun | Google Drive).

Position  Name               Description
1         <bbox_left>        The x coordinate of the top-left corner of the predicted object bounding box
2         <bbox_top>         The y coordinate of the top-left corner of the predicted object bounding box
3         <bbox_width>       The width in pixels of the predicted object bounding box
4         <bbox_height>      The height in pixels of the predicted object bounding box
5         <score>            In the DETECTION result file, the score indicates the confidence that the predicted bounding box encloses an object instance. In the GROUNDTRUTH file, the score is set to 1 or 0: 1 indicates that the bounding box is considered in evaluation, while 0 indicates that it will be ignored.
6         <object_category>  The object category indicates the type of annotated object, i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
7         <truncation>       In the DETECTION result file, this value should be set to the constant -1. In the GROUNDTRUTH file, it indicates the degree to which parts of the object appear outside the frame, i.e., no truncation = 0 (truncation ratio 0%) and partial truncation = 1 (truncation ratio 1% ~ 50%).
8         <occlusion>        In the DETECTION result file, this value should be set to the constant -1. In the GROUNDTRUTH file, it indicates the fraction of the object being occluded, i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 (occlusion ratio 50% ~ 100%).
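
To make the field order concrete, here is a minimal sketch of how one detection could be serialized into a result line, with the last two fields fixed to -1 as the field descriptions above require for DETECTION result files. The helper names format_detection and write_detections are hypothetical, and the number formatting (plain str() conversion) is an assumption, since the guidelines do not state whether coordinates must be integers.

```python
from pathlib import Path


def format_detection(left, top, width, height, score, category):
    """Return one result line with the eight fields in their listed order."""
    # truncation and occlusion are set to the constant -1 in detection results.
    fields = [left, top, width, height, score, category, -1, -1]
    return ",".join(str(f) for f in fields)


def write_detections(path, detections):
    """Write one line per detection; an empty list yields an empty file."""
    lines = [format_detection(*d) for d in detections]
    Path(path).write_text("".join(line + "\n" for line in lines))
```

For example, format_detection(684, 8, 273, 116, 0.87, 4) returns "684,8,273,116,0.87,4,-1,-1", describing a car (category 4) with confidence 0.87. Calling write_detections with an empty list produces the empty file requested for images without detections.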

Zero-shot Object Detection

The results on test data should be submitted as a pickle (.pkl) file named detection_results.pkl. The submission is a zip-compressed file containing this .pkl file and a description PDF. Please find the example format of results for zero-shot object detection here.
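
The packaging step for this task might be sketched as follows. The structure of the pickled results object is not specified here and should be taken from the example file; the sketch simply pickles whatever object it is given. The function name pack_zero_shot_submission and its parameters are hypothetical, and placing both files in the archive root is an assumption.

```python
import pickle
import zipfile
from pathlib import Path


def pack_zero_shot_submission(results, pdf_path, zip_path, work_dir="."):
    """Pickle results as detection_results.pkl and zip it with the PDF.

    Hypothetical helper: the expected layout of `results` is defined by the
    official example file, not by this sketch.
    """
    pkl_path = Path(work_dir) / "detection_results.pkl"
    with open(pkl_path, "wb") as handle:
        pickle.dump(results, handle)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: both files are stored in the archive root.
        archive.write(pkl_path, arcname=pkl_path.name)
        archive.write(pdf_path, arcname=Path(pdf_path).name)
    return pkl_path
```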