We require participants to submit their results as a single .zip file. Each .txt file in the archive contains the results for the corresponding image or video clip. Note that all result files must be stored in the archive's root folder.

The results file for each task should be stored in the SAME format as the provided ground-truth file, i.e., a CSV (Comma-Separated Values) text file containing one object instance per line. If there is no detection/tracking output for an image or clip, please provide an empty file. We suggest that participants review the ground-truth format before proceeding. The content of each line differs between tasks; the text-file format for each task is described in detail below.
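As an illustration, the packaging step can be sketched in Python (the directory layout and function name here are our own, not part of any official toolkit):

```python
import zipfile
from pathlib import Path

def package_results(results_dir: str, archive_path: str) -> None:
    """Zip all .txt result files so they sit in the archive's root folder."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(Path(results_dir).glob("*.txt")):
            # arcname=txt.name strips the directory prefix, so the file
            # lands in the root of the archive as required
            zf.write(txt, arcname=txt.name)
```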

Object Detection in Images

The ground-truth annotations and the submitted results on test data share the same format for object detection in images. That is, each text file stores the detection results for the corresponding image, with each line describing one object instance in that image. The format of each line is as follows:

<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

Please find the example format of the submission of results for object detection in images here (BaiduYun|Google Drive).

Position | Name | Description
1 | <bbox_left> | The x coordinate of the top-left corner of the predicted bounding box
2 | <bbox_top> | The y coordinate of the top-left corner of the predicted bounding box
3 | <bbox_width> | The width, in pixels, of the predicted bounding box
4 | <bbox_height> | The height, in pixels, of the predicted bounding box
5 | <score> | In the DETECTION result file, the confidence that the predicted bounding box encloses an object instance. In the GROUNDTRUTH file, set to 1 or 0: 1 indicates the bounding box is considered in evaluation, while 0 indicates it is ignored.
6 | <object_category> | The type of annotated object, i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
7 | <truncation> | In the DETECTION result file, set to the constant -1. In the GROUNDTRUTH file, indicates the degree to which the object appears outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1% ∼ 50%)
8 | <occlusion> | In the DETECTION result file, set to the constant -1. In the GROUNDTRUTH file, indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ∼ 50%), heavy occlusion = 2 (occlusion ratio 50% ∼ 100%)
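To make the line format concrete, a single detection could be serialized as follows (the helper name and the example values are illustrative, not part of the challenge toolkit):

```python
def format_detection(bbox_left, bbox_top, bbox_width, bbox_height,
                     score, object_category):
    """Return one CSV line in the detection-result format.
    <truncation> and <occlusion> are fixed to -1 in result files."""
    return "{},{},{},{},{:.4f},{},-1,-1".format(
        bbox_left, bbox_top, bbox_width, bbox_height, score, object_category)

# e.g. a car (category 4) detected with confidence 0.87
line = format_detection(684, 8, 273, 116, 0.87, 4)
# → "684,8,273,116,0.8700,4,-1,-1"
```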

Multi-Object Tracking

The ground-truth annotations and the submitted results on test data share the same format for multi-object tracking. That is, each text file stores the tracking results for the corresponding video clip, with each line describing one object instance, with its assigned identity, in a video frame. The format of each line is as follows:

<frame_index>,<target_id>,<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

Please find the example format of the submission of results for multi-object tracking here (BaiduYun|Google Drive).

Position | Name | Description
1 | <frame_index> | The index of the video frame
2 | <target_id> | In the DETECTION result file, the identity of the target should be set to the constant -1. In the GROUNDTRUTH file, the identity of the target provides the temporal correspondence of the bounding boxes across frames.
3 | <bbox_left> | The x coordinate of the top-left corner of the predicted bounding box
4 | <bbox_top> | The y coordinate of the top-left corner of the predicted bounding box
5 | <bbox_width> | The width, in pixels, of the predicted bounding box
6 | <bbox_height> | The height, in pixels, of the predicted bounding box
7 | <score> | In the DETECTION file, the confidence that the predicted bounding box encloses an object instance. In the GROUNDTRUTH file, set to 1 or 0: 1 indicates the bounding box is considered in evaluation, while 0 indicates it is ignored.
8 | <object_category> | The type of annotated object, i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)
9 | <truncation> | In the DETECTION file, set to the constant -1. In the GROUNDTRUTH file, indicates the degree to which the object appears outside the frame: no truncation = 0 (truncation ratio 0%), partial truncation = 1 (truncation ratio 1% ∼ 50%)
10 | <occlusion> | In the DETECTION file, set to the constant -1. In the GROUNDTRUTH file, indicates the fraction of the object that is occluded: no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ∼ 50%), heavy occlusion = 2 (occlusion ratio 50% ∼ 100%)
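For reference, one tracking line can be split back into its ten named fields like this (the parser below is a sketch of ours, assuming the field order given above):

```python
def parse_mot_line(line):
    """Parse one line of a multi-object tracking file into a dict.
    All fields are integers except <score>, which is a float."""
    keys = ("frame_index", "target_id", "bbox_left", "bbox_top",
            "bbox_width", "bbox_height", "score", "object_category",
            "truncation", "occlusion")
    fields = line.strip().split(",")
    values = [float(v) if i == 6 else int(v) for i, v in enumerate(fields)]
    return dict(zip(keys, values))
```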

Crowd Counting

The submitted results on test data have a different format from the ground-truth annotations for crowd counting. That is, each text file stores the head counts for the corresponding sequence, with each line giving the number of people's heads in one frame. The format of each line is as follows:

<frame_index>,<counting_number>

Please find the example format of the submission of results for crowd counting here (BaiduYun (code:p7d7) | Google Drive).

Position | Name | Description
1 | <frame_index> | The frame index of the video frame, arranged in ascending order
2 | <counting_number> | The number of people's heads counted in the frame
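Writing a counting-result file is then a matter of emitting one `<frame_index>,<counting_number>` line per frame in ascending frame order; a minimal sketch (the function name and input structure are our own):

```python
def write_counting_results(counts, path):
    """Write per-frame head counts, one '<frame_index>,<counting_number>'
    line per frame, with frame indices in ascending order.
    `counts` maps frame index -> number of heads."""
    with open(path, "w") as f:
        for frame_index in sorted(counts):
            f.write("{},{}\n".format(frame_index, counts[frame_index]))
```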