Participants must submit their results as a single .zip file. Each .txt file in the archive contains the results for the corresponding image or video clip. All result files must be stored in the archive’s root folder, not in subfolders.
The results file for each task must use the SAME format as the provided ground-truth file, i.e., a CSV (Comma-Separated Values) text file containing one object instance per line. If there is no detection/tracking output for an image or video clip, please provide an empty file. We suggest that participants review the ground-truth format before proceeding. The content of each line differs from task to task. The format for each task is described in detail below.
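The packaging rules above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name `pack_results` and the directory layout are assumptions made for the example. It places every .txt file at the root of the archive and keeps empty files, which stand for images or clips with no output.

```python
import zipfile
from pathlib import Path


def pack_results(results_dir, zip_path):
    """Zip every .txt file in results_dir so that each sits in the archive root.

    Hypothetical helper for illustration; returns the names that were archived.
    """
    results_dir = Path(results_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt in sorted(results_dir.glob("*.txt")):
            # arcname drops the directory part, so the file lands in the root folder.
            # Empty files are archived too: they mean "no result for this image/clip".
            archive.write(txt, arcname=txt.name)
            names.append(txt.name)
    return names
```

Calling `pack_results("results", "submission.zip")` would archive `results/*.txt` without the `results/` prefix; both names are placeholders.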
Object Detection in Images
For object detection in images, the ground-truth annotations and the submitted results on test data share the same format. That is, each text file stores the detection results of the corresponding image, with each line containing one object instance in the image. The format of each line is as follows:
|1||<bbox_left>||The x coordinate of the top-left corner of the predicted object bounding box|
|2||<bbox_top>||The y coordinate of the top-left corner of the predicted object bounding box|
|3||<bbox_width>||The width in pixels of the predicted object bounding box|
|4||<bbox_height>||The height in pixels of the predicted object bounding box|
|5||<score>||In the DETECTION result file, the score indicates the confidence that the predicted bounding box encloses an object instance. In the GROUNDTRUTH file, the score is set to 1 or 0: 1 indicates that the bounding box is considered in evaluation, while 0 indicates that it is ignored.|
|6||<object_category>||The type of the annotated object, i.e., ignored regions (0), pedestrian (1), people (2), bicycle (3), car (4), van (5), truck (6), tricycle (7), awning-tricycle (8), bus (9), motor (10), others (11)|
|7||<truncation>||In the DETECTION result file, this field should be set to the constant -1. In the GROUNDTRUTH file, it indicates the degree to which object parts appear outside the frame (i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).|
|8||<occlusion>||In the DETECTION result file, this field should be set to the constant -1. In the GROUNDTRUTH file, it indicates the fraction of the object that is occluded (i.e., no occlusion = 0 (occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2 (occlusion ratio 50% ~ 100%)).|
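The eight fields above map onto one comma-separated line per object. The following Python sketch shows one way to write a detection line and to read a ground-truth line. The helper names (`format_detection`, `parse_ground_truth`, `is_evaluated`) are hypothetical, and the sketch assumes that ground-truth values are integers and that fields are separated by commas with no header row.

```python
# Field order as listed in the table above.
FIELDS = ("bbox_left", "bbox_top", "bbox_width", "bbox_height",
          "score", "object_category", "truncation", "occlusion")


def format_detection(left, top, width, height, score, category):
    """Build one line of a DETECTION result file."""
    # truncation and occlusion are fixed to -1 in detection results.
    values = (left, top, width, height, score, category, -1, -1)
    return ",".join(str(v) for v in values)


def parse_ground_truth(line):
    """Split one GROUNDTRUTH line into a dict keyed by field name.

    Assumes integer values; raises ValueError on a wrong field count.
    """
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_evaluated(record):
    """A ground-truth box counts in evaluation only when its score is 1."""
    return record["score"] == 1
```

For example, `format_detection(10, 20, 30, 40, 0.9, 4)` yields `10,20,30,40,0.9,4,-1,-1`, a car (category 4) detected with confidence 0.9.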
Crowd Counting
For crowd counting, the submitted results on test data use a different format from the ground-truth annotations. Each text file stores the counts for the corresponding sequence, with each line giving the number of people's heads in one frame. The format of each line is as follows:
|1||<frame_index>||The index of the video frame; lines are ordered from the smallest index to the largest|
|2||<counting_number>||The number of people's heads counted in the frame|
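A crowd counting result file can be produced with a few lines of Python. This is a sketch under two assumptions: the two fields are separated by a comma, as in the CSV format used for the other result files, and the per-frame counts are held in a dict. The function name `format_counts` is hypothetical.

```python
def format_counts(counts):
    """Render {frame_index: head_count} as result-file text.

    Lines are sorted by frame index, smallest first. The comma separator
    is an assumption based on the CSV format described for result files.
    """
    lines = [f"{frame_index},{counts[frame_index]}" for frame_index in sorted(counts)]
    # An empty mapping yields an empty string, i.e. an empty result file.
    return "".join(line + "\n" for line in lines)
```

The returned text can be written directly to the sequence's .txt file, for example with `Path(name).write_text(format_counts(counts))`, where `name` is the file name expected for that sequence.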