Smart devices (drones and robots) equipped with embedded sensing devices have been rapidly deployed to a wide range of applications, including agriculture, aerial photography, fast delivery, obstacle avoidance, and industrial automation. Consequently, automatic understanding of the visual data collected from these platforms is in high demand, which brings computer vision and smart devices ever closer together. We are excited to present a large-scale benchmark with carefully annotated ground truth for various important computer vision tasks, named AGEP 2022.

The VisDrone2022 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. The benchmark dataset consists of 400 video clips formed by 265,228 frames and 10,209 static images, captured by various drone-mounted cameras and covering a wide range of aspects, including location (taken from 14 different cities separated by thousands of kilometers in China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that the dataset was collected using various drone platforms (i.e., drones of different models), in different scenarios, and under various weather and lighting conditions. The frames are manually annotated with more than 2.6 million bounding boxes or points for targets of frequent interest, such as pedestrians, cars, bicycles, and tricycles. Some important attributes, including scene visibility, object class, and occlusion, are also provided for better data utilization.
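To give a sense of how such per-object annotations (bounding box, category, occlusion, etc.) are typically consumed, here is a minimal parsing sketch. It assumes one comma-separated annotation line per object in the commonly documented VisDrone layout; the field names, class codes, and file paths are illustrative assumptions, not an official API of the benchmark.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Assumed per-line layout of a VisDrone-style annotation file:
# bbox_left, bbox_top, bbox_width, bbox_height, score, category, truncation, occlusion
@dataclass
class Annotation:
    bbox_left: int
    bbox_top: int
    bbox_width: int
    bbox_height: int
    score: int        # assumed: 0 = ignored region, 1 = used in evaluation
    category: int     # assumed class codes, e.g. 1 = pedestrian, 4 = car
    truncation: int   # degree of truncation at the frame boundary
    occlusion: int    # assumed: 0 = none, 1 = partial, 2 = heavy

def load_annotations(txt_path: Path) -> List[Annotation]:
    """Read one annotation file (one object per line) into a list of records."""
    annotations = []
    for line in txt_path.read_text().splitlines():
        if not line.strip():
            continue
        fields = [int(v) for v in line.strip().strip(",").split(",")[:8]]
        annotations.append(Annotation(*fields))
    return annotations

# Hypothetical usage: count heavily occluded pedestrians in one image's annotations.
# anns = load_annotations(Path("annotations/0000001_00000_d_0000001.txt"))
# heavy = sum(1 for a in anns if a.category == 1 and a.occlusion == 2)
```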

The FusionPortable-VSLAM Challenge 2022 is based on the FusionPortable dataset, which covers a variety of environments on The Hong Kong University of Science and Technology campus and was collected with multiple sensor platforms. It provides a wide range of difficult problems for SLAM. The sequences are characterized by structure-less areas and varying illumination conditions that are typical of real-world scenarios and pose great challenges to SLAM algorithms developed in confined lab environments. Accurate, centimeter-level ground truth is provided for each sequence. The sensor platform used to record the data provides 10 Hz LiDAR point clouds, 20 Hz stereo frame images, high-rate and asynchronous events from stereo event cameras, 200 Hz acceleration and angular velocity readings from an IMU, and 10 Hz GPS signals outdoors.
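Because the sensors run at very different rates, a first step in working with such data is usually to group high-rate measurements by the lower-rate frames using timestamps. The sketch below associates 200 Hz IMU samples with the 20 Hz camera frame interval they fall into; the timestamps and variable names are illustrative assumptions and not part of the dataset's tooling.

```python
import bisect
from typing import Dict, List

def group_imu_by_frame(frame_stamps: List[float],
                       imu_stamps: List[float]) -> Dict[int, List[int]]:
    """Assign each IMU sample index to the interval that starts at the
    latest camera frame whose timestamp precedes it (timestamps in seconds, sorted)."""
    groups: Dict[int, List[int]] = {i: [] for i in range(len(frame_stamps))}
    for j, t in enumerate(imu_stamps):
        # Index of the last frame whose timestamp is <= the IMU timestamp.
        i = bisect.bisect_right(frame_stamps, t) - 1
        if 0 <= i < len(frame_stamps):
            groups[i].append(j)
    return groups

# Illustrative timestamps: 20 Hz frames and 200 Hz IMU over ~0.2 s.
frames = [0.00, 0.05, 0.10, 0.15, 0.20]
imu = [k * 0.005 for k in range(41)]
print(group_imu_by_frame(frames, imu)[1])  # IMU sample indices falling after the second frame
```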