BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos

Duporge, Isla; Kholiavchenko, Maksim; Harel, Roi; Wolf, Scott; Rubenstein, Daniel I; Crofoot, Margaret C; Berger-Wolf, Tanya; Lee, Stephen J; Barreau, Julie; Kline, Jenna; Ramirez, Michelle; Stewart, Charles V

doi:10.1007/s11263-025-02493-5

Using unmanned aerial vehicles (UAVs) to track multiple individuals simultaneously in their natural environment is a powerful approach for better understanding the collective behavior of primates. Previous studies have demonstrated the feasibility of automating primate behavior classification from video data, but these studies were carried out in captivity or with ground-based cameras. To understand group behavior and the self-organization of a collective, however, the whole troop needs to be observed at a scale where behavior can be related to the natural environment in which ecological decisions are made. To tackle this challenge, this study presents a novel dataset for baboon detection, tracking, and behavior recognition from drone videos in which troops are observed in their natural environment as they travel to and from their sleeping sites. Videos were captured from drones at Mpala Research Centre, a research station located in Laikipia County, in central Kenya. The baboon detection dataset was created by manually annotating all baboons in drone videos with bounding boxes. A tiling method was subsequently applied to create a pyramid of images at various scales from the original 5.3K resolution images, resulting in approximately 30K images used for baboon detection. The baboon tracking dataset is derived from the baboon detection dataset: each bounding box is assigned an ID that stays the same for that individual throughout the video. This process resulted in half an hour of dense tracking data. The baboon behavior recognition dataset was generated by converting tracks into mini-scenes, which are video subregions centered on each animal. These mini-scenes were annotated with 12 distinct behavior types and one additional category for occlusion, resulting in over 20 hours of data.
Benchmark results show a mean average precision (mAP) of 92.62% for the YOLOv8-X detection model, a multiple object tracking precision (MOTP) of 87.22% for the DeepSORT tracking algorithm, and a micro top-1 accuracy of 64.89% for the X3D behavior recognition model. Using deep learning to rapidly and accurately classify wildlife behavior from drone footage facilitates non-invasive data collection, enabling the behavior of a whole group to be recorded systematically and accurately. The dataset can be accessed at https://baboonland.xyz.
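
The tiling pyramid and the mini-scenes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The frame size (5312 x 2988), tile size, overlap, scale factors, mini-scene size, and all function names are assumptions chosen for illustration, since the abstract does not specify them. The sketch only computes crop coordinates and leaves image resizing and cropping to whatever image library is in use.

```python
from typing import List, Tuple

# Pixel box as (x_min, y_min, x_max, y_max); a convention assumed for this sketch.
Box = Tuple[int, int, int, int]


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of tiles that cover [0, length) along one axis."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    # Add a final tile flush with the far edge so no pixels are skipped.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def pyramid_tiles(
    width: int,
    height: int,
    tile: int = 640,                 # assumed tile size
    overlap: float = 0.2,            # assumed fractional overlap between tiles
    scales: Tuple[float, ...] = (1.0, 0.5, 0.25),  # assumed pyramid levels
) -> List[Tuple[float, Box]]:
    """Return (scale, box) pairs; each box is in the rescaled frame's coordinates."""
    stride = max(1, round(tile * (1 - overlap)))
    tiles = []
    for s in scales:
        w, h = int(width * s), int(height * s)
        for y in tile_origins(h, tile, stride):
            for x in tile_origins(w, tile, stride):
                tiles.append((s, (x, y, min(x + tile, w), min(y + tile, h))))
    return tiles


def mini_scene_window(box: Box, frame_w: int, frame_h: int, size: int = 400) -> Box:
    """Fixed-size crop centered on a bounding box, shifted to stay inside the frame."""
    cx = (box[0] + box[2]) // 2
    cy = (box[1] + box[3]) // 2
    w, h = min(size, frame_w), min(size, frame_h)
    x0 = min(max(cx - w // 2, 0), frame_w - w)
    y0 = min(max(cy - h // 2, 0), frame_h - h)
    return (x0, y0, x0 + w, y0 + h)


if __name__ == "__main__":
    # Hypothetical 5.3K frame size; the abstract does not give exact dimensions.
    tiles = pyramid_tiles(5312, 2988)
    print(len(tiles), "tiles across", len({s for s, _ in tiles}), "scales")
    print(mini_scene_window((1000, 1000, 1100, 1100), 5312, 2988))
```

Applying `mini_scene_window` to the box of one track ID in every frame gives a sequence of equally sized crops that follows a single animal, which is the kind of input a video classifier such as X3D expects.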
