Abstract: Acoustic recordings of soundscapes are an important category of audio data that can be useful for answering a variety of questions, and an entire discipline within ecology, dubbed “soundscape ecology,” has risen to study them. Bird sound is often the focus of soundscape studies because birds are ubiquitous in most terrestrial environments and highly vocal. Autonomous acoustic recorders have increased the quantity and availability of recordings of natural soundscapes while mitigating the impact of human observers on community behavior. However, such recordings are of little use without analysis of the sounds they contain. Manual analysis currently stands as the best means of processing this form of data for use in certain applications within soundscape ecology, but it is a laborious task, sometimes requiring many hours of human review to process comparatively few hours of recording. For this reason, few annotated data sets of soundscape recordings are publicly available. Further still, there are no publicly available strongly labeled soundscape recordings of bird sounds that contain information on timing, frequency, and species. Therefore, we present the first data set of strongly labeled bird sound soundscape recordings under a free-use license. These data were collected in the Northeastern United States at Powdermill Nature Reserve, Rector, Pennsylvania, USA. Recordings encompass 385 minutes of dawn chorus recordings collected by autonomous acoustic recorders between April and July 2018. Recordings were collected in continuous bouts on four days during the study period and contain 48 species and 16,052 annotations. Applications of this data set may be numerous and include the training, validation, and testing of certain advanced machine‐learning models that detect or classify bird sounds. There are no copyright or proprietary restrictions; please cite this paper when using materials within.
A Deep Learning Approach to the Automated Segmentation of Bird Vocalizations from Weakly Labeled Crowd-sourced Audio
Ecologists interested in monitoring the effects of climate change are increasingly turning to passive acoustic monitoring, the practice of placing autonomous audio recording units in ecosystems to monitor species richness and occupancy via species calls. However, identifying species calls in large datasets by hand is an expensive task, leading to a reliance on machine learning models. Due to a lack of annotated datasets of soundscape recordings, these models are often trained on large databases of community-created focal recordings. A challenge of training on such data is that clips are given a "weak label," a single label that represents the whole clip. As a result, segments that contain only background noise are labeled as calls in the training data, reducing model performance. Heuristic methods exist to convert clip-level labels to "strong" call-specific labels, where the label tightly bounds the temporal length of the call and better identifies bird vocalizations. Our work improves on the weak-to-strong labeling method currently used on the training data for BirdNET, currently the most popular model for audio species classification. We utilize an existing RNN-CNN hybrid, resulting in a precision improvement of 12% (to 90% precision) against our new strongly hand-labeled dataset of Peruvian bird species.
Authors: Jacob Ayers (Engineers for Exploration at UCSD); Sean Perry (University of California San Diego); Samantha Prestrelski (UC San Diego); Tianqi Zhang (Engineers for Exploration); Ludwig von Schoenfeldt (University of California San Diego); Mugen Blue (UC Merced); Gabriel Steinberg (Demining Research Community); Mathias Tobler (San Diego Zoo Wildlife Alliance); Ian Ingram (San Diego Zoo Wildlife Alliance); Curt Schurgers (UC San Diego); Ryan Kastner (University of California San Diego)
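To make the weak-versus-strong distinction concrete, the following is a minimal sketch of a generic thresholding heuristic that turns one clip-level label into time-bounded segments. It is not the RNN-CNN hybrid described in the abstract, nor the method used for BirdNET training data. The names `Segment` and `weak_to_strong`, the per-frame activity scores, and the `threshold` and `min_gap_s` parameters are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    label: str      # species label inherited from the clip-level (weak) label


def weak_to_strong(scores, frame_s, clip_label, threshold=0.5, min_gap_s=0.0):
    """Convert per-frame activity scores plus one clip label into timed segments.

    Assumption: `scores` holds one activity value per fixed-length frame
    (for example, normalized energy or a detector output). Frames scoring at
    or above `threshold` are treated as vocalization; the rest as background.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            start = i  # a run of active frames begins
        elif not active and start is not None:
            segments.append(Segment(start * frame_s, i * frame_s, clip_label))
            start = None
    if start is not None:
        # the clip ended while a run was still active
        segments.append(Segment(start * frame_s, len(scores) * frame_s, clip_label))

    # Merge neighbouring segments whose silent gap is at most min_gap_s.
    merged = []
    for seg in segments:
        if merged and seg.start_s - merged[-1].end_s <= min_gap_s:
            merged[-1].end_s = seg.end_s
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.1]
    for seg in weak_to_strong(frame_scores, frame_s=0.5, clip_label="species_a"):
        print(seg)
```

In this sketch the background-only frames no longer carry the species label, which is the property that strong labels are meant to provide; a learned detector could replace the fixed threshold without changing the surrounding logic.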
- Award ID(s): 2244123
- PAR ID: 10578873
- Publisher / Repository: NeurIPS 2024 Workshop
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Multi-label classification (MLC), which assigns multiple labels to each instance, is crucial to domains from computer vision to text mining. Conventional methods for MLC require huge amounts of labeled data to capture complex dependencies between labels. However, such labeled datasets are expensive, or even impossible, to acquire. Worse yet, these pre-trained MLC models can only be used for the particular label set covered in the training data. Despite this severe limitation, few methods exist for expanding the set of labels predicted by pre-trained models. Instead, we acquire vast amounts of new labeled data and retrain a new model from scratch. Here, we propose combining the knowledge from multiple pre-trained models (teachers) to train a new student model that covers the union of the labels predicted by this set of teachers. This student supports a broader label set than any one of its teachers without using labeled data. We call this new problem knowledge amalgamation for multi-label classification. Our new method, Adaptive KNowledge Transfer (ANT), trains a student by learning from each teacher’s partial knowledge of label dependencies to infer the global dependencies between all labels across the teachers. We show that ANT succeeds in unifying label dependencies among teachers, outperforming five state-of-the-art methods on eight real-world datasets.
- In this poster, we review the adoption of the Early Research Scholars Program (ERSP), developed at the University of California San Diego, to our institution, the University of Illinois at Chicago (UIC). The program was designed to support retention of students from marginalized backgrounds in the field of computing especially during the second year of their major.
- The millipede genus Apterourus Loomis, 1966, the only genus of the family Apterouridae Loomis, 1966 (Diplopoda: Chordeumatida: Striarioidea), contains two species and is rarely collected. We add a third species from Mt. Palomar, San Diego County, California, USA, Apterourus palomar Shear, Richart and Marek, new species.
- This National Science Foundation (NSF) project focuses on creating an immersive international summer research experience for students enrolled in a primarily undergraduate institution (PUI). Over the course of a three-year grant period, this research seeks to: (1) train and mentor 18 diverse undergraduate students from PUIs in Southern California in bioinformatics research in a collaborative and international setting; (2) disseminate the research outcomes at conferences and in peer-reviewed journals; (3) encourage and prepare undergraduate students from PUIs for enrollment in graduate programs in bioinformatics, bioengineering, or related fields; (4) foster existing collaborations and develop new research collaborations between the PI at the University of San Diego (USD) and scientists at the Science for Life Laboratory (SciLifeLab) in Sweden; and (5) develop a diverse cohort of globally engaged scientists/engineers that seek career opportunities and collaborators throughout the world. This paper reports on the first year of the grant.

