Adélie penguins (Pygoscelis adeliae) are bioindicators of the rapidly changing Antarctic environment, which makes understanding their population dynamics and behavior a high research priority. However, collecting detailed population data on many colonies throughout the breeding season is difficult because of Antarctica's harsh conditions and remote location. The colonial breeding ecology of Adélie penguins has led to the evolution of a highly vocal species with individualized calls, making them well suited to passive acoustic monitoring (PAM) with autonomous recording. PAM units could provide an easily deployable, scalable way to collect fine-scale data on population size and breeding phenology. Here I present a framework for using acoustic indices to monitor the phenology of dense penguin colonies, even under high wind conditions. Using an automated pipeline implemented in R, I evaluate the relationship between acoustic indices, such as RMS amplitude, and penguin colony size across distinct breeding stages (incubation, guard, crèche, and fledge) on Torgersen and Humble Islands in the West Antarctic Peninsula. Using PAM to interpret penguin vocalizations for estimates of population size and breeding phenology could lead to a real-time remote monitoring system covering a large spatial footprint, revealing how Adélie penguins respond to climate change.
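To make the RMS amplitude index concrete, here is a minimal Python sketch. The pipeline described above is written in R, so this is not that code. The sketch rests on several assumptions:

- Audio samples are already decoded to floats in [-1, 1].
- Windows are 1 s long and non-overlapping.
- The index is averaged per breeding stage.
- Pearson correlation is used as the measure of association with colony counts.

The function names are illustrative, and the study's actual statistical model is not specified in the abstract.

```python
import math
from collections import defaultdict


def windowed_rms(samples, sample_rate, window_s=1.0):
    """RMS amplitude of consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    n = int(sample_rate * window_s)
    if n <= 0:
        raise ValueError("window must contain at least one sample")
    return [
        math.sqrt(sum(s * s for s in samples[i:i + n]) / n)
        for i in range(0, len(samples) - n + 1, n)
    ]


def to_db(rms):
    """Express an RMS value in dB relative to an RMS of 1.0."""
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def mean_by_stage(records):
    """Average an index per breeding stage; records are (stage, value) pairs."""
    groups = defaultdict(list)
    for stage, value in records:
        groups[stage].append(value)
    return {stage: sum(vals) / len(vals) for stage, vals in groups.items()}


def pearson(x, y):
    """Pearson correlation, e.g. between per-day mean RMS and colony counts (assumed analysis)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Windowing before averaging keeps short, loud events (for example, a single gust) from dominating a whole recording's index.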
This content will become publicly available on June 1, 2026
Designing a BirdNET classifier for high wind detection in passive acoustic recordings to support wildlife monitoring
Passive acoustic monitoring (PAM) is a powerful tool for ecological research, but recordings can be compromised by background noise such as wind. Addressing wind noise (e.g., clipping and masking) in bioacoustic data remains a challenge, especially because climate change is predicted to increase wind speeds, particularly near the poles. Adélie penguins (Pygoscelis adeliae), key indicators of the Antarctic ecosystem, are well suited to PAM, and large-scale monitoring could assess climate-driven population changes if wind noise is managed effectively. In this study, the convolutional neural network BirdNET is used inversely, to identify unwanted sounds in Adélie penguin colony recordings. Multiple custom models were developed in which wind conditions (low, medium, and high) were the target classes and Adélie vocalizations were treated as nontarget background noise. The best-performing model achieved an F-score of 0.43 and an accuracy of 0.53; within this model, the high wind class had a precision of 0.76 and a recall of 0.94. A six-step workflow is presented for creating custom BirdNET models, evaluating their performance, and determining an optimal confidence threshold before applying a model to an entire dataset. By automating unwanted sound detection, this approach enables researchers to efficiently identify and remove affected files, streamline data cleaning, and focus on recordings of interest for further analysis.
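The evaluation and thresholding steps can be sketched in Python. This is a minimal illustration under stated assumptions:

- Each scored item is a (confidence, is_high_wind) pair from a labeled validation set.
- The "optimal" threshold is taken to be the one that maximizes F1 for the high wind class.
- `files_to_drop` is a hypothetical cleaning helper.

The published six-step workflow may select its threshold by a different criterion, and this is not BirdNET's own API.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pick_threshold(scored, thresholds):
    """Return (threshold, f1) maximizing F1; scored holds (confidence, is_high_wind) pairs."""
    best = (None, -1.0)
    for t in thresholds:
        tp = sum(1 for conf, truth in scored if conf >= t and truth)
        fp = sum(1 for conf, truth in scored if conf >= t and not truth)
        fn = sum(1 for conf, truth in scored if conf < t and truth)
        _, _, f1 = precision_recall_f1(tp, fp, fn)
        if f1 > best[1]:  # strict '>' keeps the lowest threshold among ties
            best = (t, f1)
    return best


def files_to_drop(detections, threshold):
    """Hypothetical cleaning step: names of files with any high-wind detection at or above threshold."""
    return sorted({name for name, conf in detections if conf >= threshold})
```

As a check of the arithmetic, the high wind class's reported precision (0.76) and recall (0.94) correspond to a class-level F1 of 2 × 0.76 × 0.94 / (0.76 + 0.94) ≈ 0.84.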
- PAR ID: 10611331
- Publisher / Repository: Acoustical Society of America
- Date Published:
- Journal Name: The Journal of the Acoustical Society of America
- Volume: 157
- Issue: 6
- ISSN: 1520-8524
- Page Range / eLocation ID: 4502 to 4512
- Subject(s) / Keyword(s): Acoustics, sea birds
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract
Context: The interaction between topography and wind influences snow cover patterns, which can determine the distribution of species that rely on snow-free habitats. Past studies suggest snow accumulation creates suboptimal breeding habitat for Adélie penguins, leading to colony extinctions. However, evidence linking snow cover to landscape features is lacking.
Objectives: We aimed to model landscape-driven snow cover patterns, identify long-term weather changes, and determine the impact of geomorphology and snow conditions on penguin colony persistence.
Methods: We combined remotely sensed imagery, digital surface models, and more than 30 years of weather data with penguin population monitoring from 1975 to 2022 near Palmer Station, west Antarctic Peninsula. Using a multi-model approach, we identified landscape factors driving snow distribution on two islands. Historic and current penguin sub-colony perimeters were used to understand habitat selection, optimal habitat features, and factors associated with extinctions.
Results: Decadal and long-term trends in wind and snow conditions were detected. Snow accumulated at lower elevations and on south-facing slopes, driven by north-northeasterly winds, while Adélie penguins occupied higher elevations and more north-facing slopes. On Torgersen Island, sub-colonies on south aspects have gone extinct, and only five of the 23 historic sub-colonies remain active, containing 7% of the 1975 population. Adélie penguins will likely be extinct on this island within 25 years.
Conclusions: Adélie penguin populations are declining throughout the west Antarctic Peninsula, with multiple climate and human impacts likely driving Adélie penguins towards extinction in this region. We demonstrate that precipitation has detrimental effects on penguins, an often overlooked yet crucial factor in bird studies.
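Slope and aspect of the kind discussed above are usually derived from a digital surface model. The sketch below uses Horn's (1981) finite-difference method on a 3 × 3 elevation window. It then converts aspect to "northness", the cosine of aspect: +1 for north-facing and -1 for south-facing. The abstract does not say which terrain algorithm the study used, so treat the method, the function names, and the window convention as illustrative assumptions.

```python
import math


def slope_aspect(window, cell_size):
    """Horn (1981) slope and aspect for the centre cell of a 3x3 elevation window.

    Rows run north -> south and columns west -> east. Returns (slope_deg, aspect_deg),
    where aspect is the compass bearing of the downslope direction (0 = north, 90 = east);
    aspect is None on perfectly flat ground.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted elevation gradients (rise per unit distance) toward the east and toward the north.
    dz_east = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_north = ((a + 2 * b + c) - (g + 2 * h + i)) / (8 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
    if dz_east == 0 and dz_north == 0:
        return slope, None
    # The downslope direction is the negative gradient; atan2(east, north) gives a compass bearing.
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360
    return slope, aspect


def northness(aspect_deg):
    """cos(aspect): +1 for north-facing, -1 for south-facing, 0 for east or west."""
    return math.cos(math.radians(aspect_deg))
```

Northness turns the circular aspect variable into a linear covariate, which is convenient when comparing snow-prone south-facing cells with penguin-occupied north-facing ones in a regression.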
Ecologists interested in monitoring the effects of climate change are increasingly turning to passive acoustic monitoring, the practice of placing autonomous audio recording units in ecosystems to monitor species richness and occupancy via species calls. However, identifying species calls in large datasets by hand is expensive, leading to a reliance on machine learning models. Because annotated soundscape datasets are scarce, these models are often trained on large databases of community-created focal recordings. A challenge of training on such data is that clips are given a "weak label": a single label that represents the whole clip. As a result, segments that contain only background noise are labeled as calls in the training data, reducing model performance. Heuristic methods exist to convert clip-level labels to "strong", call-specific labels, where each label tightly bounds the temporal extent of a call and better identifies bird vocalizations. Our work improves on the current weak-to-strong labeling method used on the training data for BirdNET, currently the most popular model for audio species classification. We use an existing RNN-CNN hybrid, achieving a 12% improvement in precision (reaching 90%) against our new strongly hand-labeled dataset of Peruvian bird species.
Jacob Ayers (Engineers for Exploration at UCSD); Sean Perry (University of California San Diego); Samantha Prestrelski (UC San Diego); Tianqi Zhang (Engineers for Exploration); Ludwig von Schoenfeldt (University of California San Diego); Mugen Blue (UC Merced); Gabriel Steinberg (Demining Research Community); Mathias Tobler (San Diego Zoo Wildlife Alliance); Ian Ingram (San Diego Zoo Wildlife Alliance); Curt Schurgers (UC San Diego); Ryan Kastner (University of California San Diego)
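For contrast with the learned approach, a very simple weak-to-strong heuristic can be sketched. It keeps the clip's weak label but attaches it only to frames whose energy exceeds the clip's mean plus k standard deviations. It then merges consecutive active frames into timed intervals. This is a hypothetical energy-threshold baseline written for illustration. It is neither BirdNET's labeling pipeline nor the RNN-CNN hybrid described above, and the frame energies, hop size, k, and minimum duration are all assumptions.

```python
def frames_to_intervals(active, hop_s):
    """Merge runs of True in a per-frame mask into (start_s, end_s) intervals."""
    intervals, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i                      # a run of active frames begins
        elif not on and start is not None:
            intervals.append((start * hop_s, i * hop_s))
            start = None                   # the run ended at frame i
    if start is not None:                  # a run continues to the end of the clip
        intervals.append((start * hop_s, len(active) * hop_s))
    return intervals


def energy_strong_labels(frame_energy, hop_s, weak_label, k=1.0, min_dur_s=0.0):
    """Hypothetical baseline: turn one clip-level (weak) label into timed (strong) labels.

    Frames whose energy exceeds mean + k * std of the clip are treated as containing the call.
    """
    if not frame_energy:
        return []
    n = len(frame_energy)
    mean = sum(frame_energy) / n
    std = (sum((e - mean) ** 2 for e in frame_energy) / n) ** 0.5
    active = [e > mean + k * std for e in frame_energy]
    return [
        (start, end, weak_label)
        for start, end in frames_to_intervals(active, hop_s)
        if end - start >= min_dur_s        # drop blips shorter than a plausible call
    ]
```

A fixed energy threshold like this breaks down when background noise is loud or variable, which is one motivation for learned segmenters such as the one described above.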
We introduce VoiceCraft, a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. VoiceCraft employs a Transformer decoder architecture and introduces a token rearrangement procedure that combines causal masking and delayed stacking to enable generation within an existing sequence. On speech editing tasks, VoiceCraft produces edited speech that human evaluators judge nearly indistinguishable from unedited recordings in naturalness; for zero-shot TTS, our model outperforms prior state-of-the-art models, including VALL-E and the popular commercial model XTTS-v2. Crucially, the models are evaluated on challenging, realistic datasets with diverse accents, speaking styles, recording conditions, background noise, and music, and our model performs consistently well compared with other models and real recordings. In particular, for speech editing evaluation, we introduce a high-quality, challenging, and realistic dataset named RealEdit.
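The token rearrangement can be illustrated with a simplified Python sketch. It assumes a single masked span and uses the hypothetical placeholder tokens `<M>` and `<P>` for mask and padding. The sketch works in two steps:

1. **Causal masking**, applied to every codebook: the masked span is moved to the end, with a mask token marking both the gap and the moved span.
2. **Delayed stacking**: codebook k is shifted right by k steps.

This is a schematic of the idea as summarized above, not VoiceCraft's implementation, which handles multiple spans and its own special tokens.

```python
MASK = "<M>"  # hypothetical mask token
PAD = "<P>"   # hypothetical padding token


def causal_mask(tokens, start, end):
    """Move tokens[start:end] to the end, marking both the gap and the moved span with MASK."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("need 0 <= start < end <= len(tokens)")
    return tokens[:start] + [MASK] + tokens[end:] + [MASK] + tokens[start:end]


def delay_stack(codebooks):
    """Shift codebook k right by k steps, padding so every row has the same length."""
    if len({len(row) for row in codebooks}) > 1:
        raise ValueError("all codebooks must have the same length")
    n_books = len(codebooks)
    return [[PAD] * k + list(row) + [PAD] * (n_books - 1 - k) for k, row in enumerate(codebooks)]


def rearrange(codebooks, start, end):
    """Apply causal masking over the same time span in every codebook, then delayed stacking."""
    return delay_stack([causal_mask(list(row), start, end) for row in codebooks])
```

Moving the span to the end lets a left-to-right decoder see the context on both sides of the edit before generating the replacement. The delay lets codebook k at a given time step condition on coarser codebooks from earlier steps.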
Sequential memory, the ability to form and accurately recall a sequence of events or stimuli in the correct order, is a fundamental prerequisite for biological and artificial intelligence, as it underpins numerous cognitive functions (e.g., language comprehension, planning, and episodic memory formation). However, existing methods of sequential memory suffer from catastrophic forgetting, limited capacity, slow iterative learning procedures, low-order Markov memory, and, most importantly, the inability to represent and generate multiple valid future possibilities stemming from the same context. Inspired by biologically plausible neuroscience theories of cognition, we propose Predictive Attractor Models (PAM), a novel sequence memory architecture with desirable generative properties. PAM is a streaming model that learns a sequence in an online, continuous manner by observing each input only once. Additionally, we find that PAM avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns, which prevents new memories from overwriting previously learned knowledge. PAM generates future predictions by sampling from a union set of predicted possibilities; this generative ability is realized through an attractor model trained alongside the predictor. We show that PAM is trained with local computations through Hebbian plasticity rules in a biologically plausible framework. Other desirable traits (e.g., noise tolerance, CPU-based learning, and capacity scaling) are discussed throughout the paper. Our findings suggest that PAM represents a significant step forward in the pursuit of biologically plausible and computationally efficient sequential memory models, with broad implications for cognitive science and artificial intelligence research. Illustration videos and code are available on our project page: https://ramymounir.com/publications/pam.
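As a point of reference for the "local computations through Hebbian plasticity" mentioned above, here is a generic one-pass Hebbian sequence learner over sparse binary patterns. It is not PAM. It is a first-order (Markov) associator of exactly the kind the abstract criticizes, with no minicolumn context, lateral inhibition, or attractor model. The encoding of patterns as sets of active unit indices, the learning rate, and the threshold are illustrative assumptions.

```python
def hebbian_learn(sequence, n_units, lr=1.0):
    """One pass over a sequence of sparse binary patterns (sets of active unit indices).

    Local rule: the synapse from unit i to unit j is strengthened whenever i is active
    at step t and j is active at step t + 1. Each transition is seen exactly once.
    """
    weights = [[0.0] * n_units for _ in range(n_units)]
    for pre, post in zip(sequence, sequence[1:]):
        for j in post:
            for i in pre:
                weights[j][i] += lr
    return weights


def predict(weights, active, threshold):
    """Units whose summed input from the currently active units reaches the threshold."""
    return {
        j for j in range(len(weights))
        if sum(weights[j][i] for i in active) >= threshold
    }
```

Because each transition is stored from a single observation, the sketch learns online in one pass. However, if the same pattern is followed by different successors in different contexts (for example, X, A, B versus Y, A, C), predicting from A returns the union of B and C regardless of what preceded A. This is the higher-order ambiguity that PAM's context-dependent representations are designed to resolve.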
