Title: OpenSoundscape: An open‐source bioacoustics analysis package for Python
Abstract Landscape‐scale bioacoustic projects have become a popular approach to biodiversity monitoring. Combining passive acoustic monitoring recordings and automated detection provides an effective means of monitoring sound‐producing species' occupancy and phenology and can lend insight into unobserved behaviours and patterns. The availability of low‐cost recording hardware has lowered barriers to large‐scale data collection, but technological barriers in data analysis remain a bottleneck for extracting biological insight from bioacoustic datasets. We provide a robust and open‐source Python toolkit for detecting and localizing biological sounds in acoustic data. OpenSoundscape provides access to automated acoustic detection, classification and localization methods through a simple and easy‐to‐use set of tools. Extensive documentation and tutorials provide step‐by‐step instructions and examples of end‐to‐end analysis of bioacoustic data. Here, we describe the functionality of this package and provide concise examples of bioacoustic analyses with OpenSoundscape. By providing an interface for bioacoustic data and methods, we hope this package will lead to increased adoption of bioacoustics methods and ultimately to enhanced insights for ecology and conservation.
Award ID(s):
1935507 2120084
PAR ID:
10441480
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Methods in Ecology and Evolution
Volume:
14
Issue:
9
ISSN:
2041-210X
Format(s):
Medium: X
Size(s):
p. 2321-2328
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The interface between field biology and technology is energizing the collection of vast quantities of environmental data. Passive acoustic monitoring, the use of unattended recording devices to capture environmental sound, is an example where technological advances have facilitated an influx of data that routinely exceeds the capacity for analysis. Computational advances, particularly the integration of machine learning approaches, will support data extraction efforts. However, the analysis and interpretation of these data will require parallel growth in conceptual and technical approaches for data analysis. Here, we use a large hand‐annotated dataset to showcase analysis approaches that will become increasingly useful as datasets grow and data extraction can be partially automated. We propose and demonstrate seven technical approaches for analyzing bioacoustic data. These include the following: (1) generating species lists and descriptions of vocal variation, (2) assessing how abiotic factors (e.g., rain and wind) impact vocalization rates, (3) testing for differences in community vocalization activity across sites and habitat types, (4) quantifying the phenology of vocal activity, (5) testing for spatiotemporal correlations in vocalizations within species, (6) among species, and (7) using rarefaction analysis to quantify diversity and optimize bioacoustic sampling. To demonstrate these approaches, we sampled in 2016 and 2018 and used hand annotations of 129,866 bird vocalizations from two forests in New Hampshire, USA, including sites in the Hubbard Brook Experimental Forest where bioacoustic data could be integrated with more than 50 years of observer‐based avian studies. Acoustic monitoring revealed differences in community patterns in vocalization activity between forests of different ages, as well as between nearby similar watersheds.
Of numerous environmental variables that were evaluated, background noise was most clearly related to vocalization rates. The songbird community included one cluster of species where vocalization rates declined as ambient noise increased and another cluster where vocalization rates declined over the nesting season. In some common species, the number of vocalizations produced per day was correlated at scales of up to 15 km. Rarefaction analyses showed that adding sampling sites increased species detections more than adding sampling days. Although our analyses used hand‐annotated data, the methods will extend readily to large‐scale automated detection of vocalization events. Such data are likely to become increasingly available as autonomous recording units become more advanced, affordable, and power efficient. Passive acoustic monitoring with human or automated identification at the species level offers growing potential to complement observer‐based studies of avian ecology.
  2. Abstract The biodiversity crisis necessitates spatially extensive methods to monitor multiple taxonomic groups for evidence of change in response to evolving environmental conditions. Programs that combine passive acoustic monitoring and machine learning are increasingly used to meet this need. These methods require large, annotated datasets, which are time‐consuming and expensive to produce, creating potential barriers to adoption in data‐ and funding‐poor regions. Recently released pre‐trained avian acoustic classification models provide opportunities to reduce the need for manual labelling and accelerate the development of new acoustic classification algorithms through transfer learning. Transfer learning is a strategy for developing algorithms under data scarcity that uses pre‐trained models from related tasks to adapt to new tasks. Our primary objective was to develop a transfer learning strategy using the feature embeddings of a pre‐trained avian classification model to train custom acoustic classification models in data‐scarce contexts. We used three annotated avian acoustic datasets to test whether transfer learning and soundscape simulation‐based data augmentation could substantially reduce the annotated training data necessary to develop performant custom acoustic classifiers. We also conducted a sensitivity analysis for hyperparameter choice and model architecture. We then assessed the generalizability of our strategy to increasingly novel non‐avian classification tasks. With as few as two training examples per class, our soundscape simulation data augmentation approach consistently yielded new classifiers with improved performance relative to the pre‐trained classification model and transfer learning classifiers trained with other augmentation approaches. Performance increases were evident for three avian test datasets, including single‐class and multi‐label contexts.
We observed that the relative performance among our data augmentation approaches varied for the avian datasets and nearly converged for one dataset when we included more training examples. We demonstrate an efficient approach to developing new acoustic classifiers leveraging open‐source sound repositories and pre‐trained networks to reduce manual labelling. With very few examples, our soundscape simulation approach to data augmentation yielded classifiers with performance equivalent to those trained with many more examples, showing it is possible to reduce manual labelling while still achieving high‐performance classifiers and, in turn, expanding the potential for passive acoustic monitoring to address rising biodiversity monitoring needs.
  3. Abstract Monitoring wildlife abundance across space and time is an essential task to study their population dynamics and inform effective management. Acoustic recording units are a promising technology for efficiently monitoring bird populations and communities. While current acoustic data models provide information on the presence/absence of individual species, new approaches are needed to monitor population abundance, ideally across large spatio‐temporal regions. We present an integrated modelling framework that combines high‐quality but temporally sparse bird point count survey data with acoustic recordings. Our models account for imperfect detection in both data types and false positive errors in the acoustic data. Using simulations, we compare the accuracy and precision of abundance estimates using differing amounts of acoustic vocalizations obtained from a clustering algorithm, point count data, and a subset of manually validated acoustic vocalizations. We also use our modelling framework in a case study to estimate abundance of the Eastern Wood‐Pewee (Contopus virens) in Vermont, USA. The simulation study reveals that combining acoustic and point count data via an integrated model improves accuracy and precision of abundance estimates compared with models informed by either acoustic or point count data alone. Improved estimates are obtained across a wide range of scenarios, with the largest gains occurring when detection probability for the point count data is low. Combining acoustic data with only a small number of point count surveys yields estimates of abundance without the need for validating any of the identified vocalizations from the acoustic data.
Within our case study, the integrated models provided moderate support for a decline of the Eastern Wood‐Pewee in this region. Our integrated modelling approach combines dense acoustic data with few point count surveys to deliver reliable estimates of species abundance without the need for manual identification of acoustic vocalizations or a prohibitively expensive large number of repeated point count surveys. Our proposed approach offers an efficient monitoring alternative for large spatio‐temporal regions when point count data are difficult to obtain or when monitoring is focused on rare species with low detection probability.
  4. Abstract Occupancy modelling is a common approach to assess species distribution patterns, while explicitly accounting for false absences in detection–nondetection data. Numerous extensions of the basic single‐species occupancy model exist to model multiple species, spatial autocorrelation and to integrate multiple data types. However, development of specialized and computationally efficient software to incorporate such extensions, especially for large datasets, is scarce or absent. We introduce the spOccupancy R package designed to fit single‐species and multi‐species spatially explicit occupancy models. We fit all models within a Bayesian framework using Pólya‐Gamma data augmentation, which results in fast and efficient inference. spOccupancy provides functionality for data integration of multiple single‐species detection–nondetection datasets via a joint likelihood framework. The package leverages Nearest Neighbour Gaussian Processes to account for spatial autocorrelation, which enables spatially explicit occupancy modelling for potentially massive datasets (e.g. 1,000s–100,000s of sites). spOccupancy provides user‐friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k‐fold cross‐validation) and out‐of‐sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis and two bird case studies. The spOccupancy package provides a user‐friendly platform to fit a variety of single and multi‐species occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large datasets.
  5. Abstract Changes in land use and climate change threaten global biodiversity and ecosystems, calling for the urgent development of effective conservation strategies. Recognizing landscape heterogeneity, which refers to the variation in natural features within an area, is crucial for these strategies. While remote sensing images quantify landscape heterogeneity, they might fail to detect ecological patterns in moderately disturbed areas, particularly at minor spatial scales. This is partly because satellite imagery may not effectively capture undergrowth conditions due to its resolution constraints. In contrast, soundscape analysis, which studies environmental acoustic signals, emerges as a novel tool for understanding ecological patterns, providing reliable information on habitat conditions and landscape heterogeneity in complex environments across diverse scales and serving as a complement to remote sensing methods. We propose an unsupervised approach using passive acoustic monitoring data and network inference methods to analyse acoustic heterogeneity patterns based on biophony composition. This method uses sonotypes, unique acoustic entities characterized by their specific time‐frequency spaces, to establish the acoustic structure of a site through sonotype occurrences, focusing on general biophony rather than specific species and providing information on the acoustic footprint of a site. From a sonotype composition matrix, we use the Graphical Lasso method, a sparse Gaussian graphical model, to identify acoustic similarities across sites, map ecological complexity relationships through the nodes (sites) and edges (similarities), and transform acoustic data into a graphical representation of ecological interactions and landscape acoustic diversity. We implemented the proposed method across 17 sites within an oil palm plantation in Santander, Colombia.
The resulting inferred graphs visualize the acoustic similarities among sites, reflecting the biophony achieved by characterizing the landscape through its acoustic structures. Correlating our findings with ecological metrics like the Bray–Curtis dissimilarity index and satellite imagery indices reveals significant insights into landscape heterogeneity. This unsupervised approach offers a new perspective on understanding ecological and biological interactions and advances soundscape analysis. The soundscape decomposition into sonotypes underscores the method's advantage, offering the possibility to associate sonotypes with species and identify their contribution to the similarity proposed by the graph.
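The Bray–Curtis dissimilarity index named in the last abstract above has a simple closed form, BC = Σ|a_i − b_i| / Σ(a_i + b_i), where a_i and b_i are the counts of category i at two sites. The following is a minimal Python sketch of that standard formula, not the authors' code: the function name `bray_curtis`, the example count vectors, and the convention of returning 0.0 for two empty sites are all assumptions made for illustration.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two non-negative count vectors.

    Returns a value in [0, 1]: 0 means identical composition,
    1 means the two sites share no categories at all.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must have the same length")
    if any(x < 0 for x in counts_a) or any(x < 0 for x in counts_b):
        raise ValueError("counts must be non-negative")

    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        # Assumption: two sites with no detections are treated as identical.
        return 0.0

    # Sum of absolute per-category differences, normalised by total counts.
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / total


# Hypothetical sonotype occurrence counts at two sites (illustrative only).
site_a = [6, 7, 4]
site_b = [10, 0, 6]
print(round(bray_curtis(site_a, site_b), 3))
```

In a workflow like the one described, each vector would be one row of a sonotype composition matrix, and the pairwise dissimilarities could then be compared against the edges of the inferred graph.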