The interface between field biology and technology is energizing the collection of vast quantities of environmental data. Passive acoustic monitoring, the use of unattended recording devices to capture environmental sound, is an example where technological advances have facilitated an influx of data that routinely exceeds the capacity for analysis. Computational advances, particularly the integration of machine learning approaches, will support data extraction efforts. However, the analysis and interpretation of these data will require parallel growth in conceptual and technical approaches for data analysis. Here, we use a large hand‐annotated dataset to showcase analysis approaches that will become increasingly useful as datasets grow and data extraction can be partially automated. We propose and demonstrate seven technical approaches for analyzing bioacoustic data: (1) generating species lists and descriptions of vocal variation, (2) assessing how abiotic factors (e.g., rain and wind) impact vocalization rates, (3) testing for differences in community vocalization activity across sites and habitat types, (4) quantifying the phenology of vocal activity, (5) testing for spatiotemporal correlations in vocalizations within species, (6) testing for such correlations among species, and (7) using rarefaction analysis to quantify diversity and optimize bioacoustic sampling. To demonstrate these approaches, we sampled in 2016 and 2018 and used hand annotations of 129,866 bird vocalizations from two forests in New Hampshire, USA, including sites in the Hubbard Brook Experimental Forest where bioacoustic data could be integrated with more than 50 years of observer‐based avian studies. Acoustic monitoring revealed differences in community patterns in vocalization activity between forests of different ages, as well as between nearby similar watersheds. Of numerous environmental variables that were evaluated, background noise was most clearly related to vocalization rates.
The songbird community included one cluster of species where vocalization rates declined as ambient noise increased and another cluster where vocalization rates declined over the nesting season. In some common species, the number of vocalizations produced per day was correlated at scales of up to 15 km. Rarefaction analyses showed that adding sampling sites increased species detections more than adding sampling days. Although our analyses used hand‐annotated data, the methods will extend readily to large‐scale automated detection of vocalization events. Such data are likely to become increasingly available as autonomous recording units become more advanced, affordable, and power efficient. Passive acoustic monitoring with human or automated identification at the species level offers growing potential to complement observer‐based studies of avian ecology.
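The rarefaction comparison described above (adding sampling sites versus adding sampling days) can be illustrated with a species accumulation curve averaged over random sample orderings. The following is a minimal sketch, not the authors' analysis: the function name `accumulation_curve`, the site labels, and the species codes are all hypothetical, and the toy detections are invented purely to show the mechanics.

```python
import random

def accumulation_curve(samples, n_perm=200, seed=0):
    """Mean number of species seen after pooling k randomly ordered samples.

    samples: list of sets of species codes, one set per sampling unit.
    Returns a list whose k-th entry (0-based) is the mean richness after
    pooling k + 1 units, averaged over n_perm random orderings.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_perm):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for k, sample in enumerate(order):
            seen |= sample
            totals[k] += len(seen)
    return [t / n_perm for t in totals]

# Hypothetical detections: detections[site][day] = species detected.
detections = {
    "site_A": [{"BTBW", "REVI"}, {"BTBW"}, {"REVI", "OVEN"}],
    "site_B": [{"HETH", "REVI"}, {"HETH"}, {"HETH", "BLBW"}],
    "site_C": [{"WIWR"}, {"WIWR", "OVEN"}, {"WIWR"}],
}

# Adding days at a single site (site_A) ...
days_curve = accumulation_curve(detections["site_A"])
# ... versus adding sites on a single day (day 0).
sites_curve = accumulation_curve([days[0] for days in detections.values()])

print("adding days :", days_curve)
print("adding sites:", sites_curve)
```

Comparing how steeply the two curves rise gives a rough sense of whether extra days or extra sites add more species for a given amount of effort; a real analysis would use the full detection table and an established rarefaction method.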
Monitoring wildlife abundance across space and time is essential for studying population dynamics and informing effective management. Acoustic recording units are a promising technology for efficiently monitoring bird populations and communities. While current acoustic data models provide information on the presence/absence of individual species, new approaches are needed to monitor population abundance, ideally across large spatio‐temporal regions. We present an integrated modelling framework that combines high‐quality but temporally sparse bird point count survey data with acoustic recordings. Our models account for imperfect detection in both data types and false positive errors in the acoustic data. Using simulations, we compare the accuracy and precision of abundance estimates using differing amounts of acoustic vocalizations obtained from a clustering algorithm, point count data, and a subset of manually validated acoustic vocalizations. We also use our modelling framework in a case study to estimate abundance of the Eastern Wood‐Pewee. The simulation study reveals that combining acoustic and point count data via an integrated model improves accuracy and precision of abundance estimates compared with models informed by either acoustic or point count data alone. Improved estimates are obtained across a wide range of scenarios, with the largest gains occurring when detection probability for the point count data is low. Combining acoustic data with only a small number of point count surveys yields estimates of abundance without the need for validating any of the identified vocalizations from the acoustic data. Within our case study, the integrated models provided moderate support for a decline of the Eastern Wood‐Pewee in this region.
Our integrated modelling approach combines dense acoustic data with few point count surveys to deliver reliable estimates of species abundance without the need for manual identification of acoustic vocalizations or a prohibitively expensive large number of repeated point count surveys. Our proposed approach offers an efficient monitoring alternative for large spatio‐temporal regions when point count data are difficult to obtain or when monitoring is focused on rare species with low detection probability.
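To make the idea of a joint likelihood concrete, the sketch below evaluates a simplified integrated likelihood for one site. It is an illustration under stated assumptions, not the model from the paper: latent abundance N is taken to be Poisson with mean `lam`, each point count is Binomial(N, `p`), and each acoustic vocalization count is Poisson with mean `delta * N + omega`, where `omega` is an assumed false-positive rate. All parameter and function names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_pois(k, mu):
    """Log Poisson pmf; assumes mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def log_binom(y, n, p):
    """Log Binomial pmf; assumes 0 < p < 1. Impossible counts give -inf."""
    if y > n:
        return NEG_INF
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log(1 - p))

def site_loglik(counts, vocals, lam, p, delta, omega, n_max=100):
    """Joint log-likelihood for one site, summing over latent abundance.

    counts: point-count observations (imperfect detection, no false positives)
    vocals: acoustic vocalization counts (includes false positives via omega)
    Requires lam > 0, 0 < p < 1, omega > 0. n_max truncates the sum over N.
    """
    terms = []
    for n in range(n_max + 1):
        ll = log_pois(n, lam)
        ll += sum(log_binom(y, n, p) for y in counts)
        ll += sum(log_pois(v, delta * n + omega) for v in vocals)
        terms.append(ll)
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical site: two point counts and five acoustic recordings.
ll = site_loglik(counts=[2, 1], vocals=[9, 7, 12, 8, 10],
                 lam=3.0, p=0.5, delta=2.5, omega=0.8)
print(round(ll, 3))
```

Both data types share the same latent N, so the many acoustic counts help constrain abundance even when point counts are few. Summing `site_loglik` over sites and maximizing it (or placing priors on the parameters) would yield estimates under this simplified formulation.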
- PAR ID: 10450777
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Methods in Ecology and Evolution
- Volume: 12
- Issue: 6
- ISSN: 2041-210X
- Page Range / eLocation ID: p. 1040-1049
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Environmental and anthropogenic factors affect the population dynamics of migratory species throughout their annual cycles. However, identifying the spatiotemporal drivers of migratory species' abundances is difficult because of extensive gaps in monitoring data. The collection of unstructured opportunistic data by volunteer (citizen science) networks provides a solution to address data gaps for locations and time periods during which structured, design‐based data are difficult or impossible to collect.
To estimate population abundance and distribution at broad spatiotemporal extents, we developed an integrated model that incorporates unstructured data during time periods and spatial locations when structured data are unavailable. We validated our approach through simulations and then applied the framework to the eastern North American migratory population of monarch butterflies during their spring breeding period in eastern Texas. Spring climate conditions have been identified as a key driver of monarch population sizes during subsequent summer and winter periods. However, low monarch densities during the spring combined with very few design‐based surveys in the region have limited the ability to isolate effects of spring weather variables on monarchs.
Simulation results confirmed the ability of our integrated model to accurately and precisely estimate abundance indices and the effects of covariates at locations and during time periods in which structured sampling is lacking. In our case study, we combined opportunistic monarch observations during the spring migration and breeding period with structured data from the summer Midwestern breeding grounds. Our model revealed a nonstationary relationship between weather conditions and local monarch abundance during the spring, driven by spatially varying vegetation and temperature conditions.
Data for widespread and migratory species are often fragmented across multiple monitoring programs, potentially requiring the use of both structured and unstructured data sources to obtain complete geographic coverage. Our integrated model can estimate population abundance at broad spatiotemporal extents despite structured data gaps during the annual cycle by leveraging opportunistic data.
-
Temporally adaptive acoustic sampling to maximize detection across a suite of focal wildlife species
Abstract Acoustic recordings of the environment can produce species presence–absence data for characterizing populations of sound‐producing wildlife over multiple spatial scales. If a species is present at a site but does not vocalize during a scheduled audio recording survey, researchers may incorrectly conclude that the species is absent (“false negative”). The risk of false negatives is compounded when audio devices have sampling constraints, do not record continuously, and must be manually scheduled to operate at pre‐selected times of day, particularly when research programs target multiple species with acoustic availability that varies across temporal conditions.
We developed a temporally adaptive acoustic sampling algorithm to maximize detection probabilities for a suite of focal species amid sampling constraints. The algorithm combines user‐supplied species vocalization models with site‐specific weather forecasts to set an optimized sampling schedule for the following day. To test our algorithm, we simulated hourly vocalization probabilities for a suite of focal species in a hypothetical monitoring area for the year 2016. We conducted a factorial experiment that sampled from the 2016 acoustic environment to compare the probability of acoustic detection by a fixed (stationary) schedule versus a temporally adaptive optimized schedule under several sampling efforts and monitoring durations.
We found that over the course of a study season, the probability of acoustically capturing a focal species (given presence) at least once via automated acoustic monitoring was greater (and acoustic capture occurred earlier in the season) when using the temporally adaptive optimized schedule as compared to a fixed schedule.
The advantages of a temporally adaptive optimized acoustic sampling schedule are magnified when a study duration is short, sampling effort is low, and/or species acoustic availability is minimal. This methodology presents the opportunity to maximize acoustic monitoring sampling efforts amid constraints.
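One way to picture the scheduling step is as a greedy selection of recording hours for the next day. The sketch below is an assumption-laden illustration rather than the published algorithm: the source does not state the optimization rule, so the greedy criterion (pick the hour that most reduces the expected number of still-undetected species), the function name `choose_hours`, and the example probabilities are all hypothetical. The hourly probabilities stand in for the output of species vocalization models combined with a weather forecast.

```python
def choose_hours(p_vocal, n_hours, p_missed=None):
    """Greedily choose recording hours for the coming day.

    p_vocal:  {species: [hourly probability of an acoustic detection]}
    n_hours:  number of hours the recorder can operate
    p_missed: {species: probability the species is still undetected so far};
              defaults to 1.0 for every species (start of season).
    Returns (sorted list of chosen hours, updated p_missed).
    """
    species = list(p_vocal)
    n_slots = len(p_vocal[species[0]])
    miss = dict(p_missed) if p_missed else {s: 1.0 for s in species}
    chosen = []
    for _ in range(n_hours):
        candidates = [h for h in range(n_slots) if h not in chosen]
        # Expected drop in the number of undetected species if hour h is added.
        best = max(candidates,
                   key=lambda h: sum(miss[s] * p_vocal[s][h] for s in species))
        chosen.append(best)
        for s in species:
            miss[s] *= 1.0 - p_vocal[s][best]
    return sorted(chosen), miss

# Hypothetical forecast-adjusted probabilities for two species.
songbird = [0.05] * 24
songbird[5] = 0.9      # peaks at dawn
owl = [0.05] * 24
owl[22] = 0.8          # peaks at night
hours, still_missed = choose_hours({"songbird": songbird, "owl": owl}, n_hours=2)
print(hours, still_missed)
```

Carrying `still_missed` forward as `p_missed` for the following day shifts effort toward species that have not yet been detected, which matches the intuition that adaptive scheduling helps most when sampling effort is low.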
-
Abstract Data deficiencies among rare or cryptic species preclude assessment of community‐level processes using many existing approaches, limiting our understanding of the trends and stressors for large numbers of species. Yet evaluating the dynamics of whole communities, not just common or charismatic species, is critical to understanding the responses of biodiversity to ongoing environmental pressures.
A recent surge in both public science and government‐funded data collection efforts has led to a wealth of biodiversity data. However, these data collection programmes use a wide range of sampling protocols (from unstructured, opportunistic observations of wildlife to well‐structured, design‐based programmes) and record information at a variety of spatiotemporal scales. As a result, available biodiversity data vary substantially in quantity and information content, which must be carefully reconciled for meaningful ecological analysis.
Hierarchical modelling, including single‐species integrated models and hierarchical community models, has improved our ability to assess and predict biodiversity trends and processes. Here, we highlight the emerging ‘integrated community modelling’ framework that combines both data integration and community modelling to improve inferences on species‐ and community‐level dynamics.
We illustrate the framework with a series of worked examples. Our three case studies demonstrate how integrated community models can be used to extend the geographic scope when evaluating species distributions and community‐level richness patterns; discern population and community trends over time; and estimate demographic rates and population growth for communities of sympatric species. We implemented these worked examples using multiple software methods through the R platform via packages with formula‐based interfaces and through development of custom code in JAGS, NIMBLE and Stan.
Integrated community models provide an exciting approach to model biological and observational processes for multiple species using multiple data types and sources simultaneously, thus accounting for uncertainty and sampling error within a unified framework. By leveraging the combined benefits of both data integration and community modelling, integrated community models can produce valuable information about both common and rare species as well as community‐level dynamics, allowing for holistic evaluation of the effects of global change on biodiversity.
-
spOccupancy: An R package for single‐species, multi‐species, and integrated spatial occupancy models
Abstract Occupancy modelling is a common approach to assess species distribution patterns, while explicitly accounting for false absences in detection–nondetection data. Numerous extensions of the basic single‐species occupancy model exist to accommodate multiple species and spatial autocorrelation and to integrate multiple data types. However, development of specialized and computationally efficient software to incorporate such extensions, especially for large datasets, is scarce or absent.
We introduce the spOccupancy R package designed to fit single‐species and multi‐species spatially explicit occupancy models. We fit all models within a Bayesian framework using Pólya‐Gamma data augmentation, which results in fast and efficient inference. spOccupancy provides functionality for data integration of multiple single‐species detection–nondetection datasets via a joint likelihood framework. The package leverages Nearest Neighbour Gaussian Processes to account for spatial autocorrelation, which enables spatially explicit occupancy modelling for potentially massive datasets (e.g. 1,000s–100,000s of sites). spOccupancy provides user‐friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k‐fold cross‐validation) and out‐of‐sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis and two bird case studies.
The spOccupancy package provides a user‐friendly platform to fit a variety of single and multi‐species occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large datasets.
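For readers unfamiliar with how occupancy models account for false absences, the sketch below computes the likelihood of the basic single‐species, non‐spatial occupancy model for one site. It is written in Python for illustration only and does not use or reproduce the spOccupancy API or its Pólya‐Gamma sampler; the function name and parameter values are hypothetical, with `psi` denoting occupancy probability and `p` the per‐visit detection probability.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history under basic occupancy.

    history: list of 0/1 detections across repeat visits
    psi:     probability the site is occupied
    p:       probability of detecting the species on a visit, given occupancy
    """
    # Probability of this exact history if the site is occupied.
    occupied = psi
    for y in history:
        occupied *= p if y == 1 else (1.0 - p)
    # An unoccupied site can only produce an all-zero history.
    unoccupied = (1.0 - psi) if not any(history) else 0.0
    return occupied + unoccupied

print(site_likelihood([0, 0], psi=0.5, p=0.5))  # never detected
print(site_likelihood([1, 0], psi=0.5, p=0.5))  # detected on first visit
```

The all‐zero history receives contributions from both "occupied but missed" and "truly absent", which is how repeat visits let the model separate occupancy from detection.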