skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Scaper: A library for soundscape synthesis and augmentation
Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioa-coustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset and source of every event. Generating such precise annotations manually is very time consuming, and as a result existing datasets for SED with strong labels are scarce and limited in size. To address this issue, we present Scaper, an open-source library for soundscape synthesis and augmentation. Given a collection of iso-lated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, 'specification'. To increase the variability of the output, Scaper supports the application of audio transformations such as pitch shifting and time stretching individually to every event. To illustrate the potential of the library, we generate a dataset of 10,000 sound-scapes and use it to compare the performance of two state-of-The-Art algorithms, including a breakdown by soundscape characteristics. We also describe how Scaper was used to generate audio stimuli for an audio labeling crowdsourcing experiment, and conclude with a discussion of Scaper's limitations and potential applications.  more » « less
Award ID(s):
1633259
PAR ID:
10074705
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-17)
Page Range / eLocation ID:
344-348
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Acoustic recordings of soundscapes are an important category of audio data that can be useful for answering a variety of questions, and an entire discipline within ecology, dubbed “soundscape ecology,” has risen to study them. Bird sound is often the focus of studies of soundscapes due to the ubiquitousness of birds in most terrestrial environments and their high vocal activity. Autonomous acoustic recorders have increased the quantity and availability of recordings of natural soundscapes while mitigating the impact of human observers on community behavior. However, such recordings are of little use without analysis of the sounds they contain. Manual analysis currently stands as the best means of processing this form of data for use in certain applications within soundscape ecology, but it is a laborious task, sometimes requiring many hours of human review to process comparatively few hours of recording. For this reason, few annotated data sets of soundscape recordings are publicly available. Further still, there are no publicly available strongly labeled soundscape recordings of bird sounds that contain information on timing, frequency, and species. Therefore, we present the first data set of strongly labeled bird sound soundscape recordings under free use license. These data were collected in the Northeastern United States at Powdermill Nature Reserve, Rector, Pennsylvania, USA. Recordings encompass 385 minutes of dawn chorus recordings collected by autonomous acoustic recorders between the months of April through July 2018. Recordings were collected in continuous bouts on four days during the study period and contain 48 species and 16,052 annotations. Applications of this data set may be numerous and include the training, validation, and testing of certain advanced machine‐learning models that detect or classify bird sounds. There are no copyright or propriety restrictions; please cite this paper when using materials within. 
    more » « less
  2. Aquatic environments encompass the world’s most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources–potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although individually, these resources are often met on regional and taxa-specific scales, many are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future. 
    more » « less
  3. Musicians and nonmusicians alike use rhythmic sound gestures, such as tapping and beatboxing, to express drum patterns. While these gestures effectively communicate musical ideas, realizing these ideas as fully-produced drum recordings can be time-consuming, potentially disrupting many creative workflows. To bridge this gap, we present TRIA (The Rhythm In Anything), a masked transformer model for mapping rhythmic sound gestures to high-fidelity drum recordings. Given an audio prompt of the desired rhythmic pattern and a second prompt to represent drum kit timbre, TRIA produces audio of a drum kit playing the desired rhythm (with appropriate elaborations) in the desired timbre. Subjective and objective evaluations show that a TRIA model trained on less than 10 hours of publicly-available drum data can generate high-quality, faithful realizations of sound gestures across a wide range of timbres in a zero-shot manner. 
    more » « less
  4. Decadal variations of ocean soundscapes are intricately linked to large-scale climatic and economic fluctuations. This study draws on over 15 years of acoustic recordings at six sites within the Southern California Bight, investigating interannual, seasonal, and diel variations. By examining acoustic energy from fin and blue whales along with sounds from ships and wind, we identified changes in soundscape over time and space. This study reveals that sound levels associated with both biological and non-biological sound sources varied seasonally and correlated with large-scale climatic patterns and long-term oceanographic fluctuations. Baleen whale sound levels before, during, and after a marine heatwave were assessed; sound levels decreased in southern sites and increased in northern sites adjacent to the California Current, underscoring the potential for range shifts and habitat compression during warm years for these species. Ship-generated sound levels at high-traffic sites reflected economic events such as recessions, labor shortages and negotiations, and changes to port activities. Marine soundscapes offer an approach to assess the ocean's condition amid ongoing climatic and economic fluctuations. 
    more » « less
  5. null (Ed.)
    Smartphones and mobile applications have become an integral part of our daily lives. This is reflected by the increase in mobile devices, applications, and revenue generated each year. However, this growth is being met with an increasing concern for user privacy, and there have been many incidents of privacy and data breaches related to smartphones and mobile applications in recent years. In this work, we focus on improving privacy for audio-based mobile systems. These applications will generally listen to all sounds in the environment and may record privacy-sensitive signals, such as speech, that may not be needed for the application. We present PAMS, a software development package for mobile applications. PAMS integrates a novel sound source filtering algorithm called Probabilistic Template Matching to generate a set of privacy-enhancing filters that remove extraneous sounds using learned statistical "templates" of these sounds. We demonstrate the effectiveness of PAMS by integrating it into a sleep monitoring system, with the intent to remove extraneous speech from breathing, snoring, and other sleep sounds that the system is monitoring. By comparing our PAMS enhanced sleep monitoring system with existing mobile systems, we show that PAMS can reduce speech intelligibility by up to 74.3% while maintaining similar performance in detecting sleeping sounds. 
    more » « less