

Title: Transfer Learning from Youtube Soundtracks to Tag Arctic Ecoacoustic Recordings
Sound provides a valuable tool for long-term monitoring of sensitive animal habitats at a spatial scale larger than camera traps or field observations, while also providing more detail than satellite imagery. Currently, the ability to collect such recordings outstrips the ability to analyze them manually, necessitating the development of automatic analysis methods. While several datasets and models based on large corpora of video soundtracks have recently been released, it is not clear to what extent these models generalize to environmental recordings and to the scientific questions of interest in analyzing them. This paper investigates this generalization in several ways and finds that the models themselves display limited performance; however, their intermediate representations can be used to train successful models on small sets of labeled data.
Award ID(s):
1839185
PAR ID:
10189520
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE International Conference on Acoustics, Speech and Signal Processing
Page Range / eLocation ID:
726 to 730
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
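The abstract's central finding, that intermediate representations of a pretrained model can train a useful tagger from a small labeled set, is often put into practice as a "linear probe": the pretrained network stays frozen, each clip is reduced to a fixed-length embedding, and only a small classifier is trained on those vectors. The sketch below illustrates that idea under stated assumptions and is not the authors' pipeline: the 128-dimensional embeddings are synthetic stand-ins (a real workflow would extract them from a pretrained audio model), the tag is binary, and the classifier is plain L2-regularized logistic regression fit by gradient descent with NumPy. The function names `train_linear_probe`, `predict`, and `make_split` are hypothetical.

```python
import numpy as np


def train_linear_probe(embeddings, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit binary logistic regression on frozen, precomputed embeddings.

    embeddings: (n_clips, dim) array; the pretrained model is never updated.
    labels: (n_clips,) array of 0/1 tags.
    Returns the weights, bias, and the standardization statistics.
    """
    n, dim = embeddings.shape
    # Standardize each embedding dimension using the small labeled set.
    mean = embeddings.mean(axis=0)
    std = embeddings.std(axis=0) + 1e-8
    x = (embeddings - mean) / std
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid of the logits
        err = probs - labels  # d(cross-entropy)/d(logit) for each clip
        w -= lr * (x.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b, mean, std


def predict(embeddings, w, b, mean, std):
    """Return 0/1 tag predictions for new embeddings."""
    x = (embeddings - mean) / std
    return (x @ w + b > 0).astype(int)


# Hypothetical data: 128-dim embeddings whose class means differ
# slightly in every dimension (a stand-in for real clip embeddings).
rng = np.random.default_rng(0)
dim = 128


def make_split(n_per_class):
    neg = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    pos = rng.normal(0.5, 1.0, size=(n_per_class, dim))
    x = np.vstack([neg, pos])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y


x_train, y_train = make_split(40)  # small labeled set: 80 clips
x_test, y_test = make_split(200)   # held-out clips
params = train_linear_probe(x_train, y_train)
train_acc = np.mean(predict(x_train, *params) == y_train)
test_acc = np.mean(predict(x_test, *params) == y_test)
print(f"train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

Keeping the pretrained network frozen means only `dim + 1` parameters are fit here, which is what makes a few dozen labeled clips workable; whether a linear probe or a small nonlinear head suits a given tagging task is an empirical question.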
More Like this
  1. Abstract Objective. To understand neural circuit dynamics, it is critical to manipulate and record many individual neurons. Traditional recording methods, such as glass microelectrodes, can only control a small number of neurons. More recently, devices with high electrode density have been developed, but few of them can be used for intracellular recording or stimulation in intact nervous systems. Carbon fiber electrodes (CFEs) are 8 µm-diameter electrodes that can be assembled into dense arrays (pitches ≥ 80 µm). They have good signal-to-noise ratios (SNRs) and provide stable extracellular recordings both acutely and chronically in neural tissue in vivo (e.g. rat motor cortex). The small fiber size suggests that arrays could be used for intracellular stimulation. Approach. We tested CFEs for intracellular stimulation using the large, identified, and electrically compact neurons of the marine mollusk Aplysia californica. Neuron cell bodies in Aplysia range from 30 µm to over 250 µm. We compared the efficacy of CFEs to glass microelectrodes by impaling the same neuron’s cell body with both electrodes and connecting them to a DC-coupled amplifier. Main results. We observed that intracellular waveforms were essentially identical, but the amplitude and SNR in the CFE were lower than in the glass microelectrode. CFE arrays could record from 3 to 8 neurons simultaneously for many hours, and many of these recordings were intracellular, as shown by simultaneous glass microelectrode recordings. CFEs coated with platinum-iridium could stimulate and had stable impedances over many hours. CFEs not within neurons could record local extracellular activity. Despite the lower SNR, the CFEs could record synaptic potentials. CFEs were less sensitive to mechanical perturbations than glass microelectrodes. Significance. The ability to do stable multi-channel recording while stimulating and recording intracellularly makes CFEs a powerful new technology for studying neural circuit dynamics.
  2. Abstract Acoustic recordings of soundscapes are an important category of audio data that can be useful for answering a variety of questions, and an entire discipline within ecology, dubbed “soundscape ecology,” has risen to study them. Bird sound is often the focus of studies of soundscapes due to the ubiquity of birds in most terrestrial environments and their high vocal activity. Autonomous acoustic recorders have increased the quantity and availability of recordings of natural soundscapes while mitigating the impact of human observers on community behavior. However, such recordings are of little use without analysis of the sounds they contain. Manual analysis currently stands as the best means of processing this form of data for use in certain applications within soundscape ecology, but it is a laborious task, sometimes requiring many hours of human review to process comparatively few hours of recording. For this reason, few annotated data sets of soundscape recordings are publicly available. Furthermore, there are no publicly available strongly labeled soundscape recordings of bird sounds that contain information on timing, frequency, and species. Therefore, we present the first data set of strongly labeled soundscape recordings of bird sounds under a free-use license. These data were collected in the Northeastern United States at Powdermill Nature Reserve, Rector, Pennsylvania, USA. The recordings encompass 385 minutes of dawn chorus audio collected by autonomous acoustic recorders between April and July 2018. Recordings were collected in continuous bouts on four days during the study period and contain 48 species and 16,052 annotations. Applications of this data set may be numerous, including the training, validation, and testing of advanced machine‐learning models that detect or classify bird sounds. There are no copyright or proprietary restrictions; please cite this paper when using materials within.
  3. Abstract As devices with always-on microphones located in people’s homes, smart speakers have significant privacy implications. We surveyed smart speaker owners about their beliefs, attitudes, and concerns about the recordings that are made and shared by their devices. To ground participants’ responses in concrete interactions, rather than collecting their opinions abstractly, we framed our survey around randomly selected recordings of saved interactions with their devices. We surveyed 116 owners of Amazon and Google smart speakers and found that almost half did not know that their recordings were being permanently stored and that they could review them; only a quarter reported reviewing interactions, and very few had ever deleted any. While participants did not consider their own recordings especially sensitive, they were more protective of others’ recordings (such as those of children and guests) and were strongly opposed to the use of their data by third parties or for advertising. They also considered permanent retention, the status quo, unsatisfactory. Based on our findings, we make recommendations for more agreeable data retention policies and future privacy controls.
  4. Modern recordings of neural activity provide diverse observations of neurons across brain areas, behavioral conditions, and subjects, presenting an exciting opportunity to reveal the fundamentals of brain-wide dynamics. Current analysis methods, however, often fail to fully harness the richness of such data, as they provide either uninterpretable representations (e.g., via deep networks) or oversimplified models (e.g., by assuming stationary dynamics or analyzing each session independently). Here, instead of regarding asynchronous neural recordings that lack alignment in neural identity or brain areas as a limitation, we leverage these diverse views into the brain to learn a unified model of neural dynamics. Specifically, we assume that brain activity is driven by multiple hidden global sub-circuits. These sub-circuits represent global basis interactions between neural ensembles (functional groups of neurons), such that the time-varying decomposition of these sub-circuits defines how the ensembles’ interactions evolve over time non-stationarily and non-linearly. We discover the neural ensembles underlying non-simultaneous observations, along with their non-stationary evolving interactions, with our new model, CREIMBO (Cross-Regional Ensemble Interactions in Multi-view Brain Observations). CREIMBO identifies the hidden composition of per-session neural ensembles through novel graph-driven dictionary learning and models the ensemble dynamics on a low-dimensional manifold spanned by a sparse time-varying composition of the global sub-circuits. Thus, CREIMBO disentangles overlapping temporal neural processes while preserving interpretability through a shared underlying sub-circuit basis. Moreover, CREIMBO distinguishes session-specific computations from global (session-invariant) ones by identifying session covariates and variations in sub-circuit activations.
We demonstrate CREIMBO’s ability to recover true components in synthetic data and to uncover meaningful brain dynamics in human high-density electrode recordings, including cross-subject neural mechanisms as well as inter- vs. intra-region dynamical motifs. Furthermore, using mouse whole-brain recordings, we show CREIMBO’s ability to discover dynamical interactions that capture task and behavioral variables and meaningfully align with the biological importance of the brain areas they represent.
  5. Various artificial neural networks developed by engineers have been evaluated as models of the brain, such as the ventral stream in the primate visual cortex. After being trained on large datasets, the network outputs are compared to recordings of biological neurons. Good performance in reproducing neural responses is taken as validation for the model. This system-identification approach differs from the traditional ways of testing theories and associated models in the natural sciences. Furthermore, it lacks a clear foundation in terms of theory and empirical validation. Here we begin characterizing some of these emerging approaches: what do they tell us? To address this question, we benchmark their ability to correctly identify a model by replacing the brain recordings with recordings from a known ground-truth model. We evaluate commonly used identification techniques such as neural regression (linear regression on a population of model units) and centered kernel alignment (CKA). Even in the setting where the correct model is among the candidates, we find that the performance of these approaches at system identification is quite variable; it also depends significantly on factors independent of the ground-truth architecture, such as scoring function and dataset.
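Centered kernel alignment (CKA), one of the identification scores named in the last abstract, compares two representations of the same stimuli while ignoring rotations and the overall scale of either one. Below is a minimal NumPy sketch of its linear-kernel form, CKA(X, Y) = ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F) with column-centered X and Y. The random matrices stand in for model-unit and recorded responses and are purely illustrative; the function name `linear_cka` is hypothetical, and this is not the benchmark code used in that study.

```python
import numpy as np


def linear_cka(x, y):
    """Linear centered kernel alignment between two response matrices.

    x: (n_stimuli, p) responses of one system, e.g. model units.
    y: (n_stimuli, q) responses of another, e.g. recorded neurons.
    Rows must refer to the same stimuli; p and q may differ.
    """
    # Center every unit's responses across stimuli.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(1)
n_stimuli, n_units = 500, 10
responses = rng.normal(size=(n_stimuli, n_units))

# A rotated and rescaled copy carries the same information, so CKA is ~1.
rotation, _ = np.linalg.qr(rng.normal(size=(n_units, n_units)))
rotated = 3.0 * responses @ rotation

# Independent responses share no structure, so CKA is small
# (though not exactly 0 at a finite number of stimuli).
unrelated = rng.normal(size=(n_stimuli, n_units))

score_self = linear_cka(responses, responses)
score_rotated = linear_cka(responses, rotated)
score_unrelated = linear_cka(responses, unrelated)
print(score_self, score_rotated, score_unrelated)
```

The invariance to rotation and isotropic scaling is a deliberate design choice: it lets representations with different numbers of units be compared, but it also means the score says nothing about which individual units correspond to one another.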