

Title: Binding the Acoustic Features of an Auditory Source through Temporal Coherence
Abstract
Numerous studies have suggested that a target sound stream (or source) can be perceptually segregated from a complex acoustic background mixture only if the acoustic features underlying its perceptual attributes (e.g., pitch, location, and timbre) induce temporally modulated neural responses that are mutually correlated (coherent) with one another and uncorrelated (incoherent) with those of other sources in the mixture. This “temporal coherence” hypothesis asserts that attentive listening to one acoustic feature of a target enhances brain responses to that feature and concomitantly (1) induces mutually excitatory influences with other coherently responding neurons, thus enhancing (or binding) them all as they respond to the attended source; by contrast, (2) suppressive interactions are hypothesized to build up among neurons driven by temporally incoherent sound features, thus relatively reducing their activity. In this study, we report EEG measurements in human subjects engaged in various sound segregation tasks that demonstrate rapid binding among the temporally coherent features of the attended source regardless of their identity (pure tone components, tone complexes, or noise), harmonic relationship, or frequency separation, thus confirming the key role temporal coherence plays in the analysis and organization of auditory scenes.
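The hypothesis's core computation, checking whether the slow envelope modulations of different feature channels are mutually correlated, can be sketched in a few lines of Python. This is an illustrative toy, not the study's EEG analysis; the sampling rate, modulation rates, and noise level are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100                      # envelope sampling rate in Hz (assumption)
t = np.arange(0, 5, 1 / fs)

# Slow envelope modulations of three hypothetical feature channels.
shared = np.abs(np.sin(2 * np.pi * 4 * t))            # 4 Hz modulator
chan_a = shared + 0.05 * rng.standard_normal(t.size)  # target feature 1
chan_b = shared + 0.05 * rng.standard_normal(t.size)  # target feature 2, coherent with 1
chan_c = np.abs(np.sin(2 * np.pi * 7 * t + 1.0)) \
         + 0.05 * rng.standard_normal(t.size)         # distractor, incoherent

def coherence(x, y):
    """Temporal coherence proxy: Pearson correlation of envelope time courses."""
    return np.corrcoef(x, y)[0, 1]

print(coherence(chan_a, chan_b))  # high: channels share one source
print(coherence(chan_a, chan_c))  # near zero: different source
```

Channels sharing a modulator correlate strongly and would be bound into one perceived source; the distractor channel does not.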
Award ID(s):
1764010
NSF-PAR ID:
10356204
Author(s) / Creator(s):
Date Published:
Journal Name:
Cerebral Cortex Communications
Volume:
2
Issue:
4
ISSN:
2632-7376
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
Acoustic devices have played a major role in telecommunications for decades as the leading technology for filtering at RF and microwave frequencies. While filter requirements for insertion loss and bandwidth become more stringent, additional functionality is desired in many applications to improve overall system-level performance. For instance, a filter with non-reciprocal transmission can minimize losses due to mismatch and protect the source from reflections while also performing its filtering duties. Such devices were originally researched decades ago, based on the acoustoelectric effect, in which surface acoustic waves (SAW) traveling in the same direction as drift carriers in a nearby semiconductor are amplified. While several experiments were successfully demonstrated [1], [2], [3], these devices suffered from extremely high operating electric fields and noise figures [4], [5]. In the past few years, new techniques have been developed for implementing non-reciprocal devices such as isolators and circulators without magnetic materials [6], [7], [8], [9]. The most popular technique has been spatio-temporal modulation (STM), in which commutated clock signals synchronized with delay elements produce non-reciprocal transmission through the network. STM has also been adapted by researchers to create non-reciprocal filters. The work in [10] utilizes four clock signals to obtain a non-reciprocal filter with an insertion loss of 6.6 dB and an isolation of 25.4 dB. Another filter, demonstrated in [11], utilizes six synchronized clock signals to obtain a non-reciprocal filter with an insertion loss of 5.6 dB and an isolation of 20 dB. In this work, a novel non-reciprocal topology is explored with the use of only one modulation signal. The design is based on asymmetrical SAW delay lines with a parametric amplifier. The device can operate in two different modes: phase-coherent and phase-incoherent.
In phase-coherent mode, the device is capable of over +12 dB of gain and 20.2 dB of isolation. A unique feature of this mode is that the phase of the pump signal can be used to tune the frequency response of the filter. In phase-incoherent mode, the pump frequency remains constant and the device behaves as a conventional filter with non-reciprocal transmission, exhibiting over +7 dB of gain and 17.33 dB of isolation. While the tuning capability is lost in this mode, phase coherence is no longer necessary, so the device can be used in most filtering applications.
  2. The concept of stimulus feature tuning is fundamental to neuroscience. Cortical neurons acquire their feature-tuning properties by learning from experience, using proxy signs of a tentative feature's potential usefulness that come from the spatial and/or temporal context in which that feature occurs. According to this idea, local but ultimately behaviorally useful features should be the ones that are predictably related to other such features, either preceding them in time or occurring side-by-side with them. Inspired by this idea, this paper combines deep neural networks with Canonical Correlation Analysis (CCA) for feature extraction and demonstrates the power of the features on unsupervised cross-modal prediction tasks. CCA is a multi-view feature-extraction method that finds correlated features across multiple datasets (usually referred to as views or modalities). CCA finds linear transformations of each view such that the extracted principal components, or features, have maximal mutual correlation. CCA is a linear method: the features are computed as weighted sums of each view's variables. Once the weights are learned, CCA can be applied to new examples and used for cross-modal prediction by inferring the target-view features of an example from its given variables in a source (query) view. To test the proposed method, it was applied to the unstructured CIFAR-100 dataset of 60,000 images categorized into 100 classes, which are further grouped into 20 superclasses, and used to demonstrate the mining of image-tag correlations. CCA was performed on the outputs of three pre-trained CNNs: AlexNet, ResNet, and VGG.
Taking advantage of the mutually correlated features extracted with CCA, a nearest-neighbor search was performed in the canonical subspace common to the query and target views to retrieve the best-matching examples in the target view, which successfully predicted the superclass membership of the tested examples without any supervised training.
  3. SUMMARY

    Infrasound sensors are deployed in a variety of spatial configurations and scales for geophysical monitoring, including networks of single sensors and networks of multisensor infrasound arrays. Infrasound signal detection strategies exploiting these data commonly make use of intersensor correlation and coherence (array processing, multichannel correlation); network-based tracking of signal features (e.g. reverse time migration); or a combination of these, such as backazimuth cross-bearings for multiple arrays. Single-sensor trace-based denoising techniques offer significant potential to improve all of these infrasound data processing strategies, but have not previously been investigated in detail. Single-sensor denoising represents a pre-processing step that could reduce the effects of ambient infrasound and wind noise in infrasound signal association and location workflows. We systematically investigate the utility of a range of single-sensor denoising methods for infrasound data processing, including noise gating, non-negative matrix factorization, and data-adaptive Wiener filtering. For the data testbed, we use the relatively dense regional infrasound network in Alaska, which records a high rate of volcanic eruptions with signals varying in power, duration, and waveform and spectral character. We primarily use data from the 2016–2017 Bogoslof volcanic eruption, which included multiple explosions, together with synthetic signals. The Bogoslof volcanic sequence provides an opportunity to investigate regional infrasound detection, association, and location for a set of real sources with varying source spectra subject to anisotropic atmospheric propagation and varying noise levels (both incoherent wind noise and coherent ambient infrasound, primarily microbaroms). We illustrate the advantages and disadvantages of the different denoising methods in categories such as event detection, waveform distortion, the need for manual data labelling, and computational cost.
For all approaches, denoising generally performs better for signals with higher signal-to-noise ratios and with less spectral and temporal overlap between signals and noise. Microbaroms are the most globally pervasive and repetitive coherent ambient infrasound noise source, with such noise often referred to as clutter or interference. We find that denoising offers significant potential for microbarom clutter reduction. Single-channel denoising of microbaroms prior to standard array processing enhances both the quantity and bandwidth of detectable volcanic events. We find that reduction of incoherent wind noise is more challenging using the denoising methods we investigate; thus, station hardware (wind noise reduction systems) and site selection remain critical and cannot be replaced by currently available digital denoising methodologies. Overall, we find that adding single-channel denoising as a component in the processing workflow can benefit a variety of infrasound signal detection, association, and location schemes. The denoising methods can also isolate the noise itself, with utility in statistically characterizing ambient infrasound noise.
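Among the denoising approaches compared, noise gating is the simplest to illustrate: estimate a per-frequency noise floor from a signal-free window of the trace, then suppress time-frequency bins that do not rise above it. The sketch below runs on a synthetic single-channel trace; the sampling rate, signal shape, and threshold factor are assumptions, not values from the study:

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs = 50.0                          # sampling rate in Hz (assumption)
t = np.arange(0, 120, 1 / fs)

# Synthetic trace: a transient ~2 Hz "eruption" signal buried in noise.
signal = np.exp(-0.5 * ((t - 60) / 5) ** 2) * np.sin(2 * np.pi * 2 * t)
trace = signal + 0.3 * rng.standard_normal(t.size)

# Noise gating: estimate a per-frequency noise floor from the signal-free
# first 30 s, then zero STFT bins that do not exceed it by a chosen factor.
f, tt, Z = stft(trace, fs=fs, nperseg=256)
noise_floor = np.median(np.abs(Z[:, tt < 30]), axis=1, keepdims=True)
gate = np.abs(Z) > 3.0 * noise_floor    # threshold factor is a tunable
_, denoised = istft(Z * gate, fs=fs, nperseg=256)
denoised = denoised[: trace.size]

def snr_db(x, s):
    """SNR of a reconstruction x against the known clean signal s."""
    return 10 * np.log10(np.sum(s**2) / np.sum((x - s) ** 2))

print(snr_db(trace, signal), snr_db(denoised, signal))
```

The gated trace shows a higher signal-to-noise ratio than the raw one; in practice the threshold factor trades noise suppression against waveform distortion, one of the evaluation categories named in the abstract.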

     
  4. The discrimination of complex sounds is a fundamental function of the auditory system. This operation must be robust in the presence of noise and acoustic clutter. Echolocating bats are auditory specialists that discriminate sonar objects in acoustically complex environments. Bats produce brief signals, interrupted by periods of silence, rendering echo snapshots of sonar objects. Sonar object discrimination requires that bats process spatially and temporally overlapping echoes to make split-second decisions. The mechanisms that enable this discrimination are not well understood, particularly in complex environments. We explored the neural underpinnings of sonar object discrimination in the presence of acoustic scattering caused by physical clutter. We performed electrophysiological recordings in the inferior colliculus (IC) of awake big brown bats in response to broadcasts of prerecorded echoes from physical objects. We acquired single-unit responses to echoes and discovered a subpopulation of IC neurons that encode acoustic features that can be used to discriminate between sonar objects. We further investigated the effects of environmental clutter on this population's encoding of acoustic features. We discovered that the effect of background clutter on sonar object discrimination is highly variable and depends on object properties and target-clutter spatiotemporal separation. In many conditions, clutter impaired discrimination of sonar objects. However, in some instances clutter enhanced acoustic features of echo returns, enabling higher levels of discrimination. This finding suggests that environmental clutter may augment acoustic cues used for sonar target discrimination and provides further evidence, in a growing body of literature, that noise is not universally detrimental to sensory encoding.
  5. Abstract

    Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.
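At its core, a temporal response function encoding model is time-lagged regularized regression from stimulus features to the neural response. The sketch below fits a single-feature, single-channel TRF with ridge regression on simulated data; the sampling rate, response kernel, lag range, and noise level are assumptions rather than the study's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 64                        # EEG sampling rate in Hz (assumption)
n = fs * 60                    # one minute of data

# Simulated stimulus feature (e.g. acoustic envelope) and an EEG channel
# generated as a lagged, noisy response to it.
stim = np.abs(rng.standard_normal(n))
true_trf = np.exp(-np.arange(0, 0.3, 1 / fs) / 0.1)   # decaying kernel, 0-300 ms
eeg = np.convolve(stim, true_trf)[:n] + 0.5 * rng.standard_normal(n)

# Time-lagged design matrix: one column per lag from 0 to 300 ms.
lags = np.arange(true_trf.size)
X = np.stack([np.roll(stim, L) for L in lags], axis=1)
X[: lags.max()] = 0            # discard samples wrapped around by np.roll

# Ridge-regularized TRF estimate: w = (X'X + lam*I)^-1 X'y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

# "Neural tracking" = correlation between predicted and recorded EEG.
pred = X @ w
r = np.corrcoef(pred, eeg)[0, 1]
print(r)
```

The prediction correlation is the neural-tracking measure; in the actual multivariate models, multiple acoustic and linguistic features enter the design matrix jointly and tracking is compared across attention conditions.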

     