Abstract
Numerous studies have suggested that a target sound stream (or source) can be perceptually segregated from a complex acoustic background mixture only if the acoustic features underlying its perceptual attributes (e.g., pitch, location, and timbre) induce temporally modulated responses that are mutually correlated (coherent) with one another and uncorrelated (incoherent) with those of other sources in the mixture. This "temporal coherence" hypothesis asserts that attentive listening to one acoustic feature of a target enhances brain responses to that feature and would also concomitantly (1) induce mutually excitatory influences with other coherently responding neurons, thus enhancing (or binding) them all as they respond to the attended source; by contrast, (2) suppressive interactions are hypothesized to build up among neurons driven by temporally incoherent sound features, thus relatively reducing their activity. In this study, we report on EEG measurements in human subjects engaged in various sound segregation tasks that demonstrate rapid binding among the temporally coherent features of the attended source regardless of their identity (pure tone components, tone complexes, or noise), harmonic relationship, or frequency separation, thus confirming the key role temporal coherence plays in the analysis and organization of auditory scenes.
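To make the hypothesis concrete, the following is a minimal numerical sketch (not the study's analysis code) of what "temporal coherence" between feature channels means: two feature envelopes that are comodulated correlate strongly and would bind into one stream, while an anti-phase envelope correlates negatively and would segregate. All parameter values (sample rate, modulation rate) are illustrative assumptions.

```python
# Minimal sketch: coherence as correlation between feature-channel envelopes.
import numpy as np

fs = 16000                                             # sample rate (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)                          # 1 s of signal
env_a = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))          # feature A: 4 Hz modulation
env_b = env_a.copy()                                   # feature B: comodulated with A
env_c = 0.5 * (1 + np.sin(2 * np.pi * 4 * t + np.pi))  # feature C: anti-phase with A

def coherence(e1, e2):
    """Pearson correlation between two feature-channel envelopes."""
    return np.corrcoef(e1, e2)[0, 1]

print(coherence(env_a, env_b))  # ~ +1.0 -> coherent features bind into one stream
print(coherence(env_a, env_c))  # ~ -1.0 -> incoherent features segregate
```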
This content will become publicly available on April 1, 2026
Auditory streaming and rhythmic masking release in Cope's gray treefrog
Auditory streaming involves perceptually assigning overlapping sound sequences to their respective sources. Although critical for acoustic communication, few studies have investigated the role of auditory streaming in nonhuman animals. This study used the rhythmic masking release paradigm to investigate auditory streaming in Cope's gray treefrog (Hyla chrysoscelis). In this paradigm, the temporal rhythm of a Target sequence is masked in the presence of a Distractor sequence. A release from masking can be induced by adding a Captor sequence that perceptually “captures” the Distractor into an auditory stream segregated from the Target. Here, the Target was a sequence of repeated pulses mimicking the rhythm of the species' advertisement call. Gravid females exhibited robust phonotaxis to the Target alone, but responses declined significantly when Target pulses were interleaved with those of a Distractor at the same frequency, indicating the Target's attractive temporal rhythm was masked. However, addition of a remote-frequency Captor resulted in a significant increase in responses to the Target, suggesting the Target could be segregated from a separate stream consisting of integrated Distractor and Captor sequences. This result sheds light on how auditory streaming may facilitate acoustic communication in frogs and other animals.
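As a rough illustration of the stimulus design, the sketch below builds Target, Distractor, and Captor pulse trains. Every frequency, pulse rate, and duration here is a placeholder rather than the study's actual parameters, and the Captor is simply made synchronous with the Distractor so the two can group into one stream; the study's exact temporal arrangement may differ.

```python
# Hypothetical sketch of a rhythmic-masking-release stimulus (placeholder values).
import numpy as np

fs = 44100        # sample rate (Hz)
dur = 2.0         # total stimulus duration (s)
p_dur = 0.02      # 20 ms pulses

def pulse(freq):
    """A short Hanning-windowed tone pulse."""
    t = np.arange(int(p_dur * fs)) / fs
    return np.sin(2 * np.pi * freq * t) * np.hanning(t.size)

def train(freq, onsets):
    """Place identical pulses at the given onset times (s)."""
    sig = np.zeros(int(dur * fs))
    p = pulse(freq)
    for on in onsets:
        i = int(on * fs)
        sig[i:i + p.size] += p
    return sig

target_onsets = np.arange(0.0, dur - p_dur, 0.2)      # Target rhythm: 5 pulses/s
distractor_onsets = target_onsets + 0.1               # interleaved with the Target

target = train(1000.0, target_onsets)                 # Target
distractor = train(1000.0, distractor_onsets)         # same frequency: masks the rhythm
captor = train(4000.0, distractor_onsets)             # remote frequency, synchronous
                                                      # with the Distractor -> captures it
mixture = target + distractor + captor
```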
- Award ID(s): 2022253
- PAR ID: 10584895
- Publisher / Repository: Acoustical Society of America
- Date Published:
- Journal Name: The Journal of the Acoustical Society of America
- Volume: 157
- Issue: 4
- ISSN: 1520-8524
- Page Range / eLocation ID: 2319 to 2329
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Multilingual speakers can find speech recognition in everyday environments like restaurants and open-plan offices particularly challenging. In a world where speaking multiple languages is increasingly common, effective clinical and educational interventions will require a better understanding of how factors like multilingual contexts and listeners' language proficiency interact with adverse listening environments. For example, word and phrase recognition is facilitated when competing voices speak different languages. Is this due to a "release from masking" from lower-level acoustic differences between languages and talkers, or higher-level cognitive and linguistic factors? To address this question, we created a "one-man bilingual cocktail party" selective attention task using English and Mandarin speech from one bilingual talker to reduce low-level acoustic cues. In Experiment 1, 58 listeners more accurately recognized English targets when the distracting speech was Mandarin compared to English. Bilingual Mandarin–English listeners experienced significantly more interference and intrusions from the Mandarin distractor than did English listeners, exacerbated by challenging target-to-masker ratios. In Experiment 2, 29 Mandarin–English bilingual listeners exhibited linguistic release from masking in both languages. Bilinguals experienced greater release from masking when attending to English, confirming an influence of linguistic knowledge on the "cocktail party" paradigm that is separate from primarily energetic masking effects. Effects of higher-order language processing and expertise emerge only in the most demanding target-to-masker contexts. The "one-man bilingual cocktail party" establishes a useful tool for future investigations and characterization of communication challenges in the large and growing worldwide community of Mandarin–English bilinguals.
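A small sketch of the one signal-processing step this paradigm depends on: setting the target-to-masker ratio (TMR) when mixing two recordings. The signals and the TMR value below are placeholders, not the study's materials.

```python
# Sketch: mix a target and a masker at a requested TMR (in dB).
import numpy as np

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so RMS(target) / RMS(scaled masker) matches tmr_db."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    return target + gain * masker

rng = np.random.default_rng(0)
english_target = rng.standard_normal(16000)    # stand-ins for real recordings
mandarin_masker = rng.standard_normal(16000)
mixture = mix_at_tmr(english_target, mandarin_masker, tmr_db=-6.0)  # a hard TMR
```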
-
Little is known about the neural mechanisms that mediate differential action-selection responses to communication and echolocation calls in bats. For example, in the big brown bat, frequency-modulated (FM) food-claiming communication calls closely resemble FM echolocation calls, which guide social and orienting behaviors, respectively. Using advanced signal-processing methods, we identified fine differences in the temporal structure of these natural sounds that appear key to auditory discrimination and behavioral decisions. We recorded extracellular potentials from single neurons in the midbrain inferior colliculus (IC) of passively listening animals and compared responses to playbacks of acoustic signals used by bats for social communication and echolocation. We combined information obtained from spike counts and spike-triggered averages (STAs) to reveal a robust classification of neuron selectivity for communication or echolocation calls. These data highlight the importance of temporal acoustic structure for differentiating echolocation and food-claiming social calls and point to general mechanisms of natural sound processing across species.
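For readers unfamiliar with STAs, the sketch below shows the basic computation: average the stimulus waveform in a window immediately preceding each spike. The spike times and stimulus here are synthetic stand-ins, not the study's recordings.

```python
# Sketch: spike-triggered average (STA) from spike times and a stimulus waveform.
import numpy as np

fs = 1000                                                 # stimulus sample rate (Hz)
stim = np.random.default_rng(1).standard_normal(10 * fs)  # 10 s synthetic stimulus
spike_times = np.array([0.52, 1.31, 2.07, 4.88, 7.45])    # spike times (s), synthetic
win = int(0.05 * fs)                                      # 50 ms pre-spike window

# Collect the stimulus snippet before each spike, then average them.
snippets = [stim[i - win:i] for i in (spike_times * fs).astype(int) if i >= win]
sta = np.mean(snippets, axis=0)   # the neuron's average pre-spike stimulus
```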
-
Self-supervised skeleton-based action recognition has attracted increasing attention in recent years. By utilizing unlabeled data, more generalizable features can be learned to alleviate the overfitting problem and reduce the demand for massive labeled training data. Inspired by MAE [1], we propose a spatial-temporal masked autoencoder framework for self-supervised 3D skeleton-based action recognition (SkeletonMAE). Following MAE's masking and reconstruction pipeline, we utilize a skeleton-based encoder-decoder transformer architecture to reconstruct the masked skeleton sequences. A novel masking strategy, named Spatial-Temporal Masking, is introduced at both the joint level and the frame level of the skeleton sequence. This pre-training strategy makes the encoder output generalizable skeleton features with spatial and temporal dependencies. Given the unmasked skeleton sequence, the encoder is fine-tuned for the action recognition task. Extensive experiments show that our SkeletonMAE achieves remarkable performance and outperforms the state-of-the-art methods on both the NTU RGB+D 60 and NTU RGB+D 120 datasets.
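The masking idea can be sketched in a few lines: given a skeleton sequence tensor, zero out random frames (temporal masking) and random joints (spatial masking), then train the network to reconstruct the original. The mask ratios and tensor shape below are assumptions for illustration, not the paper's exact settings.

```python
# Sketch: joint-level and frame-level masking of a skeleton sequence.
import numpy as np

rng = np.random.default_rng(42)
T, J = 64, 25                         # frames, joints (NTU-style skeleton, assumed)
seq = rng.standard_normal((T, J, 3))  # (x, y, z) coordinates per joint per frame

frame_mask = rng.random(T) < 0.4      # drop ~40% of frames (temporal masking)
joint_mask = rng.random(J) < 0.4      # drop ~40% of joints (spatial masking)

masked = seq.copy()
masked[frame_mask, :, :] = 0.0        # frame-level masking
masked[:, joint_mask, :] = 0.0        # joint-level masking
# An encoder-decoder would now be trained to reconstruct `seq` from `masked`.
```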
-
Acoustic communication is a fundamental component of mate and competitor recognition in a variety of taxa and requires animals to detect and differentiate among acoustic stimuli (Bradbury and Vehrencamp 2011). The matched filter hypothesis predicts a correspondence between the peripheral auditory tuning of receivers and the properties of species-specific acoustic signals, but few studies have assessed this relationship in rodents. We recorded vocalizations and measured auditory brainstem responses (ABRs) in northern grasshopper mice (Onychomys leucogaster), a species that produces long-distance calls to advertise their presence to rivals and potential mates. ABR data indicate the highest sensitivity (28.33 ± 9.07 dB SPL re: 20 μPa) at 10 kHz, roughly corresponding to the fundamental frequency (11.6 ± 0.63 kHz) of long-distance calls produced by conspecifics. However, the frequency range of peripheral auditory sensitivity was broad (8-24 kHz), indicating the potential to detect both the harmonics of conspecific calls and the vocalizations of sympatric heterospecifics. Our findings provide support for the matched filter hypothesis extended to include other ecologically relevant stimuli. Our study contributes important baseline information about the sensory ecology of a unique rodent to the study of sound perception.
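As a toy illustration of the matched-filter idea invoked above, the snippet below cross-correlates a noisy recording with a call template near the reported ~11.6 kHz fundamental; detection is best where receiver tuning matches the signal. All values are placeholders for illustration only.

```python
# Sketch: matched filtering = cross-correlating a signal with a known template.
import numpy as np

fs = 48000
t = np.arange(0, 0.05, 1 / fs)             # 50 ms template (duration assumed)
template = np.sin(2 * np.pi * 11600 * t)   # ~11.6 kHz call fundamental
noisy = np.concatenate([np.zeros(fs // 10), template, np.zeros(fs // 10)])
noisy += 0.5 * np.random.default_rng(2).standard_normal(noisy.size)

score = np.correlate(noisy, template, mode="valid")  # matched-filter output
print(np.argmax(score) / fs)               # ~0.1 s: the embedded call's onset
```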
