
Title: Exaggerated cortical representation of speech in older listeners: mutual information analysis
Aging is associated with an exaggerated representation of the speech envelope in auditory cortex. The relationship between this age-related exaggerated response and a listener’s ability to understand speech in noise remains an open question. Here, information-theory-based analysis methods are applied to magnetoencephalography recordings of human listeners, investigating their cortical responses to continuous speech, using the novel nonlinear measure of phase-locked mutual information between the speech stimuli and cortical responses. The cortex of older listeners shows an exaggerated level of mutual information, compared with younger listeners, for both attended and unattended speakers. The mutual information peaks for several distinct latencies: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). For the late component, the neural enhancement of attended over unattended speech is affected by stimulus signal-to-noise ratio, but the direction of this dependency is reversed by aging. Critically, in older listeners and for the same late component, greater cortical exaggeration is correlated with decreased behavioral inhibitory control. This negative correlation also carries over to speech intelligibility in noise, where greater cortical exaggeration in older listeners is correlated with worse speech intelligibility scores. Finally, an age-related lateralization difference is also seen for the ∼100 ms latency peaks, where older listeners show a bilateral response compared with younger listeners’ right lateralization. Thus, this information-theory-based analysis provides new, and less coarse-grained, results regarding age-related change in auditory cortical speech processing, and its correlation with cognitive measures, compared with related linear measures.

NEW & NOTEWORTHY Cortical representations of natural speech are investigated using a novel nonlinear approach based on mutual information. Cortical responses, phase-locked to the speech envelope, show an exaggerated level of mutual information associated with aging, appearing at several distinct latencies (∼50, ∼100, and ∼200 ms). Critically, for older listeners only, the ∼200 ms latency response components are correlated with specific behavioral measures, including behavioral inhibition and speech comprehension.
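The core measure above is mutual information between the speech stimulus and the cortical response, a nonlinear dependency statistic. As a rough illustration of the underlying idea only (not the authors' actual pipeline, which operates on phase-locked MEG responses with appropriate bias control), here is a minimal, self-contained histogram estimator of mutual information between two signals; the function name, bin count, and toy signals are all illustrative assumptions:

```python
import math
import random

def mutual_information(x, y, n_bins=8):
    """Plug-in histogram estimate of mutual information (in bits)
    between two equal-length signals."""
    assert len(x) == len(y)

    def bin_index(v, lo, hi):
        # map a value into one of n_bins equal-width bins
        if hi == lo:
            return 0
        i = int((v - lo) / (hi - lo) * n_bins)
        return min(i, n_bins - 1)

    lo_x, hi_x = min(x), max(x)
    lo_y, hi_y = min(y), max(y)

    # joint histogram over (x-bin, y-bin) pairs
    joint = {}
    for xv, yv in zip(x, y):
        key = (bin_index(xv, lo_x, hi_x), bin_index(yv, lo_y, hi_y))
        joint[key] = joint.get(key, 0) + 1

    # marginal counts derived from the joint counts
    n = len(x)
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c

    # MI = sum over bins of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    mi = 0.0
    for (i, j), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy * n * n / (px[i] * py[j]))
    return mi

# toy demo: a "response" that tracks the stimulus vs. an unrelated one
random.seed(0)
sig = [math.sin(0.1 * t) for t in range(2000)]           # toy stimulus envelope
noise = [random.gauss(0, 1) for _ in range(2000)]        # unrelated signal
coupled = [s + 0.2 * n for s, n in zip(sig, noise)]      # response tracking stimulus
```

Note that plug-in histogram estimators like this one are biased upward for finite samples, so real analyses compare against surrogate (e.g., shuffled or phase-randomized) data or apply bias correction before interpreting MI values.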
Journal Name:
Journal of Neurophysiology
Page Range or eLocation-ID:
1152 to 1164
Sponsoring Org:
National Science Foundation
More Like this
  1. Voice pitch carries linguistic as well as non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background, or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous EEG and ECoG results. The response tracked both the presence of pitch as well as the relative value of the speaker’s fundamental frequency. In the two-talker mixture, pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker’s speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
  2. The mismatch negativity (MMN) event-related potential (ERP) indexes relatively automatic detection of changes in sensory stimuli and is typically attenuated in individuals with schizophrenia. However, contributions of different frequencies of electroencephalographic (EEG) activity to the MMN and the later P3a attentional orienting response in schizophrenia are poorly understood and were the focus of the present study. Participants with a schizophrenia-spectrum disorder ( n = 85) and non-psychiatric control participants ( n = 74) completed a passive auditory oddball task containing 10% 50 ms “deviant” tones and 90% 100 ms “standard” tones. EEG data were analyzed using spatial principal component analysis (PCA) applied to wavelet-based time-frequency analysis and MMN and P3a ERPs. The schizophrenia group compared to the control group had smaller MMN amplitudes and lower deviant-minus-standard theta but not alpha event-related spectral perturbation (ERSP) after accounting for participant age and sex. Larger MMN and P3a amplitudes but not latencies were correlated with greater theta and alpha time-frequency activity. Multiple linear regression analyses revealed that control participants showed robust relationships between larger MMN amplitudes and greater deviant-minus-standard theta inter-trial coherence (ITC) and between larger P3a amplitudes and greater deviant-minus-standard theta ERSP, whereas these dynamic neural processes were less tightly coupled in participants with a schizophrenia-spectrum disorder. Study results help clarify frequency-based contributions of time-domain (i.e., ERP) responses and indicate a potential disturbance in the neural dynamics of detecting change in sensory stimuli in schizophrenia. Overall, findings add to the growing body of evidence that psychotic illness is associated with widespread neural dysfunction in the theta frequency band.
  3. Human engagement in music rests on underlying elements such as the listeners’ cultural background and interest in music. These factors modulate how listeners anticipate musical events, a process inducing instantaneous neural responses as the music confronts these expectations. Measuring such neural correlates would represent a direct window into high-level brain processing. Here we recorded cortical signals as participants listened to Bach melodies. We assessed the relative contributions of acoustic versus melodic components of the music to the neural signal. Melodic features included information on pitch progressions and their tempo, which were extracted from a predictive model of musical structure based on Markov chains. We related the music to brain activity with temporal response functions demonstrating, for the first time, distinct cortical encoding of pitch and note-onset expectations during naturalistic music listening. This encoding was most pronounced at response latencies up to 350 ms, and in both planum temporale and Heschl’s gyrus.
  4. This study tests speech-in-noise perception and social ratings of speech produced by different text-to-speech (TTS) synthesis methods. We used identical speaker training datasets for a set of 4 voices (using AWS Polly TTS), generated using neural and concatenative TTS. In Experiment 1, listeners identified target words in semantically predictable and unpredictable sentences in concatenative and neural TTS at two noise levels (-3 dB, -6 dB SNR). Correct word identification was lower for neural TTS than for concatenative TTS, in the lower SNR, and for semantically unpredictable sentences. In Experiment 2, listeners rated the voices on 4 social attributes. Neural TTS was rated as more human-like, natural, likeable, and familiar than concatenative TTS. Furthermore, how natural listeners rated the neural TTS voice was positively related to their speech-in-noise accuracy. Together, these findings show that the TTS method influences both intelligibility and social judgments of speech, and that these patterns are linked. Overall, this work contributes to our understanding of the nexus of speech technology and human speech perception.
  5. This exploratory study examined the simultaneous interactions and relative contributions of bottom-up social information (regional dialect, speaking style), top-down contextual information (semantic predictability), and the internal dynamics of the lexicon (neighborhood density, lexical frequency) to lexical access and word recognition. Cross-modal matching and intelligibility in noise tasks were conducted with a community sample of adults at a local science museum. Each task featured one condition in which keywords were presented in isolation and one condition in which they were presented within a multiword phrase. Lexical processing was slower and more accurate when keywords were presented in their phrasal context, and was both faster and more accurate for auditory stimuli produced in the local Midland dialect. In both tasks, interactions were observed among stimulus dialect, speaking style, semantic predictability, phonological neighborhood density, and lexical frequency. These interactions revealed that bottom-up social information and top-down contextual information contribute more to speech processing than the internal dynamics of the lexicon. Moreover, the relatively stronger bottom-up social effects were observed in both the isolated word and multiword phrase conditions, suggesting that social variation is central to speech processing, even in non-interactive laboratory tasks. At the same time, the specific interactions observed differed between the two experiments, reflecting task-specific demands related to processing time constraints and signal degradation.
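Item 3 above relates music features to brain activity with temporal response functions (TRFs), linear filters mapping a stimulus feature to the neural response. As a sketch of the general idea only, assuming NumPy and synthetic data (the function and variable names here are illustrative, not the study's code), a TRF can be estimated as the least-squares filter over a set of stimulus lags:

```python
import numpy as np

def estimate_trf(stimulus, response, n_lags):
    """Estimate a temporal response function by least squares, modeling
    response[t] ≈ sum_k trf[k] * stimulus[t - k] for k in [0, n_lags)."""
    n = len(stimulus)
    # lagged design matrix: column k holds the stimulus delayed by k samples
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[: n - k]
    trf, *_ = np.linalg.lstsq(X, response, rcond=None)
    return trf

# demo: synthesize a noisy response as stimulus convolved with a known kernel,
# then recover that kernel from the data alone
rng = np.random.default_rng(0)
stim = rng.standard_normal(5000)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
resp = np.convolve(stim, true_trf)[:5000] + 0.1 * rng.standard_normal(5000)
est = estimate_trf(stim, resp, n_lags=5)
```

In practice, TRF analyses of continuous EEG/MEG typically add regularization (e.g., ridge) and cross-validation because neural data are far noisier and stimulus features are autocorrelated; this unregularized sketch works here only because the toy stimulus is white noise.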