Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70–150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
more »
« less
Comparing methods for deriving the auditory brainstem response to continuous speech in human listeners
Abstract Several methods have recently been developed to derive the auditory brainstem response (ABR) from continuous natural speech, facilitating investigation into subcortical encoding of speech. These tools rely on deconvolution to compute the temporal response function (TRF), which models the subcortical auditory pathway as a linear system, where a nonlinearly processed stimulus is taken as the input (i.e., regressor), the electroencephalogram (EEG) data as the output, and the ABR as the impulse response deconvolved from the recorded EEG and the regressor. In this study, we analyzed EEG recordings from subjects listening to both unaltered natural speech and synthesized “peaky speech.” We compared the derived ABR TRFs using three regressors: the half-wave rectified stimulus (HWR) from Maddox and Lee (2018), the glottal pulse train (GP) from Polonenko and Maddox (2021), and the auditory nerve modeled response (ANM; Zilany et al. (2014); (2009)) used in Shan et al. (2024). Our evaluation focused on the signal-to-noise ratio, prediction accuracy, efficiency, and practicality of applying each regressor in both unaltered and peaky speech. The results indicate that the ANM regressor with peaky speech provides the best performance, with the ANM for unaltered speech and the GP regressor for peaky speech close behind, whereas the HWR regressor demonstrated relatively poorer performance. There are, thus, multiple stimulus and analysis tools that can provide high-quality subcortical TRFs, with the choices for which to use dictated by experimental needs. The findings in this study will guide future research and clinical use in selecting the most appropriate paradigm for ABR derivation from continuous, naturalistic speech.
more »
« less
- PAR ID:
- 10597098
- Publisher / Repository:
- DOI PREFIX: 10.1162
- Date Published:
- Journal Name:
- Imaging Neuroscience
- Volume:
- 3
- ISSN:
- 2837-6056
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and acoustically based, yielded very different ABRs between the two sound classes. The second method, however, developed here and based on a physiological model of the auditory periphery, gave highly correlated responses to music and speech. We determined the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs speech) on the way stimulus acoustics are encoded subcortically. In this study’s second part, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar but with remaining differences. The subcortical and cortical results taken together suggest that there is evidence for stimulus-class dependent processing of music and speech at the cortical but not subcortical level.more » « less
-
The magnetoencephalography (MEG) response to continuous auditory stimuli, such as speech, is commonly described using a linear filter, the auditory temporal response function (TRF). Though components of the sensor level TRFs have been well characterized, the underlying neural sources responsible for these components are not well understood. In this work, we provide a unified framework for determining the TRFs of neural sources directly from the MEG data, by integrating the TRF and distributed forward source models into one, and casting the joint estimation task as a Bayesian optimization problem. Though the resulting problem emerges as non-convex, we propose efficient solutions that leverage recent advances in evidence maximization. We demonstrate the effectiveness of the resulting algorithm in both simulated and experimentally recorded MEG data from humans.more » « less
-
Abstract In this paper, we introduce a new, open-source software developed in Python for analyzing Auditory Brainstem Response (ABR) waveforms. ABRs are a far-field recording of synchronous neural activity generated by the auditory fibers in the ear in response to sound, and used to study acoustic neural information traveling along the ascending auditory pathway. Common ABR data analysis practices are subject to human interpretation and are labor-intensive, requiring manual annotations and visual estimation of hearing thresholds. The proposed new Auditory Brainstem Response Analyzer (ABRA) software is designed to facilitate the analysis of ABRs by supporting batch data import/export, waveform visualization, and statistical analysis. Techniques implemented in this software include algorithmic peak finding, threshold estimation, latency estimation, time warping for curve alignment, and 3D plotting of ABR waveforms over stimulus frequencies and decibels. The excellent performance on a large dataset of ABR collected from three labs in the field of hearing research that use different experimental recording settings illustrates the efficacy, flexibility, and wide utility of ABRA.more » « less
-
Abstract Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.more » « less
An official website of the United States government
