Hearing one’s own voice is critical for fluent speech production, as it allows for the detection and correction of vocalization errors in real time. This behavior, known as auditory feedback control of speech, is impaired in various neurological disorders ranging from stuttering to aphasia; however, the underlying neural mechanisms remain poorly understood. Computational models of speech motor control suggest that, during speech production, the brain uses an efference copy of the motor command to generate an internal estimate of the speech output. When actual feedback differs from this internal estimate, an error signal is generated to correct the internal estimate and update the motor commands needed to produce the intended speech. We localized the auditory error signal using electrocorticographic recordings from neurosurgical participants during a delayed auditory feedback (DAF) paradigm. In this task, participants heard their own voice with a time delay as they produced words and sentences (similar to an echo on a conference call), a manipulation well known to disrupt fluency by causing slow, stutter-like speech. We observed a significant response enhancement in auditory cortex that scaled with the duration of feedback delay, indicating an auditory speech error signal. Immediately following auditory cortex, dorsal precentral gyrus (dPreCG), a region not previously implicated in auditory feedback processing, exhibited a markedly similar response enhancement, suggesting a tight coupling between the 2 regions. Critically, response enhancement in dPreCG occurred only during articulation of long utterances, due to a continuous mismatch between produced speech and reafferent feedback. These results suggest that dPreCG plays an essential role in processing auditory error signals during speech production to maintain fluency.
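To make the paradigm concrete, the following is a minimal sketch of a real-time DAF loop in Python. It assumes the third-party sounddevice library; the 200-ms delay, sample rate, and block size are illustrative choices, not the parameters used in the study.

```python
# Minimal delayed auditory feedback (DAF) loop: microphone input is played
# back after a fixed delay. Assumes the `sounddevice` library; the delay,
# sample rate, and block size are illustrative, not the study's parameters.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000                             # Hz
DELAY_MS = 200                                   # the study varied this value
DELAY_SAMPLES = SAMPLE_RATE * DELAY_MS // 1000

# Ring buffer long enough to hold the delay plus one second of slack.
buffer = np.zeros(DELAY_SAMPLES + SAMPLE_RATE, dtype=np.float32)
write_pos = 0

def callback(indata, outdata, frames, time, status):
    """Store incoming audio and emit it again DELAY_SAMPLES later."""
    global write_pos
    n = len(buffer)
    idx = (write_pos + np.arange(frames)) % n    # where this block lands
    buffer[idx] = indata[:, 0]
    outdata[:, 0] = buffer[(idx - DELAY_SAMPLES) % n]
    write_pos = (write_pos + frames) % n

with sd.Stream(samplerate=SAMPLE_RATE, blocksize=256, channels=1,
               callback=callback):
    input("Speaking now produces a delayed echo; press Enter to stop.")
```

Varying DELAY_MS across trials is what allows a response that scales with the duration of feedback delay to be measured.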
- Award ID(s): 2029245
- NSF-PAR ID: 10338801
- Date Published:
- Journal Name: Journal of Speech, Language, and Hearing Research
- Volume: 65
- Issue: 5
- ISSN: 1092-4388
- Page Range / eLocation ID: 1800 to 1821
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Bizley, Jennifer K. (Ed.)
-
Purpose: To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech.
Methods: We demonstrate 2.4 × 2.4 × 5.8 mm³ spatial resolution, 61-ms temporal resolution, and a 200 × 200 × 70 mm³ FOV. The proposed method uses 3D gradient-echo imaging with a custom upper-airway coil, a minimum-phase slab excitation, a stack-of-spirals readout, a pseudo golden-angle view order in kx–ky, a linear Cartesian order along kz, and a spatiotemporal finite-difference constrained reconstruction with 13-fold acceleration. The technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5T scanner, 1 with synchronized audio, with 2 tasks during production of natural speech, and via comparison with interleaved multislice 2D dynamic MRI.
Results: This technique captured known dynamics of vocal tract articulators during natural speech tasks, including tongue gestures during the production of the consonants “s” and “l” and of consonant–vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume-of-interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels.
Conclusion: We demonstrate the feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant–vowel syllables, without requiring multiple repetitions.
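As a rough illustration of the view-ordering idea above, the sketch below generates a pseudo golden-angle sequence over a fixed set of equally spaced spiral interleaves. The exact ordering scheme in the paper may differ, and the interleaf and view counts here are arbitrary.

```python
# Pseudo golden-angle ordering over a fixed set of spiral interleaves:
# ideal golden-angle rotations are snapped to the nearest available
# interleaf. Illustrative only; the paper's exact scheme may differ.
import numpy as np

GOLDEN_RATIO = (1 + np.sqrt(5)) / 2

def pseudo_golden_angle_order(n_interleaves: int, n_views: int) -> np.ndarray:
    """Return interleaf indices whose rotation angles follow the
    golden-angle increment, snapped to the nearest interleaf."""
    angles = 2 * np.pi * np.arange(n_views) / GOLDEN_RATIO  # ideal angles
    fraction = (angles / (2 * np.pi)) % 1.0                 # in [0, 1)
    return np.round(fraction * n_interleaves).astype(int) % n_interleaves

order = pseudo_golden_angle_order(n_interleaves=13, n_views=40)
print(order)
```

Because successive views are separated by roughly the golden angle, any contiguous temporal window samples the in-plane rotations near-uniformly, which is what makes aggressive retrospective acceleration of this kind feasible.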
-
Differences in prosody (e.g., intonation, rhythm) are among the most obvious language-related impairments in autism spectrum disorder (ASD) and significantly impact communication. Subtle prosodic differences have also been identified in a subset of clinically unaffected first-degree relatives of individuals with ASD, and may reflect genetic liability to ASD. This study investigated the neural basis of prosodic differences in ASD and first-degree relatives through analysis of the feedforward and feedback control involved in the planning, production, self-monitoring, and self-correction of speech, using a pitch-perturbed auditory feedback paradigm during sustained vowel and speech production. Results revealed larger vocal response magnitudes to pitch-perturbed auditory feedback across tasks in the ASD and ASD parent groups, with differences in sustained vowel production driven by parents who displayed subclinical personality and language features associated with ASD (i.e., the broad autism phenotype). Both the ASD and ASD parent groups exhibited increased response onset latencies during sustained vowel production, while the ASD parent group exhibited decreased response onset latencies during speech production. Vocal response magnitudes across tasks were associated with prosodic atypicalities in both individuals with ASD and their parents. Exploratory event-related potential (ERP) analyses in a subgroup of participants during the sustained vowel task revealed reduced P1 ERP amplitudes in the ASD group, with similar trends observed in parents. Overall, results suggest that underdeveloped feedforward systems and neural attenuation in detecting audio-vocal feedback may contribute to ASD-related prosodic atypicalities. Importantly, results implicate atypical audio-vocal integration as a marker of genetic risk for ASD, evident both in individuals with ASD and in their clinically unaffected relatives.
© 2019 The Authors. Autism Research 2019, 12: 1192–1210. Autism Research published by International Society for Autism Research and Wiley Periodicals, Inc.
Lay Summary: Previous research has identified atypicalities in prosody (e.g., intonation) in individuals with ASD and a subset of their first-degree relatives. To better understand the mechanisms underlying prosodic differences in ASD, this study examined how individuals with ASD and their parents responded to unexpected differences in what they heard themselves say in order to modify control of their voice (i.e., audio-vocal integration). Results suggest that disruptions to audio-vocal integration in individuals with ASD contribute to ASD-related prosodic atypicalities, and that the more subtle differences observed in parents could reflect underlying genetic liability to ASD.
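For readers unfamiliar with the pitch-perturbation paradigm, the sketch below shows one common way to quantify a vocal response magnitude: convert the produced F0 contour to cents relative to a pre-perturbation baseline and take the peak post-onset deviation. The window length and the synthetic example are illustrative assumptions, not the study's analysis settings.

```python
# Quantifying a vocal response to a pitch perturbation: peak F0 deviation
# (in cents) from the pre-perturbation baseline. Window length and the
# synthetic contour below are illustrative, not the study's settings.
import numpy as np

def response_magnitude(f0_hz: np.ndarray, t: np.ndarray,
                       pert_onset: float, baseline_s: float = 0.2) -> float:
    """Signed peak F0 change (cents) after a perturbation at pert_onset."""
    baseline = f0_hz[(t >= pert_onset - baseline_s) & (t < pert_onset)].mean()
    cents = 1200 * np.log2(f0_hz / baseline)   # deviation from baseline
    post = cents[t >= pert_onset]
    return post[np.argmax(np.abs(post))]

# Example: a 100 Hz voice drifting up ~3 Hz after a perturbation at t = 1 s.
t = np.linspace(0, 2, 400)
f0 = 100 + 3 * (t > 1.0) * (1 - np.exp(-(t - 1.0) * 8))
print(f"{response_magnitude(f0, t, pert_onset=1.0):.1f} cents")
```

Response onset latency, the other measure reported above, would be read off the same cents trace as the first post-perturbation time the deviation exceeds a chosen threshold.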
-
The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as a child. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature—hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems—and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often observed correlation between task performance and similarity to human behavior.
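One standard way to turn such systems into models of perception, sketched below, is an ABX discrimination test over the model's frame-wise representations (e.g., posteriorgrams), with a dynamic-time-warping distance between tokens. The cosine frame metric is an illustrative choice, not necessarily the paper's.

```python
# ABX discrimination over model representations: X (same category as A)
# should be closer to A than to B under a DTW distance. The cosine frame
# metric is an illustrative choice, not necessarily the paper's.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping over frame-wise cosine distances."""
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - an @ bn.T                      # (len(a), len(b))
    acc = np.full((len(a) + 1, len(b) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return acc[len(a), len(b)]

def abx_correct(a: np.ndarray, b: np.ndarray, x: np.ndarray) -> bool:
    """True if X is judged closer to A (its own category) than to B."""
    return dtw_distance(x, a) < dtw_distance(x, b)

# Example with random stand-in "posteriorgrams" (frames x classes):
rng = np.random.default_rng(0)
a, b, x = (rng.random((20, 40)) for _ in range(3))
print(abx_correct(a, b, x))
```

Averaging abx_correct over many (A, B, X) triples of /ɹ/ and /l/ tokens yields a model discrimination score that can be compared with human listeners' accuracy.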
-
Acoustic analysis of typically developing elementary school-aged (prepubertal) children’s speech has primarily been performed on cross-sectional data in the past. Few studies have examined longitudinal data in this age group. For this presentation, we analyze the developmental changes in the acoustic properties of children’s speech using data collected longitudinally over four years (from first grade to fourth grade). Four male and four female children participated in this study. Data were collected once every year for each child. Using these data, we measured the four-year development of subglottal acoustics (first two subglottal resonances) and vowel acoustics (first four formants and fundamental frequency). Subglottal acoustic measurements are relatively independent of context, and average values were obtained for each child in each year. Vowel acoustics measurements were made for seven vowels (i, ɪ, ɛ, æ, ʌ, ɑ, u), each occurring in two different words in the stressed syllable. We investigated the correlations between the children’s subglottal acoustics, vowel acoustics, and growth-related variables such as standing height, sitting height, and chronological age. Gender-, vowel-, and child-specific analyses were carried out in order to shed light on how typically developing speech acoustics depend on such variables. [Work supported, in part, by the NSF.]
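Formant measurements of the kind described above are typically obtained by linear predictive coding (LPC). The sketch below shows such an estimate for a single vowel segment, assuming librosa for the LPC fit; the model order, preemphasis coefficient, and bandwidth threshold are conventional defaults, not necessarily the study's settings.

```python
# LPC-based formant estimation for a vowel segment: fit an all-pole model,
# then read formant frequencies off the pole angles. The LPC order,
# preemphasis coefficient, and bandwidth threshold are conventional
# defaults, not necessarily the study's settings.
import numpy as np
import librosa

def formants(segment: np.ndarray, sr: int, n_formants: int = 4) -> np.ndarray:
    """Estimate the lowest formant frequencies (Hz) of a vowel segment."""
    pre = np.append(segment[0], segment[1:] - 0.97 * segment[:-1])  # preemphasis
    a = librosa.lpc(pre * np.hamming(len(pre)), order=2 + sr // 1000)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)   # pole angles -> Hz
    bw = -(sr / np.pi) * np.log(np.abs(roots))   # 3-dB bandwidths
    keep = (freqs > 90) & (bw < 400)             # discard spurious poles
    return np.sort(freqs[keep])[:n_formants]

# Example: for a short vowel slice `seg` sampled at `sr`:
#     print(formants(seg, sr))   # e.g., approximate F1..F4 in Hz
```

Subglottal resonances are measured analogously from accelerometer rather than microphone recordings, which is why they are relatively independent of vowel context.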