
This content will become publicly available on May 11, 2023

Title: Delayed Auditory Feedback Elicits Specific Patterns of Serial Order Errors in a Paced Syllable Sequence Production Task
Purpose: Delayed auditory feedback (DAF) interferes with speech output: it causes distorted and disfluent productions and errors in the serial order of produced sounds. Although DAF has been studied extensively, the specific patterns of elicited speech errors are somewhat obscured by relatively small speech samples, differences across studies, and uncontrolled variables. The goal of this study was to characterize the types of serial order errors that increase under DAF in a systematic syllable sequence production task that used a closed set of sounds and controlled for speech rate.

Method: Sixteen adult speakers repeatedly produced CVCVCV (C = consonant, V = vowel) sequences, paced to a “visual metronome,” while hearing self-generated feedback with delays of 0–250 ms. Listeners transcribed the recordings, and speech errors were classified based on the literature on naturally occurring slips of the tongue. A series of mixed-effects models was used to assess the effects of delay on different error types, on error arrival time, and on speaking rate.

Results: DAF had a significant effect on the overall error rate for delays of 100 ms or greater. Statistical models revealed significant effects (relative to zero delay) for vowel and syllable repetitions, vowel exchanges, vowel omissions, onset disfluencies, and distortions. Serial order errors were dominated in particular by vowel and syllable repetitions. Errors occurred earlier on average within a trial for longer feedback delays. Although longer delays caused slower speech, this effect was mediated by run number (time in the experiment) and was small compared with the effects reported in previous studies.

Conclusions: DAF drives a specific pattern of serial order errors. The dominant pattern of vowel and syllable repetition errors suggests possible mechanisms whereby DAF changes activity in speech planning representations, yielding errors. These mechanisms are outlined with reference to the GODIVA (Gradient Order Directions Into Velocities of Articulators) model of speech planning and production.
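The DAF manipulation in the Method (speakers hear their own voice with a fixed lag of 0–250 ms) is, in signal terms, a delay line. The sketch below is illustrative only and is not the study's apparatus: it assumes digitized audio at a hypothetical 44.1 kHz sample rate and uses a FIFO buffer pre-filled with silence, so each microphone sample reaches the headphones a fixed number of samples later. The names `delay_in_samples` and `delayed_feedback` are hypothetical and do not come from the paper.

```python
from collections import deque
from typing import Iterable, Iterator


def delay_in_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Convert a feedback delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


def delayed_feedback(mic_samples: Iterable[float], delay_samples: int) -> Iterator[float]:
    """Yield the headphone signal: each input sample re-emerges delay_samples later.

    The deque acts as a fixed-length FIFO holding the samples still "in flight".
    It starts filled with silence (0.0), so the first delay_samples outputs are
    silent; with delay_samples == 0 the input passes straight through.
    """
    if delay_samples < 0:
        raise ValueError("delay must be non-negative")
    buffer = deque([0.0] * delay_samples)
    for x in mic_samples:
        buffer.append(x)         # newest microphone sample enters the line
        yield buffer.popleft()   # oldest buffered sample goes to the headphones


if __name__ == "__main__":
    # Hypothetical 44.1 kHz rate. 0 and 250 ms are the endpoints of the delay
    # range; 100 ms is where the abstract reports the error-rate effect starts.
    for ms in (0, 100, 250):
        print(ms, "ms ->", delay_in_samples(ms, 44_100), "samples")
```

At the assumed 44.1 kHz rate, the 250 ms upper bound corresponds to 11,025 buffered samples, and a zero delay returns the input unchanged. A real DAF setup also incurs hardware and software latency, which this sketch ignores.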
Journal Name:
Journal of Speech, Language, and Hearing Research
Page Range or eLocation-ID:
1800 to 1821
Sponsoring Org:
National Science Foundation
More Like This
  1. The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as children. For example, native speakers of Japanese have difficulty discriminating between American English /r/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature, hidden Markov models (HMMs) and the more recent and more accurate neural network systems, and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for native Japanese speakers' difficulty with American English /r/ and /l/, only the neural network system successfully accounts for their facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often-observed correlation between task performance and similarity to human behavior.
  2. Bilinguals occasionally produce language intrusion errors (inadvertent translations of the intended word), especially when attempting to produce function word targets, and often when reading aloud mixed-language paragraphs. We investigate whether these errors are due to a failure of attention during speech planning or a failure of monitoring speech output, by classifying errors based on whether and when they were corrected and by investigating eye movement behaviour surrounding them. Prior research on this topic has primarily tested alphabetic languages (e.g., Spanish–English bilinguals), in which part of speech is confounded with word length, which is related to word skipping (i.e., decreased attention). Therefore, we tested 29 Chinese–English bilinguals whose languages differ in orthography, visually cueing language membership, and for whom part of speech (in Chinese) is less confounded with word length. Despite the strong orthographic cue, Chinese–English bilinguals produced intrusion errors with effects similar to those previously reported (e.g., especially with function word targets written in the dominant language). Gaze durations did differ by whether errors were made and corrected or not, but these patterns were similar for function and content words and therefore cannot explain part of speech effects. However, bilinguals regressed to words produced as errors more often than to correctly produced words, but regressions facilitated correction of errors only for content words, not for function words. These data suggest that the vulnerability of function words to language intrusion errors primarily reflects automatic retrieval and the failure of speech monitoring mechanisms to stop function word errors, more so than content word errors, after they are planned for production.
  3. Acoustic analysis of typically developing elementary school-aged (prepubertal) children’s speech has in the past been performed primarily on cross-sectional data. Few studies have examined longitudinal data in this age group. For this presentation, we analyze the developmental changes in the acoustic properties of children’s speech using data collected longitudinally over four years (from first grade to fourth grade). Four male and four female children participated in this study, and data were collected once every year for each child. Using these data, we measured the four-year development of subglottal acoustics (first two subglottal resonances) and vowel acoustics (first four formants and fundamental frequency). Subglottal acoustic measurements are relatively independent of context, so average values were obtained for each child in each year. Vowel acoustic measurements were made for seven vowels (i, ɪ, ɛ, æ, ʌ, ɑ, u), each occurring in the stressed syllable of two different words. We investigated the correlations between the children’s subglottal acoustics, vowel acoustics, and growth-related variables such as standing height, sitting height, and chronological age. Gender-, vowel-, and child-specific analyses were carried out to shed light on how typically developing speech acoustics depend on such variables. [Work supported, in part, by the NSF.]
  4. Motor behavior often occurs in environments with multiple goal options that can vary during the ongoing action. We explored this situation by requiring subjects to select between different target options during an ongoing reach. During split trials, the original target was replaced with a left and a right flanking target, and participants had to select between them. This contrasted with standard jump trials, in which the original target was replaced with a single flanking target, either left or right. When instructed to follow their natural tendency, all participants tended to select the split target nearest the original. The near-target preference was more prominent with increased spatial disparity between the options and when participants could preview the potential options. Moreover, explicit instruction to obtain the “far” target during split trials resulted in many more errors than a “near” instruction (~50% vs. ~15%). Online reaction times to target change were delayed in split trials compared with jump trials (~200 ms vs. ~150 ms) but were also highly automatic. Trials in which the instructed far target was correctly obtained were delayed by a further ~50 ms, unlike those in which the near target was incorrectly obtained. We also observed nonspecific responses from arm muscles at the jump-trial latency during split trials. Taken together, our results indicate that online selection of reach targets is automatically linked to the spatial distribution of the options, though at greater delays than redirecting to a single target. NEW & NOTEWORTHY This work demonstrates that target selection during an ongoing reach is automatically linked to the option nearest a voided target. Online reaction times with two options are longer than those for redirection to a single option. Attempts to override the near-target tendency result in a high number of errors at the normal delay, and in further delays when the attempt is successful.
  5. Humans are born as “universal listeners” without a bias toward any particular language. However, over the first year of life, infants’ perception is shaped by learning native speech categories. Acoustically different sounds, such as the same word produced by different speakers, come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally, without overt categorization or explicit feedback. However, the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental group than in the control group. Moreover, the experimental group showed brain–behavior relationships related to the generalization of all categories, whereas in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.