Adults struggle to learn non-native speech categories in many experimental settings (Goto, 1971), but learn efficiently in a video game paradigm where non-native speech sounds have functional significance (Lim and Holt, 2011). Behavioral and neural evidence from this and other paradigms points toward the involvement of reinforcement learning mechanisms in speech category learning. We formalize this hypothesis computationally and present two simulations. The first simulates the findings of Lim et al. (2019), providing a proof in principle that a reinforcement learning algorithm can capture human results in a video game where people learn novel categories of noise tokens. The second simulation extends this to speech sounds and demonstrates that our algorithm mimics second language learners' improvement in discriminating a non-native speech contrast. Together, these simulations show that reinforcement learning provides an accurate model of human learning in this paradigm and support the hypothesis that this mechanism could play a key role in effective speech category learning in adults. Identifying the algorithms at work in this paradigm could open avenues for pedagogical change in second language instruction, letting teachers harness the processes that enable efficient learning and improvement of non-native perceptual ability.
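As a concrete illustration of the hypothesized mechanism, here is a minimal sketch of reward-driven category learning along a single acoustic dimension. The category centers, learning rate, and update rule are illustrative assumptions, not the model from the paper.

```python
# Minimal sketch (not the authors' implementation) of reward-driven learning of
# two auditory categories along a single acoustic dimension. Category centers,
# learning rate, and the update rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([-1.0, 1.0])        # hypothetical category centers
prototypes = rng.normal(0.0, 0.01, 2)     # learned prototypes, near-uninformative start
alpha = 0.1                               # learning rate

for trial in range(2000):
    cat = rng.integers(2)                            # category of the incoming token
    x = rng.normal(true_means[cat], 0.5)             # noisy acoustic token
    choice = int(np.argmin(np.abs(prototypes - x)))  # act on the nearer prototype
    reward = choice == cat                           # functional (game-like) feedback
    if reward:
        prototypes[choice] += alpha * (x - prototypes[choice])  # pull toward the token
    else:
        prototypes[choice] -= alpha * (x - prototypes[choice])  # push away from it

print(prototypes)   # drifts toward the true category centers
```

Despite never seeing category labels, the agent's prototypes separate because reward is contingent on category-appropriate action, which is the sense in which functional significance can stand in for explicit instruction.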
Neural Representation of Articulable and Inarticulable Novel Sound Contrasts: The Role of the Dorsal Stream
The extent to which articulatory information embedded in incoming speech contributes to the formation of new perceptual categories for speech sounds has been a matter of debate for decades. It has been theorized that the acquisition of new speech sound categories requires a network of sensory and speech motor cortical areas (the "dorsal stream") to successfully integrate auditory and articulatory information. However, it is possible that these brain regions are not sensitive specifically to articulatory information, but instead are sensitive to the abstract phonological categories being learned. We tested this hypothesis by training participants over the course of several days on an articulable non-native speech contrast and acoustically matched inarticulable nonspeech analogues. After participants reached comparable levels of proficiency with the two sets of stimuli, activation was measured with fMRI while they passively listened to both sound types. Decoding of category membership for the articulable speech contrast alone revealed a series of left and right hemisphere regions outside of the dorsal stream that have previously been implicated in the emergence of non-native speech sound categories, while no regions could successfully decode the inarticulable nonspeech contrast. Although activation patterns in the left inferior frontal gyrus (IFG), the middle temporal gyrus (MTG), and the supplementary motor area (SMA) provided better information for decoding articulable (speech) sounds than inarticulable (sine-wave) sounds, the finding that dorsal stream regions do not emerge as good decoders of the articulable contrast alone suggests that other factors, including the strength and structure of the emerging speech categories, are more likely drivers of dorsal stream activation during novel sound learning.
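The category-decoding analysis described above can be pictured with a standard cross-validated MVPA scheme. The sketch below uses synthetic voxel patterns and an off-the-shelf linear classifier; it is an assumed, generic analysis, not the authors' pipeline.

```python
# Cross-validated decoding sketch (generic MVPA, not the authors' pipeline): can a
# linear classifier tell two sound categories apart from a region's voxel pattern?
# Data here are synthetic stand-ins for trial-wise activation estimates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 200
labels = rng.integers(2, size=n_trials)             # category A vs. category B
signature = rng.normal(0, 1, n_voxels)              # assumed category voxel signature
patterns = 0.1 * np.outer(labels - 0.5, signature) \
           + rng.normal(0, 1, (n_trials, n_voxels))  # weak signal buried in noise

# Above-chance accuracy indicates the region carries category information.
acc = cross_val_score(LogisticRegression(max_iter=1000), patterns, labels, cv=5).mean()
print(f"decoding accuracy: {acc:.2f} (chance = 0.50)")
```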
- Award ID(s): 1735225
- PAR ID: 10177581
- Date Published:
- Journal Name: Neurobiology of Language
- ISSN: 2641-4368
- Page Range / eLocation ID: 1 to 26
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like This
-
Bizley, Jennifer K. (Ed.) Hearing one's own voice is critical for fluent speech production, as it allows for the detection and correction of vocalization errors in real time. This behavior, known as the auditory feedback control of speech, is impaired in neurological disorders ranging from stuttering to aphasia; however, the underlying neural mechanisms are still poorly understood. Computational models of speech motor control suggest that, during speech production, the brain uses an efference copy of the motor command to generate an internal estimate of the speech output. When actual feedback differs from this internal estimate, an error signal is generated to correct the internal estimate and update the motor commands needed to produce the intended speech. We localized this auditory error signal using electrocorticographic recordings from neurosurgical participants during a delayed auditory feedback (DAF) paradigm. In this task, participants heard their voice with a time delay as they produced words and sentences (similar to an echo on a conference call), a manipulation well known to disrupt fluency by causing slow and stutter-like speech. We observed a significant response enhancement in auditory cortex that scaled with the duration of the feedback delay, indicating an auditory speech error signal (a toy illustration of this scaling appears after this list). Immediately following auditory cortex, the dorsal precentral gyrus (dPreCG), a region not previously implicated in auditory feedback processing, exhibited a markedly similar response enhancement, suggesting a tight coupling between the two regions. Critically, response enhancement in dPreCG occurred only during articulation of long utterances, due to a continuous mismatch between produced speech and reafferent feedback. These results suggest that dPreCG plays an essential role in processing auditory error signals during speech production to maintain fluency.
-
Infants learn the sound categories of their language, and adults successfully process the sounds they hear, even though sound categories often overlap in their acoustics. Most researchers agree that listeners use context to disambiguate overlapping categories; however, they differ in their ideas about how context is used. One idea is that listeners normalize out the systematic effects of context from the acoustics of a sound. Another is that contextual information may itself be an informative cue to category membership, because of patterns in the types of contexts that particular sounds occur in. We directly contrast these two ways of using context by applying each to the test case of Japanese vowel length. We find that normalizing out contextual variability from the acoustics does not improve categorization, but using context in a top-down fashion does so substantially (the two strategies are contrasted schematically in a sketch after this list). This reveals a limitation of normalization in phonetic acquisition and processing and suggests that approaches that make use of top-down contextual information are promising to pursue.
-
Decoding auditory stimuli from neural activity can enable neuroprosthetics and direct communication with the brain. Recent studies have shown successful speech decoding from intracranial recordings using deep learning models. However, scarcity of training data leads to low-quality speech reconstruction, which prevents a complete brain-computer interface (BCI) application. In this work, we propose a transfer learning approach with a pre-trained GAN that disentangles the representation and generation layers for decoding. We first pre-train a generator to produce spectrograms from a representation space using a large corpus of natural speech data. Then, with a small amount of paired data containing the stimulus speech and corresponding ECoG signals, we transfer the generator into a larger network by attaching an encoder in front of it that maps the neural signal to the representation space (an architectural sketch appears after this list). To further improve generalization, we introduce a Gaussian prior distribution regularizer on the latent representation during the transfer phase. With at most 150 training samples for each tested subject, we achieve state-of-the-art decoding performance. By visualizing the attention mask embedded in the encoder, we observe brain dynamics that are consistent with findings from previous studies investigating dynamics in the superior temporal gyrus (STG), pre-central gyrus (motor), and inferior frontal gyrus (IFG). Our findings demonstrate high reconstruction accuracy using deep learning networks, together with the potential to elucidate interactions across different brain regions during a cognitive task.
-
This paper focuses on decoding the process of face verification in the human brain using fMRI responses. A total of 2,400 fMRI responses were collected from different participants while they performed face verification on genuine and imposter face pairs. The first part of the paper analyzes the responses, covering both cognitive and fMRI neuroimaging results. With an average verification accuracy of 64.79% across human participants, the cognitive analysis shows that female participants performed significantly better than male participants on imposter pairs. The neuroimaging analysis identifies regions of the brain, such as the left fusiform gyrus, caudate nucleus, and superior frontal gyrus, that are activated when participants perform face verification tasks. The second part of the paper proposes a novel two-level fMRI dictionary learning approach to predict whether the observed stimulus is genuine or an imposter, using the brain activation data for selected regions (a simplified single-level stand-in is sketched after this list). A comparative analysis with existing machine learning techniques illustrates that the proposed approach yields at least 4.5% higher classification accuracy than other algorithms. It is envisioned that this study is a first step toward designing brain-inspired automatic face verification algorithms.
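The scaling of the auditory error signal with feedback delay, from the delayed-auditory-feedback abstract above, can be pictured with a toy mismatch computation. The envelope shape, sampling rate, and squared-error readout are illustrative assumptions, not the study's model.

```python
# Toy illustration (assumptions throughout; not the study's model): the mismatch
# between an internal estimate of one's own speech and the delayed reafferent
# feedback grows with the feedback delay, yielding a larger auditory error signal.
import numpy as np

fs = 100                                          # envelope sampling rate (Hz), assumed
t = np.arange(0, 2.0, 1 / fs)
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)  # stand-in speech envelope, ~3 Hz syllable rate

for delay_ms in (0, 50, 100, 200):
    shift = int(delay_ms / 1000 * fs)
    feedback = np.roll(envelope, shift)           # delayed feedback (treated as periodic)
    error = np.mean((envelope - feedback) ** 2)   # mismatch with the internal estimate
    print(f"delay {delay_ms:3d} ms -> mean squared mismatch {error:.3f}")
```

Over the delays listed, the mismatch grows monotonically, which is the qualitative pattern reported for auditory cortex responses.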
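The two uses of context contrasted in the Japanese vowel-length abstract can be made concrete with synthetic data: regressing context out of a duration cue versus feeding context to the classifier as a cue in its own right. Distributions, effect sizes, and the logistic classifier are assumptions for illustration, not the paper's corpus or model.

```python
# Schematic contrast (synthetic data and an assumed logistic classifier; not the
# paper's corpus or model) between normalizing context out of a duration cue and
# using context top-down as a cue in its own right.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 2000
is_long = rng.integers(2, size=n)                 # short (0) vs. long (1) vowel
context = rng.normal(0, 1, n) + 0.8 * is_long     # long vowels favor certain contexts
duration = 100 + 60 * is_long + 20 * context + rng.normal(0, 25, n)  # duration in ms

clf = LogisticRegression(max_iter=5000)

# (1) Normalization: regress the context effect out of the acoustics, then categorize.
fit = LinearRegression().fit(context[:, None], duration)
residual = duration - fit.predict(context[:, None])
acc_norm = cross_val_score(clf, residual[:, None], is_long, cv=5).mean()

# (2) Top-down: keep the raw acoustics and add context as an extra cue.
both = np.column_stack([duration, context])
acc_topdown = cross_val_score(clf, both, is_long, cv=5).mean()

print(f"normalization: {acc_norm:.2f}   top-down context: {acc_topdown:.2f}")
```

Because context here correlates with category, normalization strips out part of the category signal along with the variability, while the top-down route exploits that correlation, mirroring the qualitative pattern the abstract reports.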
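For the ECoG-to-speech abstract, a skeletal version of the described transfer setup might look as follows: a frozen "pre-trained" generator, a trainable encoder from neural channels to the latent space, and an L2 pull toward a standard Gaussian standing in for the prior regularizer. All layer sizes, names, and the toy data are assumptions; this is not the authors' code.

```python
# Skeletal transfer-learning sketch (sizes, names, and data are assumptions):
# a frozen generator maps latents to spectrogram frames; an encoder is trained
# to map ECoG windows into that latent space under a Gaussian prior penalty.
import torch
import torch.nn as nn

latent_dim, ecog_ch, spec_bins = 64, 128, 80

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, spec_bins))
generator.requires_grad_(False)             # stands in for a GAN generator pre-trained on speech

encoder = nn.Sequential(nn.Linear(ecog_ch, 256), nn.ReLU(), nn.Linear(256, latent_dim))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

ecog = torch.randn(150, ecog_ch)            # ~150 paired samples, as in the abstract
target_spec = torch.randn(150, spec_bins)   # stimulus spectrogram frames (synthetic)

for step in range(200):
    z = encoder(ecog)                       # map neural signal into the latent space
    recon = generator(z)                    # decode through the frozen generator
    loss = nn.functional.mse_loss(recon, target_spec) + 1e-3 * z.pow(2).mean()  # prior pull
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.3f}")
```

Freezing the generator is what lets so few paired samples suffice: only the encoder's parameters must be fit to the neural data.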
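Finally, for the face-verification abstract, the sketch below uses a single-level sparse dictionary plus a linear classifier as a simplified stand-in for the paper's two-level dictionary learning method; the data are synthetic placeholders.

```python
# Simplified single-level stand-in (not the paper's two-level method; synthetic
# placeholder data): learn a sparse dictionary over activation patterns, then
# classify the resulting codes as genuine vs. imposter.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n_trials, n_voxels = 300, 500
y = rng.integers(2, size=n_trials)              # imposter (0) vs. genuine (1)
signature = rng.normal(0, 1, n_voxels)          # assumed genuine-specific pattern
X = rng.normal(0, 1, (n_trials, n_voxels)) + 0.2 * np.outer(y, signature)

pipe = make_pipeline(
    MiniBatchDictionaryLearning(n_components=20, alpha=1.0, random_state=0),
    LogisticRegression(max_iter=1000),
)
acc = cross_val_score(pipe, X, y, cv=5).mean()
print(f"classification accuracy: {acc:.2f} (chance = 0.50)")
```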