Some full text articles may not yet be available without a charge during the embargo (administrative interval).
-
Self-supervised speech representations display some human-like cross-linguistic perceptual abilities
State-of-the-art models in automatic speech recognition have shown remarkable improvements due to modern self-supervised (SSL) transformer-based architectures such as wav2vec 2.0 (Baevski et al., 2020). However, how these models encode phonetic information is still not well understood. We explore whether SSL speech models display a linguistic property that characterizes human speech perception: language specificity. We show that while wav2vec 2.0 displays an overall language specificity effect when tested on Hindi vs. English, it does not resemble human speech perception when tested on finer-grained differences in Hindi speech contrasts.
Free, publicly-accessible full text available November 15, 2025
-
Linking cognitive and neural models of audiovisual processing to explore speech perception in autism
Autistic and neurotypical children do not handle audiovisual speech in the same manner. Current evidence suggests that this difference occurs at the level of cue combination. Here, we test whether differences in autistic and neurotypical audiovisual speech perception can be explained by a neural theory of sensory perception in autism, which proposes that heightened levels of neural excitation can account for sensory differences in autism. Through a linking hypothesis that integrates a standard probabilistic cognitive model of cue integration with representations of neural activity, we derive a model that can simulate audiovisual speech perception at a neural population level. Simulations of an audiovisual lexical identification task demonstrate that heightened levels of neural excitation at the level of cue combination cannot account for the observed differences in autistic and neurotypical children's audiovisual speech perception.
Free, publicly-accessible full text available July 24, 2025
-
It has long been assumed that infants' ability to discriminate between languages stems from their sensitivity to speech rhythm, i.e., organized temporal structure of vowels and consonants in a language. However, the relationship between speech rhythm and language discrimination has not been directly demonstrated. Here, we use computational modeling and train models of speech perception with and without access to information about rhythm. We test these models on language discrimination, and find that access to rhythm does not affect the success of the model in replicating infant language discrimination results. Our findings challenge the relationship between rhythm and language discrimination, and have implications for theories of language acquisition.
Free, publicly-accessible full text available July 24, 2025
-
Speech perception is complex and demands constant adaptation to the speaker and the environment (e.g., noisy speech or an unfamiliar accent). To adapt, the listener relies on one speech feature more than another. This cognitive mechanism is called selective attention. We present a model that captures the idea of selective attention: we show that this dynamic adaptation process can be captured in a neural architecture by using a multiple-encoder beta variational autoencoder (beta-ME-VAE), which is based on rate distortion theory. This model implements the idea that optimal feature weighting looks different under different listening conditions and provides insight into how listeners can adapt their listening strategy on a moment-to-moment basis, even in listening situations they haven't experienced before.
-
Listeners typically rely more on one aspect of the speech signal than another when categorizing speech sounds. This is known as feature weighting. We present a rate distortion theory model of feature weighting and use it to ask whether human listeners select feature weights simply by mirroring the feature reliabilities that are present in their input. We show that there is an additional component (selective attention) listeners appear to use that is not reflected by the input statistics. This suggests that an internal mechanism is at play in governing listeners' weighting of different aspects of the speech signal, in addition to tracking statistics.
-
At birth, infants discriminate most of the sounds of the world’s languages, but by age 1, infants become language-specific listeners. This has generally been taken as evidence that infants have learned which acoustic dimensions are contrastive, or useful for distinguishing among the sounds of their language(s), and have begun focusing primarily on those dimensions when perceiving speech. However, speech is highly variable, with different sounds overlapping substantially in their acoustics, and after decades of research, we still do not know what aspects of the speech signal allow infants to differentiate contrastive from noncontrastive dimensions. Here we show that infants could learn which acoustic dimensions of their language are contrastive, despite the high acoustic variability. Our account is based on the cross-linguistic fact that even sounds that overlap in their acoustics differ in the contexts they occur in. We predict that this should leave a signal that infants can pick up on and show that acoustic distributions indeed vary more by context along contrastive dimensions compared with noncontrastive dimensions. By establishing this difference, we provide a potential answer to how infants learn about sound contrasts, a question whose answer in natural learning environments has remained elusive.
-
Rhythm plays an important role in language perception and learning, with infants perceiving rhythmic differences across languages at birth. While the mechanisms underlying rhythm perception in speech remain unclear, one interesting possibility is that these mechanisms are similar to those involved in the perception of musical rhythm. In this work, we adopt a model originally designed for musical rhythm to simulate speech rhythm perception. We show that this model replicates the behavioral results of language discrimination in newborns, and outperforms an existing model of infant language discrimination. We also find that percussives (fast-changing components in the acoustics) are necessary for distinguishing languages of different rhythms, which suggests that percussives are essential for rhythm perception. Our music-inspired model of speech rhythm may be seen as a first step towards a unified theory of how rhythm is represented in speech and music.
-
Adults struggle to learn non-native speech categories in many experimental settings (Goto, 1971), but learn efficiently in a video game paradigm where non-native speech sounds have functional significance (Lim and Holt, 2011). Behavioral and neural evidence from this and other paradigms points toward the involvement of reinforcement learning mechanisms in speech category learning. We formalize this hypothesis computationally and present two simulations. The first simulates the findings of Lim et al. (2019), providing proof in principle that a reinforcement learning algorithm can successfully capture human results in a video game where people are learning novel categories of noise tokens. Our second simulation extends this to speech sounds and demonstrates that our algorithm mimics second language learners’ improvement on discrimination of a non-native speech contrast. Together, these two simulations show that reinforcement learning provides an accurate model of human learning in this paradigm and provide evidence supporting the hypothesis that this mechanism could play a key role in effective speech category learning in adults. Being able to identify the algorithms employed in this paradigm could provide many avenues for pedagogical changes in second language learning and let teachers harness the processes that allow for efficient learning and improvement of non-native perceptual ability.