-
Linking cognitive and neural models of audiovisual processing to explore speech perception in autism
Autistic and neurotypical children do not handle audiovisual speech in the same manner. Current evidence suggests that this difference occurs at the level of cue combination. Here, we test whether differences in autistic and neurotypical audiovisual speech perception can be explained by a neural theory of sensory perception in autism, which proposes that heightened levels of neural excitation can account for sensory differences in autism. Through a linking hypothesis that integrates a standard probabilistic cognitive model of cue integration with representations of neural activity, we derive a model that can simulate audiovisual speech perception at a neural population level. Simulations of an audiovisual lexical identification task demonstrate that heightened levels of neural excitation at the level of cue combination cannot account for the observed differences in autistic and neurotypical children's audiovisual speech perception.
(Free, publicly accessible full text available July 24, 2025.)
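The "standard probabilistic cognitive model of cue integration" mentioned above is typically reliability-weighted Gaussian cue combination. As a minimal sketch (not the paper's neural-population model; the Gaussian cue means and variances here are hypothetical), optimal integration weights each cue by its inverse variance:

```python
import numpy as np

def combine_cues(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted combination of two Gaussian cues.

    Each cue is weighted by its reliability (inverse variance); the
    combined variance is smaller than either input variance,
    reflecting the benefit of integrating the two modalities.
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)  # auditory weight
    w_v = 1.0 - w_a                                    # visual weight
    mu = w_a * mu_a + w_v * mu_v
    var = 1.0 / (1.0 / var_a + 1.0 / var_v)
    return mu, var

# Hypothetical example: a reliable auditory cue dominates a noisy visual one.
mu, var = combine_cues(mu_a=1.0, var_a=0.5, mu_v=3.0, var_v=2.0)
```

Under this scheme the combined estimate (here 1.4) sits closer to the more reliable cue, and a manipulation such as heightened neural excitation would have to alter these effective weights to change behavior.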
-
It has long been assumed that infants' ability to discriminate between languages stems from their sensitivity to speech rhythm, i.e., the organized temporal structure of vowels and consonants in a language. However, the relationship between speech rhythm and language discrimination has not been directly demonstrated. Here, we use computational modeling and train models of speech perception with and without access to information about rhythm. We test these models on language discrimination, and find that access to rhythm does not affect the success of the model in replicating infant language discrimination results. Our findings challenge the relationship between rhythm and language discrimination, and have implications for theories of language acquisition.
(Free, publicly accessible full text available July 24, 2025.)
-
Listeners typically rely more on one aspect of the speech signal than another when categorizing speech sounds. This is known as feature weighting. We present a rate distortion theory model of feature weighting and use it to ask whether human listeners select feature weights simply by mirroring the feature reliabilities that are present in their input. We show that there is an additional component (selective attention) listeners appear to use that is not reflected by the input statistics. This suggests that an internal mechanism is at play in governing listeners' weighting of different aspects of the speech signal, in addition to tracking statistics.
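The "mirror the input statistics" baseline against which the selective-attention component is measured can be sketched as follows. This is an illustrative simplification, not the paper's rate-distortion model: feature reliability is approximated with a d'-like separation statistic, and the category means and variances are hypothetical numbers:

```python
import numpy as np

def input_driven_weights(category_means, category_vars):
    """Baseline feature weights that simply mirror input reliability.

    For each acoustic dimension, reliability is the separation between
    the two category means relative to the within-category spread
    (a d'-like statistic); weights are normalized to sum to 1.
    The attention account holds that human weights deviate from these
    purely input-driven values.
    """
    means = np.asarray(category_means, dtype=float)   # shape (2, n_dims)
    sds = np.sqrt(np.asarray(category_vars, dtype=float)).mean(axis=0)
    dprime = np.abs(means[0] - means[1]) / sds
    return dprime / dprime.sum()

# Hypothetical example: dimension 0 separates the categories far more
# reliably than dimension 1, so it receives most of the weight.
w = input_driven_weights(category_means=[[10.0, 200.0], [60.0, 210.0]],
                         category_vars=[[100.0, 400.0], [100.0, 400.0]])
```

A listener whose measured weights depart from `w` is, on this analysis, applying something beyond input tracking.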
-
Speech perception is complex and demands constant adaptation to the speaker and the environment (e.g., noisy speech or an unfamiliar accent). To adapt, the listener relies on one speech feature more than another. This cognitive mechanism is called selective attention. We present a model that captures the idea of selective attention: we show that this dynamic adaptation process can be captured in a neural architecture using a multiple-encoder beta variational autoencoder (beta-ME-VAE), which is based on rate distortion theory. This model implements the idea that optimal feature weighting looks different under different listening conditions, and provides insight into how listeners can adapt their listening strategy on a moment-to-moment basis, even in listening situations they haven't experienced before.
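The rate-distortion framing behind a beta-VAE can be made concrete with its training objective. This is a generic sketch of the standard beta-VAE loss, not the beta-ME-VAE itself; the toy inputs are hypothetical:

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, log_var, beta):
    """Rate-distortion style objective of a beta-VAE.

    distortion: squared reconstruction error between input x and
        reconstruction x_hat.
    rate: KL divergence between the approximate posterior
        N(mu, exp(log_var)) and a standard normal prior, summed
        over latent dimensions.
    beta trades the two terms off; in a multiple-encoder variant,
    different encoders (listening conditions) can induce different
    effective feature weightings.
    """
    distortion = np.sum((x - x_hat) ** 2)
    rate = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return distortion + beta * rate

# Toy example: a near-perfect reconstruction with a posterior equal to
# the prior incurs almost no loss.
loss = beta_vae_loss(x=np.array([1.0, 0.0]), x_hat=np.array([0.9, 0.1]),
                     mu=np.array([0.0, 0.0]), log_var=np.array([0.0, 0.0]),
                     beta=1.0)
```

Raising `beta` penalizes the rate term more heavily, forcing the encoder to transmit only the features that matter most for reconstruction under the current listening condition.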
-
At birth, infants discriminate most of the sounds of the world’s languages, but by age 1, infants become language-specific listeners. This has generally been taken as evidence that infants have learned which acoustic dimensions are contrastive, or useful for distinguishing among the sounds of their language(s), and have begun focusing primarily on those dimensions when perceiving speech. However, speech is highly variable, with different sounds overlapping substantially in their acoustics, and after decades of research, we still do not know what aspects of the speech signal allow infants to differentiate contrastive from noncontrastive dimensions. Here we show that infants could learn which acoustic dimensions of their language are contrastive, despite the high acoustic variability. Our account is based on the cross-linguistic fact that even sounds that overlap in their acoustics differ in the contexts they occur in. We predict that this should leave a signal that infants can pick up on, and show that acoustic distributions indeed vary more by context along contrastive dimensions compared with noncontrastive dimensions. By establishing this difference, we provide a potential answer to how infants learn about sound contrasts, a question whose answer in natural learning environments has remained elusive.
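The key statistic here, how much acoustic distributions shift across phonological contexts, can be illustrated with a minimal sketch. The measure (variance of per-context means) and the toy numbers are illustrative assumptions, not the paper's exact analysis:

```python
import numpy as np

def context_variability(samples_by_context):
    """Variance of per-context means along one acoustic dimension.

    samples_by_context: list of 1-D arrays, one per phonological
    context. If a dimension is contrastive, different contexts favor
    different sounds, so the context means spread apart; along a
    noncontrastive dimension the means stay close together.
    """
    means = np.array([np.mean(c) for c in samples_by_context])
    return np.var(means)

# Toy illustration (hypothetical values): along a contrastive dimension
# the two contexts center on very different values; along a
# noncontrastive dimension they center on nearly the same value.
contrastive = [np.array([1.0, 1.2, 0.8]), np.array([5.0, 5.1, 4.9])]
noncontrastive = [np.array([3.0, 3.2, 2.8]), np.array([3.1, 2.9, 3.0])]
```

An infant tracking this quantity would see a much larger value along the contrastive dimension, which is exactly the learnable signal the abstract proposes.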
-
Rhythm plays an important role in language perception and learning, with infants perceiving rhythmic differences across languages at birth. While the mechanisms underlying rhythm perception in speech remain unclear, one interesting possibility is that these mechanisms are similar to those involved in the perception of musical rhythm. In this work, we adopt a model originally designed for musical rhythm to simulate speech rhythm perception. We show that this model replicates the behavioral results of language discrimination in newborns, and outperforms an existing model of infant language discrimination. We also find that percussives (fast-changing components in the acoustics) are necessary for distinguishing languages of different rhythms, which suggests that percussives are essential for rhythm perception. Our music-inspired model of speech rhythm may be seen as a first step towards a unified theory of how rhythm is represented in speech and music.
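A crude way to expose the percussive (fast-changing) components the abstract refers to is to track sharp rises in an amplitude envelope. This is a generic onset-strength sketch, not the paper's model, and the frame length and test signal are illustrative choices:

```python
import numpy as np

def onset_strength(signal, frame_len):
    """Crude measure of percussive (fast-changing) content.

    Computes a frame-wise RMS amplitude envelope, then half-wave
    rectifies its first difference: sharp envelope rises (consonantal
    onsets, percussive events) yield large values, while slowly
    varying segments such as vowels yield values near zero.
    """
    n = len(signal) // frame_len
    frames = np.reshape(signal[: n * frame_len], (n, frame_len))
    env = np.sqrt(np.mean(frames ** 2, axis=1))
    diff = np.diff(env)
    return np.maximum(diff, 0.0)  # keep rises only, discard decays

# Toy signal: silence followed by a sudden burst produces a single
# strong onset at the frame boundary.
signal = np.concatenate([np.zeros(100), np.ones(100)])
strength = onset_strength(signal, frame_len=50)
```

Removing such fast-changing components flattens this onset profile, which is one way to probe whether they carry the rhythm information used for language discrimination.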
-
We incorporate social reasoning about groups of informants into a model of word learning, and show that the model accounts for infant looking behavior in tasks of both word learning and recognition. Simulation 1 models an experiment where 16-month-old infants saw familiar objects labeled either correctly or incorrectly, by either adults or audio talkers. Simulation 2 reinterprets puzzling data from the Switch task, an audiovisual habituation procedure wherein infants are tested on familiarized associations between novel objects and labels. Eight-month-olds outperform 14-month-olds on the Switch task when required to distinguish labels that are minimal pairs (e.g., “buk” and “puk”), but 14-month-olds' performance is improved by habituation stimuli featuring multiple talkers. Our modeling results support the hypothesis that beliefs about knowledgeability and group membership guide infant looking behavior in both tasks. These results show that social and linguistic development interact in non-trivial ways, and that social categorization findings in developmental psychology could have substantial implications for understanding linguistic development in realistic settings where talkers vary according to observable features correlated with social groupings, including linguistic, ethnic, and gendered groups.
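The core "beliefs about knowledgeability" idea can be sketched as a simple Bayesian update on how often an informant labels familiar objects correctly. This Beta-Bernoulli sketch is an illustrative assumption, not the paper's full model of groups of informants:

```python
def knowledgeability_posterior(n_correct, n_incorrect,
                               prior_a=1.0, prior_b=1.0):
    """Posterior mean belief that an informant labels objects correctly.

    Beta-Bernoulli update under a Beta(prior_a, prior_b) prior: each
    observed correct label counts toward a, each incorrect label toward
    b. A talker who mislabels familiar objects ends up with low
    expected knowledgeability, so novel labels from that talker carry
    less weight during word learning.
    """
    a = prior_a + n_correct
    b = prior_b + n_incorrect
    return a / (a + b)

# Hypothetical example: a talker seen labeling four familiar objects
# correctly vs. one seen mislabeling all four.
reliable = knowledgeability_posterior(n_correct=4, n_incorrect=0)
unreliable = knowledgeability_posterior(n_correct=0, n_incorrect=4)
```

Generalizing such beliefs from observed informants to others in the same social group is what links this update to the group-membership findings described above.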
-
Non-native speakers show difficulties with spoken word processing. Many studies attribute these difficulties to imprecise phonological encoding of words in the lexical memory. We test an alternative hypothesis: that some of these difficulties can arise from the non-native speakers' phonetic perception. We train a computational model of phonetic learning, which has no access to phonology, on either one or two languages. We first show that the model exhibits predictable behaviors on phone-level and word-level discrimination tasks. We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers. We run an additional analysis of the model's lexical representation space, showing that the two training languages are not fully separated in that space, similarly to the languages of a bilingual human speaker.
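Phone-level discrimination of the kind used to probe such models is commonly scored with ABX trials over learned representations. As a minimal sketch (the distance metric and the toy vectors are assumptions; the paper's exact evaluation may differ):

```python
import numpy as np

def abx_correct(a, b, x):
    """One ABX discrimination trial over model representations.

    a and x are representations of tokens from the same phone
    category, b from a different one; the trial counts as correct
    when x lies closer to a than to b (Euclidean distance here).
    Averaging over many (a, b, x) triples gives a phone-level
    discrimination score for the model.
    """
    a, b, x = (np.asarray(v, dtype=float) for v in (a, b, x))
    return np.linalg.norm(x - a) < np.linalg.norm(x - b)
```

A model trained on one language should score well on that language's contrasts and worse on contrasts it was not exposed to, mirroring the non-native perception effects described above.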