The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as children. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature (hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems) and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often-observed correlation between task performance and similarity to human behavior.
A quantitative model of the language familiarity effect in infancy
Human listeners are better at telling apart speakers of their native language than speakers of other languages, a phenomenon known as the language familiarity effect. The recent observation of such an effect in infants as young as 4.5 months of age (Fecher & Johnson, in press) has led to new difficulties for theories of the effect. On the one hand, retaining classical accounts, which rely on sophisticated knowledge of the native language (Goggin, Thompson, Strube, & Simental, 1991), requires an explanation of how infants could acquire this knowledge so early. On the other hand, letting go of these accounts requires an explanation of how the effect could arise in the absence of such knowledge. In this paper, we build on algorithms from unsupervised machine learning and zero-resource speech technology to propose, for the first time, a feasible acquisition mechanism for the language familiarity effect in infants. Our results show how, without relying on sophisticated linguistic knowledge, infants could develop a language familiarity effect through statistical modeling at multiple time-scales of the acoustics of the speech signal to which they are exposed.
- Award ID(s): 1734245
- PAR ID: 10110336
- Date Published:
- Journal Name: Proceedings of the Conference on Cognitive Computational Neuroscience
- Page Range / eLocation ID: 457-460
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In the language development literature, studies often make inferences about infants’ speech perception abilities based on their responses to a single speaker. However, there can be significant natural variability across speakers in how speech is produced (i.e., inter-speaker differences). The current study examined whether inter-speaker differences can affect infants’ ability to detect a mismatch between the auditory and visual components of vowels. Using an eye-tracker, 4.5-month-old infants were tested on auditory-visual (AV) matching for two vowels (/i/ and /u/). Critically, infants were tested with two speakers who naturally differed in how distinctively they articulated the two vowels within and across the categories. Only infants who watched and listened to the speaker whose visual articulations of the two vowels were most distinct from one another were sensitive to AV mismatch. This speaker also produced a visually more distinct /i/ as compared to the other speaker. This finding suggests that infants are sensitive to the distinctiveness of AV information across speakers, and that when making inferences about infants’ perceptual abilities, characteristics of the speaker should be taken into account.
- This study examined the immediate effects of mask-wearing on infant selective visual attention to audiovisual speech in familiar and unfamiliar languages. Infants distribute their selective attention to regions of a speaker's face differentially based on their age and language experience. However, the potential impact wearing a face mask may have on infants' selective attention to audiovisual speech has not been systematically studied. We utilized eye tracking to examine the proportion of infant looking time to the eyes and mouth of a masked or unmasked actress speaking in a familiar or unfamiliar language. Six-month-old and 12-month-old infants (n = 42, 55% female, 91% White Non-Hispanic/Latino) were shown videos of an actress speaking in a familiar language (English) with and without a mask on, as well as videos of the same actress speaking in an unfamiliar language (German) with and without a mask. Overall, infants spent more time looking at the unmasked presentations compared to the masked presentations. Regardless of language familiarity or age, infants spent more time looking at the mouth area of an unmasked speaker and they spent more time looking at the eyes of a masked speaker. These findings indicate mask-wearing has immediate effects on the distribution of infant selective attention to different areas of the face of a speaker during audiovisual speech.
- Pragmatics and social meaning: Understanding under-informativeness in native and non-native speakers
  Foreign-accented non-native speakers sometimes face negative biases compared to native speakers. Here we report an advantage in how comprehenders process the speech of non-native compared to native speakers. In a series of four experiments, we find that under-informative sentences are interpreted differently when attributed to non-native compared to native speakers. Specifically, under-informativeness is more likely to be attributed to inability (rather than unwillingness) to say more in non-native as compared to native speakers. This asymmetry has implications for learning: under-informative teachers are more likely to be given a second chance in case they are non-native speakers of the language (presumably because their prior under-informativeness is less likely to be intentional). Our results suggest strong effects of non-native speech on social-pragmatic inferences. Because these effects emerge for written stimuli, they support theories that stress the role of expectations on non-native comprehension, even in the absence of experience with foreign accents. Finally, our data bear on pragmatic theories of how speaker identity affects language comprehension and show how such theories offer an integrated framework for explaining how non-native language can lead to (sometimes unexpected) social meanings.
- In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.

