The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as a child. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature—hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems—and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often observed correlation between task performance and similarity to human behavior.
A quantitative model of the language familiarity effect in infancy
Human listeners are better at telling apart speakers of their native language than speakers of other languages, a phenomenon known as the language familiarity effect. The recent observation of such an effect in infants as young as 4.5 months of age (Fecher & Johnson, in press) has led to new difficulties for theories of the effect. On the one hand, retaining classical accounts, which rely on sophisticated knowledge of the native language (Goggin, Thompson, Strube, & Simental, 1991), requires an explanation of how infants could acquire this knowledge so early. On the other hand, letting go of these accounts requires an explanation of how the effect could arise in the absence of such knowledge. In this paper, we build on algorithms from unsupervised machine learning and zero-resource speech technology to propose, for the first time, a feasible acquisition mechanism for the language familiarity effect in infants. Our results show how, without relying on sophisticated linguistic knowledge, infants could develop a language familiarity effect through statistical modeling at multiple time-scales of the acoustics of the speech signal to which they are exposed.
- Award ID(s): 1734245
- PAR ID: 10110336
- Date Published:
- Journal Name: Proceedings of the Conference on Cognitive Computational Neuroscience
- Page Range / eLocation ID: 457-460
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In the language development literature, studies often make inferences about infants’ speech perception abilities based on their responses to a single speaker. However, there can be significant natural variability across speakers in how speech is produced (i.e., inter-speaker differences). The current study examined whether inter-speaker differences can affect infants’ ability to detect a mismatch between the auditory and visual components of vowels. Using an eye-tracker, 4.5-month-old infants were tested on auditory-visual (AV) matching for two vowels (/i/ and /u/). Critically, infants were tested with two speakers who naturally differed in how distinctively they articulated the two vowels within and across the categories. Only infants who watched and listened to the speaker whose visual articulations of the two vowels were most distinct from one another were sensitive to AV mismatch. This speaker also produced a visually more distinct /i/ as compared to the other speaker. This finding suggests that infants are sensitive to the distinctiveness of AV information across speakers, and that when making inferences about infants’ perceptual abilities, characteristics of the speaker should be taken into account.
- Pragmatics and social meaning: Understanding under-informativeness in native and non-native speakers
Foreign-accented non-native speakers sometimes face negative biases compared to native speakers. Here we report an advantage in how comprehenders process the speech of non-native compared to native speakers. In a series of four experiments, we find that under-informative sentences are interpreted differently when attributed to non-native compared to native speakers. Specifically, under-informativeness is more likely to be attributed to inability (rather than unwillingness) to say more in non-native as compared to native speakers. This asymmetry has implications for learning: under-informative teachers are more likely to be given a second chance in case they are non-native speakers of the language (presumably because their prior under-informativeness is less likely to be intentional). Our results suggest strong effects of non-native speech on social-pragmatic inferences. Because these effects emerge for written stimuli, they support theories that stress the role of expectations on non-native comprehension, even in the absence of experience with foreign accents. Finally, our data bear on pragmatic theories of how speaker identity affects language comprehension and show how such theories offer an integrated framework for explaining how non-native language can lead to (sometimes unexpected) social meanings.
- In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.
- The current study utilized eye-tracking to investigate the effects of intersensory redundancy and language on infant visual attention and detection of a change in prosody in audiovisual speech. Twelve-month-old monolingual English-learning infants viewed either synchronous (redundant) or asynchronous (non-redundant) presentations of a woman speaking in native or non-native speech. Halfway through each trial, the speaker changed prosody from infant-directed speech (IDS) to adult-directed speech (ADS) or vice versa. Infants focused more on the mouth of the speaker on IDS trials compared to ADS trials regardless of language or intersensory redundancy. Additionally, infants demonstrated greater detection of prosody changes from IDS speech to ADS speech in native speech. Planned comparisons indicated that infants detected prosody changes across a broader range of conditions during redundant stimulus presentations. These findings shed light on the influence of language and prosody on infant attention and highlight the complexity of audiovisual speech processing in infancy.