Title: Evaluating computational models of infant phonetic learning across languages
In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning, and providing insight into which aspects of the models are critical for capturing infants' perceptual development.
Award ID(s):
1734245
PAR ID:
10176647
Journal Name:
Proceedings of the Annual Conference of the Cognitive Science Society
ISSN:
1069-7977
Page Range / eLocation ID:
571-577
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Before they even speak, infants become attuned to the sounds of the language(s) they hear, processing native phonetic contrasts more easily than nonnative ones. For example, between 6 to 8 mo and 10 to 12 mo, infants learning American English get better at distinguishing English [ɹ] and [l], as in “rock” vs. “lock,” relative to infants learning Japanese. Influential accounts of this early phonetic learning phenomenon initially proposed that infants group sounds into native vowel- and consonant-like phonetic categories (like [ɹ] and [l] in English) through a statistical clustering mechanism dubbed “distributional learning.” The feasibility of this mechanism for learning phonetic categories has been challenged, however. Here, we demonstrate that a distributional learning algorithm operating on naturalistic speech can predict early phonetic learning, as observed in Japanese and American English infants, suggesting that infants might learn through distributional learning after all. We further show, however, that, contrary to the original distributional learning proposal, our model learns units too brief and too fine-grained acoustically to correspond to phonetic categories. This challenges the influential idea that what infants learn are phonetic categories. More broadly, our work introduces a mechanism-driven approach to the study of early phonetic learning, together with a quantitative modeling framework that can handle realistic input. This allows accounts of early phonetic learning to be linked to concrete, systematic predictions regarding infants’ attunement.
  2. Early changes in infants’ ability to perceive native and nonnative speech sound contrasts are typically attributed to their developing knowledge of phonetic categories. We critically examine this hypothesis and argue that there is little direct evidence of category knowledge in infancy. We then propose an alternative account in which infants’ perception changes because they are learning a perceptual space that is appropriate to represent speech, without yet carving up that space into phonetic categories. If correct, this new account has substantial implications for understanding early language development.
  3. Human listeners are better at telling apart speakers of their native language than speakers of other languages, a phenomenon known as the language familiarity effect. The recent observation of such an effect in infants as young as 4.5 months of age (Fecher & Johnson, in press) has led to new difficulties for theories of the effect. On the one hand, retaining classical accounts, which rely on sophisticated knowledge of the native language (Goggin, Thompson, Strube, & Simental, 1991), requires an explanation of how infants could acquire this knowledge so early. On the other hand, letting go of these accounts requires an explanation of how the effect could arise in the absence of such knowledge. In this paper, we build on algorithms from unsupervised machine learning and zero-resource speech technology to propose, for the first time, a feasible acquisition mechanism for the language familiarity effect in infants. Our results show how, without relying on sophisticated linguistic knowledge, infants could develop a language familiarity effect through statistical modeling at multiple time-scales of the acoustics of the speech signal to which they are exposed.
  4. In acquiring language, differences in input can greatly affect learning outcomes, but which aspects of language learning are most sensitive to input variations, and which are robust, remains debated. A recent modeling study successfully reproduced a phenomenon empirically observed in early phonetic learning (learning about the sounds of the native language in the first year of life) despite using input that differed in quantity and speaker composition from what a typical infant would hear. In this paper, we carry out a direct test of that model's robustness to input variations. We find that, despite what the original result suggested, the learning outcomes are sensitive to properties of the input and that more plausible input leads to a better fit with empirical observations. This has implications for understanding early phonetic learning in infants and underscores the importance of using realistic input in models of language acquisition.
  5. Psycholinguistic research on children's early language environments has revealed many potential challenges for language acquisition. One is that in many cases, referents of linguistic expressions are hard to identify without prior knowledge of the language. Likewise, the speech signal itself varies substantially in clarity, with some productions being very clear, and others being phonetically reduced, even to the point of uninterpretability. In this study, we sought to better characterize the language‐learning environment of American English‐learning toddlers by testing how well phonetic clarity and referential clarity align in infant‐directed speech. Using an existing Human Simulation Paradigm (HSP) corpus with referential transparency measurements and adding new measures of phonetic clarity, we found that the phonetic clarity of words’ first mentions significantly predicted referential clarity (how easy it was to guess the intended referent from visual information alone) at that moment. Thus, when parents’ speech was especially clear, the referential semantics were also clearer. This suggests that young children could use the phonetics of speech to identify globally valuable instances that support better referential hypotheses, by homing in on clearer instances and filtering out less‐clear ones. Such multimodal “gems” offer special opportunities for early word learning.
    Research Highlights:
    In parent‐infant interaction, parents’ referential intentions are sometimes clear and sometimes unclear; likewise, parents’ pronunciation is sometimes clear and sometimes quite difficult to understand.
    We find that clearer referential instances go along with clearer phonetic instances, more so than expected by chance.
    Thus, there are globally valuable instances (“gems”) from which children could learn about words’ pronunciations and words’ meanings at the same time.
    Homing in on clear phonetic instances and filtering out less‐clear ones would help children identify these multimodal “gems” during word learning.