Title: Predicting Age of Acquisition for Children's Early Vocabulary in Five Languages Using Language Model Surprisal
Abstract: What makes a word easy to learn? Early-learned words are frequent and tend to name concrete referents. But words typically do not occur in isolation. Some words are predictable from their contexts; others are less so. Here, we investigate whether predictability relates to when children start producing different words (age of acquisition; AoA). We operationalized predictability in terms of a word's surprisal in child-directed speech, computed using n-gram and long short-term memory (LSTM) language models. Predictability derived from LSTMs was generally a better predictor than predictability derived from n-gram models. Across five languages, average surprisal was positively correlated with the AoA of predicates and function words but not nouns. Controlling for concreteness and word frequency, more predictable predicates and function words were learned earlier. Differences in predictability between languages were associated with cross-linguistic differences in AoA: the same word (when it was a predicate) was produced earlier in languages where the word was more predictable.
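The abstract's operationalization of predictability (a word's average surprisal under an n-gram model of child-directed speech) can be illustrated with a minimal sketch. The toy corpus, the add-one smoothing, and the bigram order are illustrative assumptions; the paper's actual models were trained on large child-directed-speech corpora and included LSTMs.

```python
import math
from collections import Counter

def bigram_surprisal(corpus):
    """Average surprisal (in bits) of each word type under a
    bigram model with add-one smoothing. Toy sketch only."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>"] + sent.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    V = len(vocab)
    totals, counts = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            # smoothed P(word | prev), then surprisal = -log2 P
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)
            totals[word] += -math.log2(p)
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

# hypothetical child-directed utterances
corpus = ["you want the ball", "you want more juice", "look at the ball"]
surprisal = bigram_surprisal(corpus)
```

A word appearing in less predictive contexts (here "juice" after the rarer "more") receives higher average surprisal than one in a well-attested frame ("ball" after "the").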
Award ID(s): 2020969
PAR ID: 10478983
Publisher / Repository: Cognitive Science Society
Journal Name: Cognitive Science
Volume: 47
Issue: 9
ISSN: 0364-0213
Page Range / eLocation ID: e13334
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
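The two quantities this abstract tests, surprisal and contextual entropy (expected surprisal), can be made concrete with a short sketch. The next-word distribution below is hypothetical; in the study these probabilities come from trained language models.

```python
import math

def surprisal(p_word):
    """Surprisal of the word that actually occurred: -log2 P(word | context)."""
    return -math.log2(p_word)

def contextual_entropy(dist):
    """Expected surprisal over the full next-word distribution (bits)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# hypothetical next-word distribution for some context
dist = {"mat": 0.5, "floor": 0.25, "sofa": 0.25}
s = surprisal(dist["mat"])    # surprisal of the realized word
h = contextual_entropy(dist)  # entropy of the whole distribution
```

The distinction matters for prediction (i) vs. (ii): surprisal depends on which word actually appeared, while contextual entropy is a property of the context alone.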
  2. Abstract: Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language model surprisal as a guiding mechanism. While surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity: do programmers prefer more predictable expressions in code? Using meaning-preserving transformations, we produce equivalent alternatives to developer-written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developers choose to write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and one transformed). We find that programmers do prefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences.
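The comparison at the heart of this study, scoring two meaning-equivalent code snippets by language-model surprisal, reduces to summing per-token surprisals and preferring the lower total. The per-token probabilities below are made-up placeholders; in the paper they come from n-gram and transformer models trained on code corpora.

```python
import math

def total_surprisal(token_probs):
    """Sum of per-token surprisals (bits), given the in-context
    probability a language model assigns to each token."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities for two equivalent snippets,
# e.g. a developer-written expression vs. a transformed alternative.
original_probs    = [0.6, 0.5, 0.7]
transformed_probs = [0.6, 0.2, 0.3, 0.4]

# The variant with lower total surprisal is the more predictable one.
prefer_original = total_surprisal(original_probs) < total_surprisal(transformed_probs)
```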
  3. Abstract: Partial speech input is often understood to trigger rapid and automatic activation of successively higher-level representations of words, from sound to meaning. Here we show evidence from magnetoencephalography that this type of incremental processing is limited when words are heard in isolation as compared to continuous speech. This suggests a less unified and automatic word recognition process than is often assumed. We present evidence from isolated words that neural effects of phoneme probability, quantified by phoneme surprisal, are significantly stronger than (statistically null) effects of phoneme-by-phoneme lexical uncertainty, quantified by cohort entropy. In contrast, we find robust effects of both cohort entropy and phoneme surprisal during perception of connected speech, with a significant interaction between the contexts. This dissociation rules out models of word recognition in which phoneme surprisal and cohort entropy are common indicators of a uniform process, even though these closely related information-theoretic measures both arise from the probability distribution of wordforms consistent with the input. We propose that phoneme surprisal effects reflect automatic access of a lower level of representation of the auditory input (e.g., wordforms) while the occurrence of cohort entropy effects is task sensitive, driven by a competition process or a higher-level representation that is engaged late (or not at all) during the processing of single words.
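As the abstract notes, phoneme surprisal and cohort entropy both derive from the distribution of wordforms consistent with the input so far. A minimal sketch of that shared derivation, using a made-up three-word lexicon with hypothetical prior probabilities (letters stand in for phonemes):

```python
import math

def cohort(lexicon, prefix):
    """Wordforms consistent with the phonemes heard so far, renormalized.
    `lexicon` maps wordform -> prior probability."""
    members = {w: p for w, p in lexicon.items() if w.startswith(prefix)}
    total = sum(members.values())
    return {w: p / total for w, p in members.items()}

def phoneme_surprisal(lexicon, prefix, next_phoneme):
    """-log2 P(next phoneme | phonemes heard so far)."""
    before = cohort(lexicon, prefix)
    p = sum(q for w, q in before.items()
            if len(w) > len(prefix) and w[len(prefix)] == next_phoneme)
    return -math.log2(p)

def cohort_entropy(lexicon, prefix):
    """Entropy over the wordforms still consistent with the input."""
    c = cohort(lexicon, prefix)
    return -sum(p * math.log2(p) for p in c.values())

# tiny hypothetical lexicon
lexicon = {"cat": 0.4, "cap": 0.4, "cup": 0.2}
```

Despite the shared machinery, the two measures can dissociate: surprisal tracks the transition probability of the phoneme that actually arrives, while entropy tracks the residual uncertainty over candidate wordforms.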
  4. To learn new words, particularly verbs, child learners have been shown to benefit from the linguistic contexts in which the words appear. However, cross-linguistic differences affect how this process unfolds. One previous study found that children’s abilities to learn a new verb differed across Korean and English as a function of the sentence in which the verb occurred. The authors hypothesized that the properties of word order and argument drop, which vary systematically in these two languages, were driving the differences. In the current study, we pursued this finding to ask if the difference persists later in development, or if children acquiring different languages come to appear more similar as their linguistic knowledge and learning capacities increase. Preschool-aged monolingual English learners (N = 80) and monolingual Korean learners (N = 64) were presented with novel verbs in contexts that varied in word order and argument drop and accompanying visual stimuli. We assessed their learning by measuring accuracy in a forced-choice pointing task, and we measured eye gaze during the learning phase as an indicator of the processes by which they mapped the novel verbs to meaning. Unlike previous studies which identified differences between English and Korean learning 2-year-olds in a similar task, our results revealed similarities between the two language groups with these older preschoolers. We interpret our results as evidence that over the course of early childhood, children become adept at learning from a large variety of contexts, such that differences between learners of different languages are attenuated. 
  5. Nölle, J; Raviv, L; Graham, E; Hartmann, S; Jadoul, Y; Josserand, M; Matzinger, T; Mudd, K; Pleyer, M; Slonimska, A (Ed.)
    Why are some words more frequent than others? Surprisingly, the obvious answers to this seemingly simple question, e.g., that frequent words reflect greater communicative needs, are either wrong or incomplete. We show that a word’s frequency is strongly associated with its position in a semantic association network. More centrally located words are more frequent. But is a word’s centrality in a network merely a reflection of inherent centrality of the word’s meaning? Through cross-linguistic comparisons, we found that differences in the frequency of translation-equivalents are predicted by differences in the word’s network structures in the different languages. Specifically, frequency was linked to how many connections a word had and to its capacity to bridge words that are typically not linked. This hints that a word’s frequency (and with it, its meaning) may change as a function of the word’s association with other words. 
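The two network properties this abstract links to frequency, how many connections a word has (degree) and its capacity to bridge words that are typically not linked, can be sketched on a toy association graph. The graph below and the bridging measure (1 minus local clustering) are illustrative assumptions, not the paper's exact metrics.

```python
from itertools import combinations

def degree(graph, node):
    """Number of direct associates of a word."""
    return len(graph[node])

def bridging(graph, node):
    """Fraction of this node's neighbor pairs that are NOT directly
    linked to each other -- a simple proxy for the word's capacity
    to bridge otherwise-unconnected words."""
    pairs = list(combinations(graph[node], 2))
    if not pairs:
        return 0.0
    unlinked = sum(1 for a, b in pairs if b not in graph[a])
    return unlinked / len(pairs)

# hypothetical undirected semantic association network
graph = {
    "dog":  {"cat", "bone", "run"},
    "cat":  {"dog", "milk"},
    "bone": {"dog"},
    "run":  {"dog"},
    "milk": {"cat"},
}
```

Here "dog" is both high-degree and a pure bridge (none of its associates are linked to each other), the profile the abstract associates with higher frequency.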