Title: Quantifying the role of vocabulary knowledge in predicting future word learning
Can we predict the words a child is going to learn next given information about the words that a child knows now? Do different representations of a child’s vocabulary knowledge affect our ability to predict the acquisition of lexical items for individual children? Past research has often focused on population statistics of vocabulary growth rather than prediction of words an individual child is likely to learn next. We consider a neural network approach to predict vocabulary acquisition. Specifically, we investigate how best to represent the child’s current vocabulary in order to accurately predict future learning. The models we consider are based on qualitatively different sources of information: descriptive information about the child, the specific words a child knows, and representations that aim to capture the child’s aggregate lexical knowledge. Using longitudinal vocabulary data from children aged 15-36 months, we construct neural network models to predict which words are likely to be learned by a particular child in the coming month. Many models based on child-specific vocabulary information outperform models with child information only, suggesting that the words a child knows influence prediction of future language learning. These models provide an understanding of the role of current vocabulary knowledge on future lexical growth.
Award ID(s):
1631428
PAR ID:
10113806
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE transactions on cognitive and developmental systems
ISSN:
2379-8920
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Intelligent tutoring systems (ITS) provide educational benefits through one-on-one tutoring by assessing children’s existing knowledge and providing tailored educational content. In the domain of language acquisition, several studies have shown that children often learn new words by forming semantic relationships with words they already know. In this paper, we present a model that uses word semantics (semantics-based model) to make inferences about a child’s vocabulary from partial information about their existing vocabulary knowledge. We show that the proposed semantics-based model outperforms models that do not use word semantics (semantics-free models) on average. A subject-level analysis of results reveals that different models perform well for different children, thus motivating the need to combine predictions. To this end, we use two methods to combine predictions from semantics-based and semantics-free models and show that these methods yield better predictions of a child’s vocabulary knowledge. Our results motivate the use of semantics-based models to assess children’s vocabulary knowledge and build ITS that maximizes children’s semantic understanding of words. 
  2. Speech and language development in children is crucial for their long-term learning ability. A child’s vocabulary size at entry into kindergarten is an early indicator of their ability to learn to read and of their potential long-term success in school. The preschool classroom is thus a promising venue for assessing growth in young children by measuring their interactions with teachers as well as classmates. However, to date only a limited number of studies have explored such naturalistic audio communications. Automatic Speech Recognition (ASR) technologies provide an opportunity for early childhood researchers to measure such interactions through automatic analysis of naturalistic classroom recordings. For this purpose, 208 hours of audio recordings across 48 daylong sessions are collected in a childcare learning center in the United States using Language Environment Analysis (LENA) devices worn by the preschool children. Approximately 29 hours of adult speech and 26 hours of child speech are segmented using manual transcriptions provided by the CRSS transcription team. Traditional as well as end-to-end ASR models are trained on the adult/child speech data subsets. A Factorized Time Delay Neural Network provides the best word error rate (WER) of 35.05% on the adult subset of the test set. End-to-end transformer models achieve 63.5% WER on the child subset of the test data. Next, bar plots showing the frequency of WH-question words in the Science vs. Reading activity areas of the preschool are presented for sessions in the test set. It is suggested that learning spaces could be configured to encourage greater adult-child conversational engagement given such speech/audio assessment strategies.
  3. Before formal education begins, children typically acquire a vocabulary of thousands of words. This learning process requires the use of many different information sources in their social environment, including their current state of knowledge and the context in which they hear words used. How is this information integrated? We specify a developmental model according to which children consider information sources in an age-specific way and integrate them via Bayesian inference. This model accurately predicted 2–5-year-old children’s word learning across a range of experimental conditions in which they had to integrate three information sources. Model comparison suggests that the central locus of development is an increased sensitivity to individual information sources, rather than changes in integration ability. This work presents a developmental theory of information integration during language learning and illustrates how formal models can be used to make a quantitative test of the predictive and explanatory power of competing theories.
  4. Lexical tones are widely believed to be a formidable learning challenge for adult speakers of nontonal languages. While difficulties, as well as rapid improvements, are well documented for beginning second language (L2) learners, research with more advanced learners is needed to understand how tone perception difficulties impact word recognition once learners have a substantial vocabulary. The present study narrows in on difficulties suggested in previous work, which found a dissociation in advanced L2 learners between highly accurate tone identification and largely inaccurate lexical decision for tone words. We investigate a “best-case scenario” for advanced L2 tone word processing by testing performance in nearly ideal listening conditions, with words spoken clearly and in isolation. Under such conditions, do learners still have difficulty in lexical decision for tone words? If so, is it driven by the quality of lexical representations or by L2 processing routines? Advanced L2 and native Chinese listeners made lexical decisions while an electroencephalogram was recorded. Nonwords had a first syllable with either a vowel or tone that differed from that of a common disyllabic word. As a group, L2 learners performed less accurately when tones were manipulated than when vowels were manipulated. Subsequent analyses showed that this was the case even in the subset of items for which learners showed correct and confident tone identification in an offline written vocabulary test. Event-related potential results indicated N400 effects for both nonword conditions in L1, but only vowel N400 effects in L2, with tone responses intermediate between those of real words and vowel nonwords. These results are evidence of the persistent difficulty most L2 learners have in using tones for online word recognition, and indicate it is driven by a confluence of factors related to both L2 lexical representations and processing routines. We suggest that this tone nonword difficulty has real-world implications for learners: It may result in many toneless word representations in their mental lexicons, and is likely to affect the efficiency with which they can learn new tone words.
  5. Socioeconomic status (SES) has been repeatedly linked to the developmental trajectory of vocabulary acquisition in young children. However, the nature of this relationship remains underspecified. In particular, despite an extensive literature documenting young children's reliance on a host of skills and strategies to learn new words, little attention has been paid to whether and how these skills relate to measures of SES and vocabulary acquisition. To evaluate these relationships, we conducted two studies. In Study 1, 205 2.5‐ to 3.5‐year‐old children from widely varying socioeconomic backgrounds were tested on a broad range of word‐learning skills that tap their ability to resolve cases of ambiguous reference and to extend words appropriately. Children's executive functioning and phonological memory skills were also assessed. In Study 2, 77 of those children returned for a follow‐up session several months later, at which time two additional measures of vocabulary were obtained. Using Structural Equation Modeling (SEM) and multivariate regression, we provide evidence of the mediating role of word‐learning skills on the relationship between SES and vocabulary skill over the course of early development. 