Can we predict the words a child is going to learn next given information about the words that a child knows now? Do different representations of a child’s vocabulary knowledge affect our ability to predict the acquisition of lexical items for individual children? Past research has often focused on population statistics of vocabulary growth rather than prediction of words an individual child is likely to learn next. We consider a neural network approach to predict vocabulary acquisition. Specifically, we investigate how best to represent the child’s current vocabulary in order to accurately predict future learning. The models we consider are based on qualitatively different sources of information: descriptive information about the child, the specific words a child knows, and representations that aim to capture the child’s aggregate lexical knowledge. Using longitudinal vocabulary data from children aged 15-36 months, we construct neural network models to predict which words are likely to be learned by a particular child in the coming month. Many models based on child-specific vocabulary information outperform models with child information only, suggesting that the words a child knows influence prediction of future language learning. These models provide an understanding of the role of current vocabulary knowledge on future lexical growth.
more »
« less
A Semantics-based Model for Predicting Children's Vocabulary
Intelligent tutoring systems (ITS) provide educational benefits through one-on-one tutoring by assessing children’s existing knowledge and providing tailored educational content. In the domain of language acquisition, several studies have shown that children often learn new words by forming semantic relationships with words they already know. In this paper, we present a model that uses word semantics (semantics-based model) to make inferences about a child’s vocabulary from partial information about their existing vocabulary knowledge. We show that the proposed semantics-based model outperforms models that do not use word semantics (semantics-free models) on average. A subject-level analysis of results reveals that different models perform well for different children, thus motivating the need to combine predictions. To this end, we use two methods to combine predictions from semantics-based and semantics-free models and show that these methods yield better predictions of a child’s vocabulary knowledge. Our results motivate the use of semantics-based models to assess children’s vocabulary knowledge and build ITS that maximizes children’s semantic understanding of words.
more »
« less
- Award ID(s):
- 1734443
- PAR ID:
- 10108262
- Date Published:
- Journal Name:
- IJCAI
- ISSN:
- 1045-0823
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Autonomous educational social robots can be used to help promote literacy skills in young children. Such robots, which emulate the emotive, perceptual, and empathic abilities of human teachers, are capable of replicating some of the benefits of one-on-one tutoring from human teachers, in part by leveraging individual student’s behavior and task performance data to infer sophisticated models of their knowledge. These student models are then used to provide personalized educational experiences by, for example, determining the optimal sequencing of curricular material. In this paper, we introduce an integrated system for autonomously analyzing and assessing children’s speech and pronunciation in the context of an interactive word game between a social robot and a child. We present a novel game environment and its computational formulation, an integrated pipeline for capturing and analyzing children’s speech in real-time, and an autonomous robot that models children’s word pronunciation via Gaussian Process Regression (GPR), augmented with an Active Learning protocol that informs the robot’s behavior. We show that the system is capable of autonomously assessing children’s pronunciation ability, with ground truth determined by a post-experiment evaluation by human raters. We also compare phoneme- and word-level GPR models and discuss trade-offs of each approach in modeling children’s pronunciation. Finally, we describe and analyze a pipeline for automatic analysis of children’s speech and pronunciation, including an evaluation of Speech Ace as a tool for future development of autonomous, speech-based language tutors.more » « less
-
null (Ed.)Building a robust vocabulary in grade school is essential for academic success. Children from lower socioeconomic status (SES) households on average perform below their higher SES peers on word learning tasks, negatively impacting their vocabulary; however, significant variability exists within this group. Many children from low SES homes perform as well as, or better than, their higher SES peers on measures of word learning. The current study addresses what processes underlie this variability, by comparing the neural oscillations of 44 better versus worse word learners (ages 8–15 years) from lower SES households as they infer the meaning of unknown words. Better word learners demonstrated increases in theta and beta power as a word was learned, whereas worse word learners exhibited decreases in alpha power. These group differences in neural oscillatory engagement during word learning indicate there may be different strategies employed based on differences in children’s skills. Notably, children with greater vocabulary knowledge are more likely to exhibit larger beta increases, a strategy that is associated with better word learning. This sheds new light on the mechanisms that support word learning in children from low SES households.more » « less
-
null (Ed.)Before formal education begins, children typically acquire a vocabulary of thousands of words. This learning process requires the use of many different information sources in their social environment, including their current state of knowledge and the context in which they hear words used. How is this information integrated? We specify a developmental model according to which children consider information sources in an age-specific way and integrate them via Bayesian inference. This model accurately predicted 2–5-year-old children’s word learning across a range of experimental conditions in which they had to integrate three information sources. Model comparison suggests that the central locus of development is an increased sensitivity to individual information sources, rather than changes in integration ability. This work presents a developmental theory of information integration during language learning and illustrates how formal models can be used to make a quantitative test of the predictive and explanatory power of competing theories.more » « less
-
Many changes in our digital corpus have been brought about by the interplay between rapid advances in digital communication and the current environment characterized by pandemics, political polarization, and social unrest. One such change is the pace with which new words enter the mass vocabulary and the frequency at which meanings, perceptions, and interpretations of existing expressions change. The current state-of-the-art algorithms do not allow for an intuitive and rigorous detection of these changes in word meanings over time. We propose a dynamic graph-theoretic approach to inferring the semantics of words and phrases (“terms”) and detecting temporal shifts. Our approach represents each term as a stochastic time-evolving set of contextual words and is a count-based distributional semantic model in nature. We use local clustering techniques to assess the structural changes in a given word’s contextual words. We demonstrate the efficacy of our method by investigating the changes in the semantics of the phrase “Chinavirus”. We conclude that the term took on a much more pejorative meaning when the White House used the term in the second half of March 2020, although the effect appears to have been temporary. We make both the dataset and the code used to generate this paper’s results available.more » « less