Can we predict the words a child is going to learn next given information about the words that a child knows now? Do different representations of a child’s vocabulary knowledge affect our ability to predict the acquisition of lexical items for individual children? Past research has often focused on population statistics of vocabulary growth rather than prediction of words an individual child is likely to learn next. We consider a neural network approach to predict vocabulary acquisition. Specifically, we investigate how best to represent the child’s current vocabulary in order to accurately predict future learning. The models we consider are based on qualitatively different sources of information: descriptive information about the child, the specific words a child knows, and representations that aim to capture the child’s aggregate lexical knowledge. Using longitudinal vocabulary data from children aged 15 to 36 months, we construct neural network models to predict which words are likely to be learned by a particular child in the coming month. Many models based on child-specific vocabulary information outperform models with child information only, suggesting that the words a child knows influence prediction of future language learning. These models provide an understanding of the role of current vocabulary knowledge in future lexical growth.
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes and to learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words not only as an information filtration process, but also as representation-symbol mapping. This procedure does not involve a fixed vocabulary size, nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.
- Award ID(s):
- 1949634
- PAR ID:
- 10472504
- Publisher / Repository:
- ACL 2023
- Date Published:
- Journal Name:
- The 61st Annual Meeting of the Association for Computational Linguistics
- ISBN:
- 978-1-959429-72-2
- Format(s):
- Medium: X
- Location:
- Toronto, Canada
- Sponsoring Org:
- National Science Foundation
More Like this
-
To ascertain the importance of phonetic information in the form of phonological distinctive features for the purpose of segment-level phonotactic acquisition, we compare the performance of two recurrent neural network models of phonotactic learning: one that has access to distinctive features at the start of the learning process, and one that does not. Though the predictions of both models are significantly correlated with human judgments of non-words, the feature-naive model significantly outperforms the feature-aware one in terms of probability assigned to a held-out test set of English words, suggesting that distinctive features are not obligatory for learning phonotactic patterns at the segment level.
-
The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with their grounded meanings, and how grounding may further bootstrap new word learning. To this end, we introduce Grounded Open Vocabulary Acquisition (GOVA) to examine grounding and bootstrapping in open-world language learning. As an initial attempt, we propose object-oriented BERT (OctoBERT), a novel visually-grounded language model by pre-training on image-text pairs highlighting grounding as an objective. Through extensive experiments and analysis, we demonstrate that OctoBERT is a more coherent and fast grounded word learner, and that the grounding ability acquired during pre-training helps the model to learn unseen words more rapidly and robustly.
-
Cascadilla Press (Ed.)
The morphosyntactic information in grammatical number marking may be a useful cue for children in the process of acquiring number words. A language with dual marking, like Slovenian, may help children to bootstrap the meaning of the word “two” by drawing their attention to sets of two as a referent of language. If the dual marker indeed facilitates number learning, we hypothesized that “two” should be acquired earlier in populations exposed to the dual marker; the dual should be learned before “two”; and knowledge of the dual form should be correlated with knowledge of “two”. We tested these hypotheses by having Slovenian and English-speaking children complete the Give-a-Number and Give-Morphology tasks. We analyzed the Give-Morphology task in a new way, using stricter criteria than simple percent correct to determine that children “know” the morphological markers. In this sample, Slovenian children exposed to the dual marker did not show evidence of knowing “two” (i.e., being 2-knowers) at very young ages or earlier than English-speaking children. Knowledge of the dual marker neither preceded nor correlated with the acquisition of “two”. Indeed, the dual form was acquired only after the singular and plural. Parallel analyses were also conducted using an open data set with more Slovenian 2-knowers, yielding similar results. These findings present challenges for the claim that grammatical number plays a role in number acquisition. Specifically, this theory requires better articulation about how a dual-marked language can facilitate number acquisition if children do not notice or learn the dual form.
-
Each language has its unique way to mark grammatical information such as gender, number and tense. For example, English marks number and tense/aspect information with morphological suffixes (e.g., -s or -ed). These morphological suffixes are crucial for language acquisition as they are the basic building blocks of syntax, encode relationships, and convey meaning. Previous research shows that English-learning infants recognize morphological suffixes attached to nonce words by the end of the first year, although even 8-month-olds recognize them when they are attached to known words. These results support an acquisition trajectory where discovery of meaning guides infants' acquisition of morphological suffixes. In this paper, we re-evaluated English-learning infants' knowledge of morphological suffixes in the first year of life. We found that 6-month-olds successfully segmented nonce words suffixed with -s, -ing, -ed and a pseudo-morpheme -sh. Additionally, they related nonce words suffixed with -s, but not -ing, -ed or a pseudo-morpheme -sh, and stems. By 8 months, infants were also able to relate nonce words suffixed with -ing and stems. Our results show that infants demonstrate knowledge of morphological relatedness from the earliest stages of acquisition. They do so even in the absence of access to meaning. Based on these results, we argue for a developmental timeline where the acquisition of morphology is, at least, concurrent with the acquisition of phonology and meaning.