Title: The Role of Input in Language Revitalization: The Case of Lexical Development
Immersion programs have long been considered the gold standard for school-based language revitalization, but surprisingly little attention has been paid to the quantity and quality of the input that they provide to young language learners. Drawing on new data from three such programs (Kaqchikel, Western Subanon, and Māori), each with its own particular motivation, objectives, and pedagogical practices, we examine a key component of this revitalization strategy, namely the amount and type of lexical input that children receive. Our findings include previously unknown facts about the number of words that children in these programs hear per hour, the ratio of word tokens to word types, and the skewed frequency distribution of the particular words that make up the input. We discuss our findings with reference both to comparable measures for first language acquisition in a home setting and to their relevance for pedagogical strategies in the classroom.
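The lexical measures mentioned in the abstract (word tokens per sample, the token-to-type ratio, and the skew of the word frequency distribution) can be computed from any transcript. A minimal sketch, assuming a whitespace-tokenized transcript; the function name and the top-10 share as a skew index are illustrative choices, not the paper's actual method:

```python
from collections import Counter

def lexical_input_stats(transcript: str) -> dict:
    """Compute simple lexical-input measures from a whitespace-tokenized transcript."""
    tokens = transcript.lower().split()
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Share of all tokens accounted for by the 10 most frequent types:
    # a rough index of how skewed the frequency distribution is.
    top10_share = sum(c for _, c in counts.most_common(10)) / n_tokens
    return {
        "tokens": n_tokens,
        "types": n_types,
        "token_type_ratio": n_tokens / n_types,
        "top10_token_share": top10_share,
    }
```

A high token-to-type ratio and a large top-10 share together indicate input dominated by a small set of high-frequency words, the kind of skew the abstract describes.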
Award ID(s): 1926376
PAR ID: 10340313
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: Language documentation and conservation
Volume: 15
ISSN: 1934-5275
Page Range / eLocation ID: 433-457
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We investigate the roles of linguistic and sensory experience in the early-produced visual, auditory, and abstract words of congenitally-blind toddlers, deaf toddlers, and typically sighted/hearing peers. We also assess the role of language access by comparing early word production in children learning English or American Sign Language (ASL) from birth, versus at a delay. Using parental report data on child word production from the MacArthur-Bates Communicative Development Inventory, we found evidence that while children produced words referring to imperceptible referents before age 2, such words were less likely to be produced relative to words with perceptible referents. For instance, blind (vs. sighted) children said fewer highly visual words like “blue” or “see”; deaf signing (vs. hearing) children produced fewer auditory signs like HEAR. Additionally, in spoken English and ASL, children who received delayed language access were less likely to produce words overall. These results demonstrate and begin to quantify how linguistic and sensory access may influence which words young children produce.
  2. Abstract: What is vision's role in driving early word production? To answer this, we assessed parent-report vocabulary questionnaires administered to congenitally blind children (N = 40, Mean age = 24 months [R: 7-57 months]) and compared the size and contents of their productive vocabulary to those of a large normative sample of sighted children (N = 6574). We found that on average, blind children showed a roughly half-year vocabulary delay relative to sighted children, amid considerable variability. However, the content of blind and sighted children's vocabulary was statistically indistinguishable in word length, part of speech, semantic category, concreteness, interactiveness, and perceptual modality. At a finer-grained level, we also found that words' perceptual properties intersect with children's perceptual abilities. Our findings suggest that while an absence of visual input may initially make vocabulary development more difficult, the content of the early productive vocabulary is largely resilient to differences in perceptual access.
     Research Highlights:
     - Infants and toddlers born blind (with no other diagnoses) show a 7.5 month productive vocabulary delay on average, with wide variability.
     - Across the studied age range (7-57 months), vocabulary delays widened with age.
     - Blind and sighted children's early vocabularies contain similar distributions of word lengths, parts of speech, semantic categories, and perceptual modalities.
     - Blind children (but not sighted children) were more likely to say visual words which could also be experienced through other senses.
  3. It is well-known that children rapidly learn words, following a range of heuristics. What is less well appreciated is that – because most words are polysemous and have multiple meanings (e.g., ‘glass’ can label a material and drinking vessel) – children will often be learning a new meaning for a known word, rather than an entirely new word. Across four experiments we show that children flexibly adapt a well-known heuristic – the shape bias – when learning polysemous words. Consistent with previous studies, we find that children and adults preferentially extend a new object label to other objects of the same shape. But we also find that when a new word for an object (‘a gup’) has previously been used to label the material composing that object (‘some gup’), children and adults override the shape bias, and are more likely to extend the object label by material (Experiments 1 and 3). Further, we find that, just as an older meaning of a polysemous word constrains interpretations of a new word meaning, encountering a new word meaning leads learners to update their interpretations of an older meaning (Experiment 2). Finally, we find that these effects only arise when learners can perceive that a word’s meanings are related, not when they are arbitrarily paired (Experiment 4). Together, these findings show that children can exploit cues from polysemy to infer how new word meanings should be extended, suggesting that polysemy may facilitate word learning and invite children to construe categories in new ways. 
  4. Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as part of the input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions to enhance language representations with dictionary knowledge. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks.
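The input-construction step described in this abstract (fetch a rare word's definition and append it to the end of the input sequence) can be sketched as follows. This is a simplified illustration, not Dict-BERT's actual implementation: the function name, the plain-dict lookup, and the use of a `[SEP]`-style separator string are assumptions, and the real model operates on subword token IDs rather than raw strings:

```python
def append_rare_definitions(text: str,
                            definitions: dict,
                            rare_words: set,
                            sep_token: str = "[SEP]") -> str:
    """Append dictionary definitions of rare words to the end of an input sequence.

    definitions: word -> definition string (e.g., scraped from Wiktionary)
    rare_words:  words whose corpus frequency falls below some threshold
    """
    tail, seen = [], set()
    for word in text.lower().split():
        # Define each rare word at most once, in order of first occurrence.
        if word in rare_words and word in definitions and word not in seen:
            tail.append(f"{sep_token} {word}: {definitions[word]}")
            seen.add(word)
    return " ".join([text] + tail)
```

The augmented sequence then feeds into ordinary masked language modeling; the paper's additional alignment objectives operate on top of this concatenated input.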
  5. We compared everyday language input to young congenitally-blind children with no additional disabilities (N=15, 6-30 mo., M:16 mo.) and demographically-matched sighted peers (N=15, 6-31 mo., M:16 mo.). By studying whether the language input of blind children differs from that of their sighted peers, we aimed to determine whether, in principle, the language acquisition patterns observed in blind and sighted children could be explained by aspects of the speech they hear. Children wore LENA recorders to capture the auditory language environment in their homes. Speech in these recordings was then analyzed with a mix of automated and manually-transcribed measures across various subsets and dimensions of language input. These included measures of quantity (adult words), interaction (conversational turns and child-directed speech), linguistic properties (lexical diversity and mean length of utterance), and conceptual features (talk centered around the here-and-now; talk focused on visual referents that would be inaccessible to the blind but not sighted children). Overall, we found broad similarity across groups in the quantity, interactivity, and linguistic properties of speech. The only exception was that blind children's language environments contained slightly but significantly more talk about past/future/hypothetical events than sighted children's input; both groups received equivalent quantities of “visual” speech input. The findings challenge the notion that blind children's language input diverges substantially from sighted children's; while the input is highly variable across children, it is not systematically so across groups, across nearly all measures. The findings suggest instead that blind children and sighted children alike receive input that readily supports their language development, with open questions remaining regarding how this input may be differentially leveraged by language learners in early childhood.
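Several of the measures this abstract names (adult word counts, conversational turns, mean length of utterance) can be derived from a speaker-labeled transcript. A minimal sketch under stated assumptions: the two-way ADULT/CHILD labels and adjacency-based turn counting are simplifications of what LENA actually produces (its speaker categories are finer-grained and turn counting uses timing windows), and the function name is hypothetical:

```python
def summarize_input(segments: list) -> dict:
    """Summarize language input from a list of (speaker, utterance) pairs
    in temporal order, with speaker in {"ADULT", "CHILD"}."""
    adult_utts = [u for s, u in segments if s == "ADULT"]
    adult_words = sum(len(u.split()) for u in adult_utts)
    # Mean length of utterance (MLU), here in words rather than morphemes.
    mlu = adult_words / len(adult_utts)
    # Count a conversational turn whenever the speaker changes between
    # adjacent segments (a simplification of LENA's turn-taking measure).
    turns = sum(1 for (s1, _), (s2, _) in zip(segments, segments[1:])
                if s1 != s2)
    return {"adult_words": adult_words,
            "adult_mlu": mlu,
            "conversational_turns": turns}
```

Measures like these, computed per recording, are what allow the group-level comparisons the abstract reports.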