Dominant theories of language production suggest that word choice—lexical selection—is driven by alignment with the intended message: To talk about a young feline, we choose the most aligned word, kitten. Another factor that could shape lexical selection is word accessibility, or how easy it is to produce a given word (e.g., cat is more accessible than kitten). To test whether producers are also influenced by word accessibility, we designed an artificial lexicon containing high- and low-frequency words whose meanings correspond to compass directions. Participants in a communication game (total N = 181 adults) earned points by producing compass directions, which often required an implicit decision between a high- and low-frequency word. A trade-off was observed across four experiments; specifically, high-frequency words were produced even when less aligned with messages. These results suggest that implicit decisions between words are impacted by accessibility. Of all the times people have produced cat, at least some of the time they likely meant kitten.
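The implicit decision the abstract describes can be sketched as a simple choice rule in which a candidate word's probability reflects both its alignment with the message and its accessibility (approximated here by log frequency). This is an illustrative toy model, not the authors' analysis; the words, alignment scores, frequencies, and weights below are all hypothetical.

```python
import math

def choice_prob(words, align_weight=1.0, access_weight=1.0):
    """Softmax choice over candidate words, scored by message alignment
    plus log frequency (a stand-in for accessibility)."""
    scores = {w: align_weight * a + access_weight * math.log(f)
              for w, (a, f) in words.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

# Hypothetical candidates for the message "young feline":
# (alignment with the message, frequency per million words)
candidates = {"kitten": (1.0, 9.0), "cat": (0.6, 90.0)}
probs = choice_prob(candidates)
# With accessibility in the mix, the less-aligned but higher-frequency
# "cat" can win the implicit competition.
```

Under a purely message-driven account, access_weight would be zero and kitten would always win; the trade-off reported above corresponds to a nonzero accessibility weight.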
A chimpanzee by any other name: The contributions of utterance context and information density on word choice
An important feature of language production is the flexibility of lexical selection; producers could refer to an animal as chimpanzee, chimp, ape, she, and so on. Thus, a key question for psycholinguistic research is how and why producers make the lexical selections that they do. Information theoretic approaches have argued that producers regulate the uncertainty of the utterance for comprehenders, for example using longer words like chimpanzee if their messages are likely to be misunderstood, and shorter ones like chimp when the message is easy to understand. In this work, we test for the relative contributions of the information theoretic approach and an approach more aligned with psycholinguistic models of language production. We examine the effect on lexical selection of whole utterance-level factors that we take as a proxy for register or style in message-driven production accounts. Using a modern machine learning-oriented approach, we show that for both naturalistic stimuli and real-world corpora, producers prefer words to be longer in systematically different contexts, independent of the specific message they are trying to convey. We do not find evidence for regulation of uncertainty, as in information theoretic approaches. We offer suggestions for modifying the standard psycholinguistic production account, emphasizing the need for the field to specify how message formulation influences lexical choice in multiword utterances.
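The information-theoretic account under test predicts something like the following decision rule: use the longer, more redundant form when the message carries high surprisal in context. This is a toy sketch of that prediction, not the paper's model; the threshold and probabilities are arbitrary illustrations.

```python
import math

def surprisal(p):
    """Surprisal in bits of a message whose in-context probability is p."""
    return -math.log2(p)

def pick_form(p_message, short="chimp", long="chimpanzee", threshold=3.0):
    """Toy uncertainty-regulation rule: choose the longer form when the
    message is surprising. The 3-bit threshold is an arbitrary assumption."""
    return long if surprisal(p_message) > threshold else short

pick_form(0.5)    # predictable context: the short form suffices
pick_form(0.02)   # surprising context: the longer form adds redundancy
```

The abstract reports no evidence for this kind of uncertainty regulation; instead, utterance-level context predicted word-length preferences independently of the message.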
- Award ID(s): 1849236
- PAR ID: 10506964
- Publisher / Repository: Elsevier
- Date Published:
- Journal Name: Cognition
- Volume: 230
- Issue: C
- ISSN: 0010-0277
- Page Range / eLocation ID: 105265
- Subject(s) / Keyword(s): Language production; Surprisal; Neural network; Reduction; Lexical selection
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Language is a remarkably efficient tool for transmitting information. Yet human speakers make statements that are inefficient, imprecise, or even contrary to their own beliefs, all in the service of being polite. What rational machinery underlies polite language use? Here, we show that polite speech emerges from the competition of three communicative goals: to convey information, to be kind, and to present oneself in a good light. We formalize this goal tradeoff using a probabilistic model of utterance production, which predicts human utterance choices in socially sensitive situations with high quantitative accuracy, and we show that our full model is superior to its variants with subsets of the three goals. This utility-theoretic approach to speech acts takes a step toward explaining the richness and subtlety of social language use.
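The three-goal competition can be sketched as a softmax speaker whose utility mixes informativity, kindness, and self-presentation. The utterance set, utility scores, and weights below are illustrative assumptions, not the authors' fitted model.

```python
import math

def speaker_probs(utterances, w_inf=1.0, w_kind=1.0, w_self=1.0, alpha=2.0):
    """Softmax choice over utterances, each scored on three goals:
    (informational value, kindness to the listener, self-presentation)."""
    util = {u: w_inf * inf + w_kind * kind + w_self * pres
            for u, (inf, kind, pres) in utterances.items()}
    z = sum(math.exp(alpha * v) for v in util.values())
    return {u: math.exp(alpha * v) / z for u, v in util.items()}

# Hypothetical utilities when the true state is bad (the cake was terrible):
utterances = {
    "It was terrible":   (1.0, 0.0, 0.3),  # honest but unkind
    "It was amazing":    (0.0, 1.0, 0.2),  # kind but uninformative
    "It wasn't amazing": (0.7, 0.5, 0.8),  # indirect: hedges all three goals
}
probs = speaker_probs(utterances)
```

With all three goals weighted, the indirect utterance can dominate; zeroing out any one weight (a subset model, as in the abstract's comparison) shifts the prediction.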
- Purpose: Stuttering-like disfluencies (SLDs) and typical disfluencies (TDs) are both more likely to occur as utterance length increases. However, longer and shorter utterances differ by more than the number of morphemes: They may also serve different communicative functions or describe different ideas. Decontextualized language, or language that describes events and concepts outside of the “here and now,” is associated with longer utterances. Prior work has shown that language samples taken in decontextualized contexts contain more disfluencies, but averaging across an entire language sample creates a confound between utterance length and decontextualization as contributors to stuttering. We coded individual utterances from naturalistic play samples to test the hypothesis that decontextualized language leads to increased disfluencies above and beyond the effects of utterance length. Method: We used archival transcripts of language samples from 15 preschool children who stutter (CWS) and 15 age- and sex-matched children who do not stutter (CWNS). Utterances were coded as either contextualized or decontextualized, and we used mixed-effects logistic regression to investigate the impact of utterance length and decontextualization on SLDs and TDs. Results: CWS were more likely to stutter when producing decontextualized utterances, even when controlling for utterance length. An interaction between decontextualization and utterance length indicated that the effect of decontextualization was greatest for shorter utterances. TDs increased in decontextualized utterances when controlling for utterance length for both CWS and CWNS. The effect of decontextualization on TDs did not differ statistically between the two groups. Conclusions: The increased working memory demands associated with decontextualized language contribute to increased language planning effort. This leads to increased TDs in both CWS and CWNS. Under a multifactorial dynamic model of stuttering, the increased language demands may also contribute to increased stuttering in CWS due to instabilities in their speech motor systems.
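The reported effect pattern (a positive decontextualization effect that shrinks as utterances get longer) can be illustrated on simulated data. This sketch uses ordinary logistic regression as a simplified stand-in for the mixed-effects model the study actually fit, and every number in the generative model is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Simulated utterances: length in morphemes, decontextualized flag.
length = rng.integers(1, 13, size=n).astype(float)
decon = rng.integers(0, 2, size=n).astype(float)

# Hypothetical generative model: decontextualization raises disfluency
# odds, and a negative interaction makes that effect largest for short
# utterances (matching the qualitative pattern in the abstract).
logit = -2.5 + 0.2 * length + 1.5 * decon - 0.1 * length * decon
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([length, decon, length * decon])
model = LogisticRegression(C=10.0).fit(X, y)
b_len, b_decon, b_int = model.coef_[0]
# Expected signs: b_len > 0, b_decon > 0, b_int < 0.
```

A mixed-effects version would add per-child random intercepts (and possibly slopes) to account for the repeated utterances within each of the 30 children.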
- Can we predict the words a child is going to learn next given information about the words that a child knows now? Do different representations of a child’s vocabulary knowledge affect our ability to predict the acquisition of lexical items for individual children? Past research has often focused on population statistics of vocabulary growth rather than prediction of words an individual child is likely to learn next. We consider a neural network approach to predict vocabulary acquisition. Specifically, we investigate how best to represent the child’s current vocabulary in order to accurately predict future learning. The models we consider are based on qualitatively different sources of information: descriptive information about the child, the specific words a child knows, and representations that aim to capture the child’s aggregate lexical knowledge. Using longitudinal vocabulary data from children aged 15-36 months, we construct neural network models to predict which words are likely to be learned by a particular child in the coming month. Many models based on child-specific vocabulary information outperform models with child information only, suggesting that the words a child knows influence prediction of future language learning. These models provide an understanding of the role of current vocabulary knowledge on future lexical growth.
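The basic setup (predicting next-month word learning from a child's current vocabulary vector) can be sketched with a small feedforward network on synthetic data. The data, vocabulary size, target word, and architecture here are all assumptions for illustration, not the study's actual models or dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_children, vocab_size = 400, 50

# Hypothetical data: each row is one child's current vocabulary as a
# binary knows/doesn't-know vector over 50 words.
vocab = (rng.random((n_children, vocab_size)) < 0.4).astype(float)

# Toy generative assumption: a child is likely to learn the target word
# next month if they already know many semantically related words
# (arbitrarily, words 0-9 here).
related = vocab[:, :10].sum(axis=1)
learned_next = (related + rng.normal(0, 1, n_children) > 4).astype(int)

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
net.fit(vocab, learned_next)
train_acc = net.score(vocab, learned_next)
```

A child-information-only baseline, as in the abstract's comparison, would replace the vocabulary vector with demographic features alone; the reported finding is that vocabulary-based inputs tend to win.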
- NLP is currently dominated by language models like RoBERTa, which are pretrained on billions of words. But what exact knowledge or skills do Transformer LMs learn from large-scale pretraining that they cannot learn from less data? To explore this question, we adopt five styles of evaluation: classifier probing, information-theoretic probing, unsupervised relative acceptability judgments, unsupervised language model knowledge probing, and fine-tuning on NLU tasks. We then draw learning curves that track the growth of these different measures of model ability with respect to pretraining data volume using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M, and 1B words. We find that these LMs require only about 10M to 100M words to learn to reliably encode most syntactic and semantic features we test. They need a much larger quantity of data in order to acquire enough commonsense knowledge and other skills required to master typical downstream NLU tasks. The results suggest that, while the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, unidentified, forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
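Classifier probing, the first of the five evaluation styles, amounts to training a simple classifier on frozen model representations and reading high held-out accuracy as evidence that a feature is encoded. The sketch below uses synthetic vectors in place of actual MiniBERTa hidden states, with a feature linearly planted in a few dimensions; everything here is an assumed stand-in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim = 600, 64

# Stand-in for frozen LM representations of tokens, with a linguistic
# feature (say, is-a-noun) linearly encoded in the first few dimensions.
reps = rng.normal(size=(n, dim))
labels = (reps[:, :3].sum(axis=1) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probe_acc = probe.score(X_te, y_te)
# High held-out probe accuracy is taken as evidence the feature is
# (linearly) recoverable from the representations.
```

Repeating such a probe across models pretrained on 1M, 10M, 100M, and 1B words, and plotting accuracy against data volume, yields the learning curves the abstract describes.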