Title: How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure
Abstract: Language models are typically evaluated on their success at predicting the distribution of specific words in specific contexts. Yet linguistic knowledge also encodes relationships between contexts, allowing inferences between word distributions. We investigate the degree to which pre-trained transformer-based large language models (LLMs) represent such relationships, focusing on the domain of argument structure. We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts that were seen during pre-training (e.g., the active object and passive subject of the verb spray), succeeding by making use of the semantically organized structure of the embedding space for word embeddings. However, LLMs fail at generalizations between related contexts that have not been observed during pre-training, even when those contexts instantiate more abstract yet well-attested structural generalizations (e.g., between the active object and passive subject of an arbitrary verb). Instead, in this case, LLMs show a bias to generalize based on linear order. This finding points to a limitation of current models and suggests one reason why their training is so data-intensive.
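As a rough illustration of the kind of probe the abstract describes, the sketch below compares the noun distributions an off-the-shelf masked language model assigns to the two spray contexts. The model choice, templates, and probe words are assumptions for illustration, not the authors' materials or protocol.

```python
# A minimal sketch (not the paper's protocol) of probing whether a masked
# language model relates two argument-structure contexts: the active object
# and the passive subject of "spray". Model, templates, and probe words are
# illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"  # assumed stand-in for the models tested in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def mask_logprobs(template: str) -> torch.Tensor:
    """Log-probabilities over the vocabulary at the [MASK] position."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.log_softmax(logits[0, mask_idx], dim=-1)

active = mask_logprobs("The farmer sprayed the [MASK] with water.")
passive = mask_logprobs("The [MASK] was sprayed with water by the farmer.")

# If the model relates the two contexts, nouns that score well as the active
# object of "spray" should also score well as its passive subject.
for word in ["wall", "plants", "idea"]:
    wid = tokenizer.convert_tokens_to_ids(word)
    print(f"{word:8s} active={active[wid].item():.2f} passive={passive[wid].item():.2f}")
```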
Award ID(s): 1919321
PAR ID: 10538253
Publisher / Repository: MIT Press
Journal Name: Transactions of the Association for Computational Linguistics
Volume: 11
ISSN: 2307-387X
Page Range / eLocation ID: 1377 to 1395
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    Acquisition of natural language has been shown to fundamentally impact both one’s ability to use the first language and the ability to learn subsequent languages later in life. Sign languages offer a unique perspective on this issue because Deaf signers receive access to signed input at varying ages. Most acquire sign language in (early) childhood, but some learn it later—a situation that is drastically different from that of spoken language acquisition. To investigate the effect of age of sign language acquisition and its potential interplay with chronological age, we examined grammatical acceptability ratings and reaction time measures in a group of Deaf signers (age range = 28–58 years) with early (0–3 years) or later (4–7 years) acquisition of sign language in childhood. Behavioral responses to grammatical word order variations (subject–object–verb [SOV] vs. object–subject–verb [OSV]) were examined in three sentence types: (1) simple sentences, (2) topicalized sentences, and (3) sentences involving manual classifier constructions, which are uniquely characteristic of sign languages. Overall, older participants responded more slowly. Age of acquisition had subtle effects on acceptability ratings, with the direction of the effect depending on the specific linguistic structure.
  2. Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations depends strongly on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. As a result, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words from dictionaries (e.g., Wiktionary). To incorporate a rare word's definition as part of the input, we fetch the definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and the rare word definitions to enrich the model's representations with dictionary knowledge. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various downstream NLP tasks.
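    The input-construction step is simple enough to sketch. The snippet below is a minimal illustration under assumed details (toy dictionary, frequency threshold, separator format), not the released Dict-BERT code.

```python
# A minimal sketch, under assumed details, of the input-construction step
# described above: append dictionary definitions of rare words to the end of
# the input text sequence. The toy dictionary, frequency threshold, and
# [SEP]-joined format are illustrative, not the released Dict-BERT code.
from collections import Counter

# Hypothetical stand-in for the paper's Wiktionary lookups.
DICTIONARY = {
    "anneal": "to heat and then cool a material in order to toughen it",
}

def build_input(text: str, corpus_counts: Counter, rare_threshold: int = 5) -> str:
    """Append definitions of rare words so the model can attend to them."""
    definitions = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if corpus_counts[word] < rare_threshold and word in DICTIONARY:
            definitions.append(f"{word} : {DICTIONARY[word]}")
    if not definitions:
        return text
    # The augmented sequence then feeds masked language modeling plus the
    # word- and sentence-level alignment objectives the abstract describes.
    return text + " [SEP] " + " [SEP] ".join(definitions)

counts = Counter({"they": 9000, "the": 10000, "metal": 300, "anneal": 2})
print(build_input("They anneal the metal slowly.", counts))
```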
  3. Gero, JS (Ed.)
    Recent developments in using Large Language Models (LLMs) to predict and align with neural representations of language can be applied toward a future vision of design tools that detect and reconstruct designers’ mental representations of ideas. Prior work has largely explored this relationship during passive language tasks only, e.g., reading or listening. In this work, the relationship between brain activation data (functional imaging, fMRI) collected during the generation of appropriate and novel word associations and LLM (Llama-2 7b) word representations is tested using Representational Similarity Analysis (RSA). Findings suggest that LLM word representations align with brain activity captured during novel word association, but not when forming appropriate associates. Association formation is a cognitive process central to design. By demonstrating that brain activity during this task can align with LLM word representations, this work encourages further investigation of this relationship during more complex design ideation processes.
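    Representational Similarity Analysis itself is compact enough to sketch. The snippet below shows the general RSA recipe, with random arrays standing in for the study's actual Llama-2 7b embeddings and fMRI patterns; all sizes and metrics are illustrative assumptions.

```python
# A minimal sketch of Representational Similarity Analysis (RSA): build a
# representational dissimilarity matrix (RDM) per system, then correlate the
# two RDMs. Random arrays stand in for the study's Llama-2 7b embeddings and
# fMRI voxel patterns; sizes and metrics are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_items = 20                                     # word-association items
llm_reps = rng.normal(size=(n_items, 4096))      # Llama-2 7b hidden size
brain_reps = rng.normal(size=(n_items, 500))     # voxel responses per item

# pdist returns the condensed upper triangle of the pairwise distance
# matrix, i.e., each space's RDM.
llm_rdm = pdist(llm_reps, metric="correlation")
brain_rdm = pdist(brain_reps, metric="correlation")

# Second-order similarity: how alike are the two representational geometries?
rho, p = spearmanr(llm_rdm, brain_rdm)
print(f"RSA (Spearman rho) = {rho:.3f}, p = {p:.3f}")
```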
    To learn new words, particularly verbs, child learners have been shown to benefit from the linguistic contexts in which the words appear. However, cross-linguistic differences affect how this process unfolds. One previous study found that children’s ability to learn a new verb differed across Korean and English as a function of the sentence in which the verb occurred. The authors hypothesized that the properties of word order and argument drop, which vary systematically between these two languages, were driving the differences. In the current study, we pursued this finding to ask whether the difference persists later in development, or whether children acquiring different languages come to appear more similar as their linguistic knowledge and learning capacities increase. Preschool-aged monolingual English learners (N = 80) and monolingual Korean learners (N = 64) were presented with novel verbs, accompanied by visual stimuli, in contexts that varied in word order and argument drop. We assessed their learning by measuring accuracy in a forced-choice pointing task, and we measured eye gaze during the learning phase as an indicator of the processes by which they mapped the novel verbs to meaning. Unlike previous studies, which identified differences between English- and Korean-learning 2-year-olds in a similar task, our results revealed similarities between the two language groups with these older preschoolers. We interpret our results as evidence that over the course of early childhood, children become adept at learning from a wide variety of contexts, such that differences between learners of different languages are attenuated.
    A long-standing issue in the analysis of noun incorporation (NI) concerns whether the noun-verb complex is derived by syntactic movement of the object or by postsyntactic merger of the verb with an in situ object. The same question pervades the literature on word formation and affixation more generally. This paper investigates these questions from the point of view of Inuktitut, an Inuit language of Northern Canada, and argues that both NI and polysynthetic word formation in Inuit are postsyntactic phenomena, derived by successive m-merger between adjacent elements along the clausal spine. I argue that incorporated nominals in Inuktitut are syntactically active, in that they remain accessible for case, agreement, and even phrasal movement operations, despite being overtly realized within the verb complex. These patterns follow straightforwardly from interactions between postsyntactic m-merger and general conditions on copy spell-out. M-merger of a nominal copy in a movement chain prevents that copy from being deleted, in accordance with morphological well-formedness conditions on word formation.