

Title: Neural Network Acceptability Judgments
This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error analysis on specific grammatical phenomena reveals that both Lau et al.'s models and ours learn systematic generalizations like subject-verb-object order. However, all models we test perform far below human level on a wide range of grammatical constructions.
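Results on CoLA-style binary acceptability classification are commonly summarized with the Matthews correlation coefficient (MCC), which stays meaningful under the corpus's imbalance between acceptable and unacceptable labels. A minimal sketch of that metric, in pure Python with hypothetical 0/1 labels (not actual CoLA data):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels: +1 is perfect, 0 is chance-level, -1 is inverse."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal is empty (denominator would be 0)
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike raw accuracy, a classifier that labels every sentence "acceptable" scores 0 here rather than the majority-class rate.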
Award ID(s): 1850208
PAR ID: 10130157
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Transactions of the Association for Computational Linguistics
Volume: 7
ISSN: 2307-387X
Page Range / eLocation ID: 625 to 641
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs—that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate a specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and aggregate human agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models reliably identify morphological contrasts related to agreement, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands.
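The forced-choice protocol described above—scoring both sentences of a minimal pair and crediting the model when the acceptable one gets higher probability—can be sketched as follows. The two example pairs and the add-one-smoothed unigram "LM" are illustrative assumptions only, not BLiMP data or any of the evaluated models:

```python
import math
from collections import Counter

# Hypothetical minimal pairs: (acceptable sentence, unacceptable sentence)
pairs = [
    ("the dogs bark", "the dogs barks"),
    ("she was reading", "she were reading"),
]

# Toy unigram model "trained" on a tiny acceptable-English corpus
corpus = "the dogs bark she was reading".split()
counts = Counter(corpus)
total = sum(counts.values())
vocab = len(counts) + 1  # +1 for unseen words

def log_prob(sentence):
    # Add-one smoothed unigram log-probability of the sentence
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in sentence.split())

# A model is credited on a pair when the acceptable sentence scores higher
accuracy = sum(log_prob(good) > log_prob(bad) for good, bad in pairs) / len(pairs)
```

BLiMP applies the same comparison with real LM probabilities over 67,000 pairs; chance performance is 50%.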
  2. We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors. 
    Like all domains of cognition, language processing is affected by top–down knowledge. Classic evidence for this is missing blatant errors in the signal. In sentence comprehension, one instance is failing to notice word order errors, such as transposed words in the middle of a sentence: “you that read wrong” (Mirault et al., 2018). Our brains seem to fix such errors, since they are incompatible with our grammatical knowledge, but how do our brains do this? Following behavioral work on inner transpositions, we flashed four-word sentences for 300 ms using rapid parallel visual presentation (Snell and Grainger, 2017). We compared magnetoencephalography responses to fully grammatical and reversed sentences (24 human participants: 21 females, 4 males). The left lateral language cortex robustly distinguished grammatical and reversed sentences starting at 213 ms. Thus, the influence of grammatical knowledge began rapidly after visual word form recognition (Tarkiainen et al., 1999). At the earliest stage of this neural “sentence superiority effect,” inner transpositions patterned between grammatical and reversed sentences, showing evidence that the brain initially “noticed” the error. However, 100 ms later, inner transpositions became indistinguishable from grammatical sentences, suggesting that at this point, the brain had “fixed” the error. These results show that after a single glance at a sentence, syntax impacts our neural activity almost as quickly as higher-level object recognition is assumed to take place (Cichy et al., 2014). The earliest stage involves detailed comparisons between the bottom–up input and grammatical knowledge, while shortly afterward, top–down knowledge can override an error in the stimulus.
  4. J. Culbertson, A. Perfors (Eds.)
    Languages often express grammatical information through inflectional morphology, in which grammatical features are grouped into strings of morphemes. In this work, we propose that cross-linguistic generalizations about morphological fusion, in which multiple features are expressed through one morpheme, can be explained in part by optimization of processing efficiency, as formalized using the memory–surprisal tradeoff of Hahn et al. (2021). We show in a toy setting that fusion of highly informative neighboring morphemes can lead to greater processing efficiency under our processing model. Next, based on paradigm and frequency data from four languages, we consider both total fusion and gradable fusion using empirical measures developed by Rathi et al. (2021), and find that the degree of fusion is predicted by closeness of optimal morpheme ordering as determined by optimization of processing efficiency. Finally, we show that optimization of processing efficiency can successfully predict typological patterns involving suppletion.
  5. Culbertson, J.; Perfors, A.; Rabagliati, H.; Ramenzoni, V. (Eds.)
    Source-goal events involve an object moving from the Source to the Goal. In this work, we focus on the representation of the object, which has received relatively less attention in the study of Source-goal events. Specifically, this study aims to investigate the mapping between language and mental representations of object locations in transfer-of-possession events (e.g. throwing, giving). We investigate two different grammatical factors that may influence the representation of object location in transfer-of-possession events: (a) grammatical aspect (e.g. threw vs. was throwing) and (b) verb semantics (guaranteed transfer, e.g. give vs. no guaranteed transfer, e.g. throw). We conducted a visual-world eye-tracking study using a novel webcam-based eye-tracking paradigm (Webgazer; Papoutsaki et al., 2016) to investigate how grammatical aspect and verb semantics in the linguistic input guide the real-time and final representations of object locations. We show that grammatical cues guide the real-time and final representations of object locations. 