Title: Investigating representations of verb bias in neural language models
Languages typically provide more than one grammatical construction to express certain types of messages. A speaker’s choice of construction is known to depend on multiple factors, including the choice of main verb – a phenomenon known as verb bias. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences. Results show that larger models perform better than smaller models, and transformer architectures (e.g. GPT-2) tend to outperform recurrent architectures (e.g. LSTMs) even under comparable parameter and training settings. Additional analyses of internal feature representations suggest that transformers may better integrate specific lexical information with grammatical constructions.
Award ID(s):
1911835
PAR ID:
10285688
Journal Name:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Page Range / eLocation ID:
4653--4663
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error analysis on specific grammatical phenomena reveals that both Lau et al.’s models and ours learn systematic generalizations like subject-verb-object order. However, all models we test perform far below human level on a wide range of grammatical constructions.
  2. Culbertson, J.; Perfors, A.; Rabagliati, H.; Ramenzoni, V. (Ed.)
    Source-goal events involve an object moving from the Source to the Goal. In this work, we focus on the representation of the object, which has received relatively less attention in the study of Source-goal events. Specifically, this study aims to investigate the mapping between language and mental representations of object locations in transfer-of-possession events (e.g. throwing, giving). We investigate two different grammatical factors that may influence the representation of object location in transfer-of-possession events: (a) grammatical aspect (e.g. threw vs. was throwing) and (b) verb semantics (guaranteed transfer, e.g. give vs. no guaranteed transfer, e.g. throw). We conducted a visual-world eye-tracking study using a novel webcam-based eye-tracking paradigm (Webgazer; Papoutsaki et al., 2016) to investigate how grammatical aspect and verb semantics in the linguistic input guide the real-time and final representations of object locations. We show that grammatical cues guide the real-time and final representations of object locations. 
  3. Quantifier Raising leaves no overt marking to indicate movement has occurred, making the task of identifying when raising has occurred extremely difficult for the parser. Beyond this challenge, evidence from interpretation and judgement studies suggests that raising causes difficulty in processing. These two aspects taken together have led some to suggest that the human sentence processor employs a strategy in which the construction of raised structures is avoided, commonly called processing scope economy. This contrasts with the traditional notion of grammatical scope economy, where Quantifier Raising is restricted in the grammar. In this paper we discuss the properties of these two theories. We conclude that the two approaches make different predictions about when raising should occur online, with processing scope economy predicting that the parser avoids raising whenever possible and grammatical scope economy predicting that the parser raises regularly and sometimes produces ungrammatical structures in the process. We then present an experiment which examines complex scope structures in verb phrase ellipsis to observe when penalties related to Quantifier Raising are observed online. We find that penalties appear in configurations where Quantifier Raising would be ungrammatical under grammatical scope economy, suggesting the parser attempts Quantifier Raising in these configurations. This evidence indicates that the parser’s behavior matches the predictions of grammatical scope economy rather than those of processing scope economy.
  4. Speech recognition and machine translation have made major progress over the past decades, providing practical systems to map one language sequence to another. Although multiple modalities such as sound and video are becoming increasingly available, the state-of-the-art systems are inherently unimodal, in the sense that they take a single modality --- either speech or text --- as input. Evidence from human learning suggests that additional modalities can provide disambiguating signals crucial for many language tasks. Here, we describe the dataset, a large, open-domain collection of videos with transcriptions and their translations. We then show how this single dataset can be used to develop systems for a variety of language tasks and present a number of models meant as starting points. Across tasks, we find that building multi-modal architectures that perform better than their unimodal counterpart remains a challenge. This leaves plenty of room for the exploration of more advanced solutions that fully exploit the multi-modal nature of the dataset, and the general direction of multimodal learning with other datasets as well.
  5. To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts. In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well. Our baseline models perform poorly on this task (55.4%) and significantly lag behind human performance (93.5%). 