skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Testing the Predictions of Surprisal Theory in 11 Languages
Abstract Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.  more » « less
Award ID(s):
2121074
PAR ID:
10488148
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
MIT Press
Date Published:
Journal Name:
Transactions of the Association for Computational Linguistics
Volume:
11
ISSN:
2307-387X
Page Range / eLocation ID:
1451 to 1470
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This work presents a linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to ‘memorize’ sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing. 
    more » « less
  2. Abstract Source code is a form of human communication, albeit one where the information shared between the programmers reading and writing the code is constrained by the requirement that the code executes correctly. Programming languages are more syntactically constrained than natural languages, but they are also very expressive, allowing a great many different ways to express even very simple computations. Still, code written by developers is highly predictable, and many programming tools have taken advantage of this phenomenon, relying on language modelsurprisalas a guiding mechanism. While surprisal has been validated as a measure of cognitive load in natural language, its relation to human cognitive processes in code is still poorly understood. In this paper, we explore the relationship between surprisal and programmer preference at a small granularity—do programmers prefer more predictable expressions in code? Usingmeaning‐preserving transformations, we produce equivalent alternatives to developer‐written code expressions and run a corpus study on Java and Python projects. In general, language models rate the code expressions developerschooseto write as more predictable than these transformed alternatives. Then, we perform two human subject studies asking participants to choose between two equivalent snippets of Java code with different surprisal scores (one original and transformed). We find that programmersdoprefer more predictable variants, and that stronger language models like the transformer align more often and more consistently with these preferences. 
    more » « less
  3. A major goal of psycholinguistic theory is to account for the cognitive constraints limiting the speed and ease of language comprehension and production. Wide-ranging evidence demonstrates a key role for linguistic expectations: A word’s predictability, as measured by the information-theoretic quantity of surprisal, is a major determinant of processing difficulty. But surprisal, under standard theories, fails to predict the difficulty profile of an important class of linguistic patterns: the nested hierarchical structures made possible by recursion in human language. These nested structures are better accounted for by psycholinguistic theories of constrained working memory capacity. However, progress on theory unifying expectation-based and memory-based accounts has been limited. Here we present a unified theory of a rational trade-off between precision of memory representations with ease of prediction, a scaled-up computational implementation using contemporary machine learning methods, and experimental evidence in support of the theory’s distinctive predictions. We show that the theory makes nuanced and distinctive predictions for difficulty patterns in nested recursive structures predicted by neither expectation-based nor memory-based theories alone. These predictions are confirmed 1) in two language comprehension experiments in English, and 2) in sentence completions in English, Spanish, and German. More generally, our framework offers computationally explicit theory and methods for understanding how memory constraints and prediction interact in human language comprehension and production. 
    more » « less
  4. Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, with the result that such evidence would be available only in just one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages with extensive human assessments and analyses. Although models can generate viable simplified texts, we identify several outstanding challenges that this dataset might be used to address. 
    more » « less
  5. Abstract Multilingual speakers can find speech recognition in everyday environments like restaurants and open-plan offices particularly challenging. In a world where speaking multiple languages is increasingly common, effective clinical and educational interventions will require a better understanding of how factors like multilingual contexts and listeners’ language proficiency interact with adverse listening environments. For example, word and phrase recognition is facilitated when competing voices speak different languages. Is this due to a “release from masking” from lower-level acoustic differences between languages and talkers, or higher-level cognitive and linguistic factors? To address this question, we created a “one-man bilingual cocktail party” selective attention task using English and Mandarin speech from one bilingual talker to reduce low-level acoustic cues. In Experiment 1, 58 listeners more accurately recognized English targets when distracting speech was Mandarin compared to English. Bilingual Mandarin–English listeners experienced significantly more interference and intrusions from the Mandarin distractor than did English listeners, exacerbated by challenging target-to-masker ratios. In Experiment 2, 29 Mandarin–English bilingual listeners exhibited linguistic release from masking in both languages. Bilinguals experienced greater release from masking when attending to English, confirming an influence of linguistic knowledge on the “cocktail party” paradigm that is separate from primarily energetic masking effects. Effects of higher-order language processing and expertise emerge only in the most demanding target-to-masker contexts. The “one-man bilingual cocktail party” establishes a useful tool for future investigations and characterization of communication challenges in the large and growing worldwide community of Mandarin–English bilinguals. 
    more » « less