Title: A resource-rational model of human processing of recursive linguistic structure
A major goal of psycholinguistic theory is to account for the cognitive constraints limiting the speed and ease of language comprehension and production. Wide-ranging evidence demonstrates a key role for linguistic expectations: a word's predictability, as measured by the information-theoretic quantity of surprisal, is a major determinant of processing difficulty. But surprisal, under standard theories, fails to predict the difficulty profile of an important class of linguistic patterns: the nested hierarchical structures made possible by recursion in human language. These nested structures are better accounted for by psycholinguistic theories of constrained working memory capacity. However, progress on a theory unifying expectation-based and memory-based accounts has been limited. Here we present a unified theory based on a rational trade-off between precision of memory representations and ease of prediction, a scaled-up computational implementation using contemporary machine learning methods, and experimental evidence in support of the theory's distinctive predictions. We show that the theory makes nuanced and distinctive predictions for difficulty patterns in nested recursive structures that are predicted by neither expectation-based nor memory-based theories alone. These predictions are confirmed (1) in two language comprehension experiments in English, and (2) in sentence completions in English, Spanish, and German. More generally, our framework offers a computationally explicit theory and methods for understanding how memory constraints and prediction interact in human language comprehension and production.
Award ID(s): 2121074
NSF-PAR ID: 10376126
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: Proceedings of the National Academy of Sciences
Volume: 119
Issue: 43
ISSN: 0027-8424
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Expectation-based theories of sentence processing posit that processing difficulty is determined by predictability in context. While predictability quantified via surprisal has gained empirical support, this representation-agnostic measure leaves open the question of how to best approximate the human comprehender's latent probability model. This article first describes an incremental left-corner parser that incorporates information about common linguistic abstractions such as syntactic categories, predicate-argument structure, and morphological rules as a computational-level model of sentence processing. The article then evaluates a variety of structural parsers and deep neural language models as cognitive models of sentence processing by comparing the predictive power of their surprisal estimates on self-paced reading, eye-tracking, and fMRI data collected during real-time language processing. The results show that surprisal estimates from the proposed left-corner processing model deliver comparable and often superior fits to self-paced reading and eye-tracking data when compared to those from neural language models trained on much more data. This may suggest that the strong linguistic generalizations made by the proposed processing model may help predict humanlike processing costs that manifest in latency-based measures, even when the amount of training data is limited. Additionally, experiments using Transformer-based language models sharing the same primary architecture and training data show a surprising negative correlation between parameter count and fit to self-paced reading and eye-tracking data. These findings suggest that large-scale neural language models are making weaker generalizations based on patterns of lexical items rather than stronger, more humanlike generalizations based on linguistic structure. 
  2. Speakers often face choices as to how to structure their intended message into an utterance. Here we investigate the influence of contextual predictability on the encoding of linguistic content manifested by speaker choice in a classifier language, Mandarin Chinese. In Mandarin, modifying a noun with a numeral obligatorily requires the use of a classifier. While different nouns are compatible with different SPECIFIC classifiers, there is a GENERAL classifier that can be used with most nouns. When the upcoming noun is less predictable, using a more specific classifier would reduce the noun’s surprisal, potentially facilitating comprehension (predicted to be preferred under Uniform Information Density, Levy & Jaeger, 2007), but the specific classifier may be dispreferred from a production standpoint if the general classifier is more easily available (predicted by Availability-Based Production; Bock, 1987; Ferreira & Dell, 2000). Here we report a picture-naming experiment confirming two distinctive predictions made by Availability-Based Production. 
  3. In standard models of language production or comprehension, the elements which are retrieved from memory and combined into a syntactic structure are “lemmas” or “lexical items.” Such models implicitly take a “lexicalist” approach, which assumes that lexical items store meaning, syntax, and form together, that syntactic and lexical processes are distinct, and that syntactic structure does not extend below the word level. Across the last several decades, linguistic research examining a typologically diverse set of languages has provided strong evidence against this approach. These findings suggest that syntactic processes apply both above and below the “word” level, and that both meaning and form are partially determined by the syntactic context. This has significant implications for psychological and neurological models of language processing as well as for the way that we understand different types of aphasia and other language disorders. As a consequence of the lexicalist assumptions of these models, many kinds of sentences that speakers produce and comprehend—in a variety of languages, including English—are challenging for them to account for. Here we focus on language production as a case study. In order to move away from lexicalism in psycho- and neuro-linguistics, it is not enough to simply update the syntactic representations of words or phrases; the processing algorithms involved in language production are constrained by the lexicalist representations that they operate on, and thus also need to be reimagined. We provide an overview of the arguments against lexicalism, discuss how lexicalist assumptions are represented in models of language production, and examine the types of phenomena that they struggle to account for as a consequence. 
We also outline what a non-lexicalist alternative might look like, as a model that does not rely on a lemma representation, but instead represents that knowledge as separate mappings between (a) meaning and syntax and (b) syntax and form, with a single integrated stage for the retrieval and assembly of syntactic structure. By moving away from lexicalist assumptions, this kind of model provides better cross-linguistic coverage and aligns better with contemporary syntactic theory.

  4. Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
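The two quantities tested in this abstract, surprisal and contextual entropy (expected surprisal), can be made concrete with a minimal sketch. The next-word probabilities below are toy values for illustration; in the studies described here they would come from a trained language model.

```python
import math

def surprisal(p_word: float) -> float:
    """Surprisal of a word: negative log probability in context (in bits)."""
    return -math.log2(p_word)

def contextual_entropy(dist: dict) -> float:
    """Expected surprisal over the next-word distribution."""
    return sum(p * -math.log2(p) for p in dist.values() if p > 0)

# Toy next-word distribution, e.g. P(w | "the cat sat on the ...")
dist = {"mat": 0.5, "sofa": 0.25, "roof": 0.25}
print(surprisal(dist["mat"]))    # 1.0 bit: a highly predictable word
print(contextual_entropy(dist))  # 1.5 bits: expected surprisal at this position
```

Prediction (iii) above, a linear linking function, amounts to reading time being well fit by a model of the form RT = a + b * surprisal.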

  5. Background

    Transfer students in engineering must navigate a myriad of information sources to obtain accurate information on how to matriculate into a 4‐year institution. Although some institutional and state‐level initiatives attempt to streamline the transfer process, students still report difficulties.

    Purpose

    This article explores the extent to which web‐based transfer information is fragmented across institutional websites and written using communicative strategies that could limit comprehension. Accordingly, this study characterizes information asymmetries—gaps in information—that affect transfer students in terms of two constructs: fragmentation and language.

    Method

We employed a convergent fully integrated mixed‐methods design with a stratified random sample of 38 US engineering degree‐granting institutions. The connections between the webpages were transformed into networks and clustered using k‐means and partitioning around medoids with measures of dispersion and centrality. A purposeful nested sample of 16 institutions was taken based on the clusters and explored using a two‐cycle mixed‐methods coding protocol to understand how fragmentation and language interact to create information asymmetries. The resulting themes from each construct were integrated to develop narratives across the sampled institutions.
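The clustering step can be illustrated with a deliberately simplified sketch. The study used k-means and partitioning around medoids over several network measures; here we use only degree centrality, a one-dimensional k-means with deterministic initialization, and hypothetical page names.

```python
def degree_centrality(adj: dict) -> dict:
    """Fraction of other pages each page is connected to (undirected graph)."""
    n = len(adj)
    return {page: len(neigh) / (n - 1) for page, neigh in adj.items()}

def kmeans_1d(values: list, k: int, iters: int = 20) -> list:
    """Minimal 1-D k-means; returns a cluster index per value."""
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]  # evenly spread init
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Hypothetical transfer-information site: one hub page, scattered leaves
adj = {
    "transfer-hub": {"admissions", "credits", "advising"},
    "admissions": {"transfer-hub"},
    "credits": {"transfer-hub"},
    "advising": {"transfer-hub"},
    "orphan-faq": set(),
}
cent = degree_centrality(adj)
labels = kmeans_1d(list(cent.values()), k=2)
# The hub page separates from the loosely connected leaf pages.
```

In the study itself, clusters like these informed which institutions were sampled for the qualitative coding stage.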

    Conclusions

    We found the web‐based information for transfer students to be a messy web of loosely connected structures with language that complicates understanding. We identified four fragmentation themes illustrating how transfer information is organized and six language themes capturing linguistic patterns across the webpages. We offer strategies for researchers and practitioners based on the narratives we developed.
