NSF PAR Search | NSF Public Access Repository


Evaluation of an Algorithmic‐Level Left‐Corner Parsing Account of Surprisal Effects

https://doi.org/10.1111/cogs.13500

Schuler, William; Yue, Shisen (October 2024, Cognitive Science)

Abstract: This article evaluates the predictions of an algorithmic‐level distributed associative memory model as it introduces, propagates, and resolves ambiguity, and compares them to the predictions of computational‐level parallel parsing models in which ambiguous analyses are accounted for separately in discrete distributions. By superposing activation patterns that serve as cues to other activation patterns, the model is able to maintain multiple syntactically complex analyses superposed in a finite working memory, propagate this ambiguity through multiple intervening words, then resolve this ambiguity in a way that produces a measurable predictor that is proportional to the log conditional probability of the disambiguating word given its context, marginalizing over all remaining analyses. In cases of complex structural ambiguity, the results are indeed consistent with computational‐level parallel parsing models that produce this same probability as a predictor and that have been shown to predict human reading times reliably.
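The quantity this abstract describes, the log conditional probability of a word given its context, marginalized over the analyses still under consideration, can be illustrated with a minimal sketch. This is a generic computation and not the article's associative memory model. The function name `marginal_surprisal`, the analysis labels, and all probabilities below are hypothetical values chosen for illustration.

```python
import math

def marginal_surprisal(analysis_probs, word_given_analysis):
    """Return the surprisal (in bits) of a word, marginalizing over analyses.

    analysis_probs:      dict mapping analysis -> P(analysis | context)
    word_given_analysis: dict mapping analysis -> P(word | analysis, context)
    """
    # P(word | context) = sum over analyses of P(analysis | context) * P(word | analysis, context)
    p_word = sum(p * word_given_analysis.get(a, 0.0)
                 for a, p in analysis_probs.items())
    if p_word <= 0.0:
        return math.inf  # the word is impossible under every remaining analysis
    return -math.log2(p_word)

# Hypothetical two-way structural ambiguity (illustrative numbers only).
analyses = {"main_verb": 0.75, "reduced_relative": 0.25}
# The disambiguating word is only compatible with the less likely analysis.
likelihood = {"main_verb": 0.0, "reduced_relative": 0.5}

surprisal = marginal_surprisal(analyses, likelihood)
print(surprisal)  # -log2(0.25 * 0.5) = 3.0 bits
```

In this toy case the word rules out the preferred analysis, so its marginal probability is low and its surprisal is correspondingly high.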
Frequency Explains the Inverse Correlation of Large Language Models’ Size, Training Data Amount, and Surprisal’s Fit to Reading Times

Oh, Byung-Doh; Yue, Shisen; Schuler, William (March 2024, Association for Computational Linguistics)

Full Text Available
Leading Whitespaces of Language Models’ Subword Vocabulary Pose a Confound for Calculating Word Probabilities

https://doi.org/10.18653/v1/2024.emnlp-main.202

Oh, Byung-Doh; Schuler, William (January 2024, Association for Computational Linguistics)

Full Text Available
Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

https://doi.org/10.18653/v1/2023.findings-emnlp.128

Oh, Byung-Doh; Schuler, William (January 2023, Association for Computational Linguistics)

Full Text Available
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions

https://doi.org/10.18653/v1/2023.acl-long.562

Oh, Byung-Doh; Schuler, William (January 2023, Association for Computational Linguistics)

Full Text Available
Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

https://doi.org/10.1162/tacl_a_00548

Oh, Byung-Doh; Schuler, William (January 2023, Transactions of the Association for Computational Linguistics)

Abstract: This work presents a linguistic analysis of why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently released five GPT-Neo variants and eight OPT variants on two separate datasets, replicating earlier results limited to just GPT-2 (Oh et al., 2022). Subsequently, analysis of residual errors reveals a systematic deviation of the larger variants, such as underpredicting reading times of named entities and making compensatory overpredictions for reading times of function words such as modals and conjunctions. These results suggest that the propensity of larger Transformer-based models to ‘memorize’ sequences during training makes their surprisal estimates diverge from humanlike expectations, which warrants caution in using pre-trained language models to study human language processing.
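Perplexity and surprisal, the two quantities related in this abstract, are linked by a standard definition: perplexity is the exponentiated mean per-token surprisal. The sketch below shows that relationship on made-up token probabilities. It is not the paper's evaluation code, and the function name `perplexity` and the input values are assumptions for illustration.

```python
import math

def perplexity(token_probs):
    """Return perplexity given each token's conditional probability.

    Perplexity is exp of the mean surprisal, with surprisal measured in nats.
    """
    surprisals = [-math.log(p) for p in token_probs]  # per-token surprisal
    return math.exp(sum(surprisals) / len(surprisals))

# Hypothetical model that assigns probability 0.25 to every token.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
print(ppl)  # approximately 4.0
```

A model with lower perplexity assigns lower average surprisal to the text, which is why the two measures can be compared against reading times within the same regression framework.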
Full Text Available
Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

Oh, Byung-Doh; Schuler, William (December 2022, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing)

Full Text Available
Robust Effects of Working Memory Demand during Naturalistic Language Comprehension in Language-Selective Cortex

https://doi.org/10.1523/JNEUROSCI.1894-21.2022

Shain, Cory; Blank, Idan A.; Fedorenko, Evelina; Gibson, Edward; Schuler, William (September 2022, The Journal of Neuroscience)

Full Text Available
Comparison of Structural Parsers and Neural Language Models as Surprisal Estimators

https://doi.org/10.3389/frai.2022.777963

Oh, Byung-Doh; Clark, Christian; Schuler, William (March 2022, Frontiers in Artificial Intelligence)

Expectation-based theories of sentence processing posit that processing difficulty is determined by predictability in context. While predictability quantified via surprisal has gained empirical support, this representation-agnostic measure leaves open the question of how best to approximate the human comprehender's latent probability model. This article first describes an incremental left-corner parser that incorporates information about common linguistic abstractions such as syntactic categories, predicate-argument structure, and morphological rules as a computational-level model of sentence processing. The article then evaluates a variety of structural parsers and deep neural language models as cognitive models of sentence processing by comparing the predictive power of their surprisal estimates on self-paced reading, eye-tracking, and fMRI data collected during real-time language processing. The results show that surprisal estimates from the proposed left-corner processing model deliver comparable and often superior fits to self-paced reading and eye-tracking data when compared to those from neural language models trained on much more data. This suggests that the strong linguistic generalizations made by the proposed processing model may help predict humanlike processing costs that manifest in latency-based measures, even when the amount of training data is limited. Additionally, experiments using Transformer-based language models sharing the same primary architecture and training data show a surprising negative correlation between parameter count and fit to self-paced reading and eye-tracking data. These findings suggest that large-scale neural language models are making weaker generalizations based on patterns of lexical items rather than stronger, more humanlike generalizations based on linguistic structure.
Full Text Available
Continuous-time deconvolutional regression for psycholinguistic modeling

https://doi.org/10.1016/j.cognition.2021.104735

Shain, Cory; Schuler, William (October 2021, Cognition)
Full Text Available
