Title: A Suite of LMs Comprehend Puzzle Statements as Well or Better Than Humans
Abstract: This paper reexamines a recent claim that large language models (LLMs) lag behind humans in language comprehension on what were described as minimally complex statements. We argue that human performance was overestimated and LM performance underestimated. Moreover, both people and lower-performing LMs are disproportionately challenged by queries involving potentially appropriate inferences, suggesting shared pragmatic sensitivity rather than model-specific deficits. Analysis of the more sensitive log probabilities of Llama-2-70B demonstrates ceiling-level accuracy and pragmatic sensitivity. A separate set of LM grammaticality judgments previously characterized as incorrect is shown to correlate with human judgments, while certain reasoning models approximate idealized judgments when prompted to respond as an expert generative syntactician. Overall, the findings suggest that apparent deficits in LM performance may reflect task design, evaluation choices, and assumptions about human performance rather than deficiencies in current models.
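The log-probability analysis described in the abstract can be illustrated with a minimal sketch: rather than parsing a model's free-form reply, each candidate answer is scored by its length-normalized log probability under the model, and the higher-scoring candidate is taken as the model's answer. The numbers and the `answer_score` helper below are hypothetical, for illustration only; they are not the paper's code or data.

```python
def answer_score(token_logprobs):
    # Length-normalized log probability: average the per-token log
    # probabilities so longer answers are not penalized for length alone.
    return sum(token_logprobs) / len(token_logprobs)

# Hypothetical per-token log probabilities for two candidate answers
# to a single puzzle query (illustrative numbers only).
yes_logprobs = [-0.1, -0.3]
no_logprobs = [-1.2, -0.9, -0.7]

choice = "yes" if answer_score(yes_logprobs) > answer_score(no_logprobs) else "no"
```

Because this comparison reads probabilities directly, it can detect a correct preference even when the model's sampled text would have been scored as a wrong answer.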
Award ID(s):
2339729
PAR ID:
10675137
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
MIT Press
Date Published:
Journal Name:
Open Mind
Volume:
10
ISSN:
2470-2986
Page Range / eLocation ID:
431 to 440
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: In recent years, large language models (LLMs) and vision language models (VLMs) have excelled at tasks requiring human-like reasoning, inspiring researchers in engineering design to use language models (LMs) as surrogate evaluators of design concepts. But do these models actually evaluate designs like humans? While recent work has shown that LM evaluations sometimes fall within human variance on Likert-scale grading tasks, those tasks often obscure the reasoning and biases behind the scores. To address this limitation, we compare LM word embeddings (trained to capture semantic similarity) with human-rated similarity embeddings derived from triplet comparisons (“is A closer to B than C?”) on a dataset of design sketches and descriptions. We assess alignment via local tripletwise similarity and embedding distances, allowing for deeper insights than raw Likert-scale scores provide. We also explore whether describing the designs to LMs through text or images improves alignment with human judgments. Our findings suggest that text alone may not fully capture the nuances humans pick up on, yet text-based embeddings outperform their multimodal counterparts on satisfying local triplets. On the basis of these insights, we offer recommendations for effectively integrating LMs into design evaluation tasks.
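The tripletwise alignment metric used in the abstract above can be sketched directly: for every triplet (a, b, c), check whether two embedding spaces agree on the question "is a closer to b than to c?". The function below is a generic illustration of that idea, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def triplet_agreement(emb_a, emb_b):
    # Fraction of triplets (i, j, k) on which the two embedding spaces
    # give the same answer to "is i closer to j than to k?".
    triplets = list(combinations(range(emb_a.shape[0]), 3))
    agree = 0
    for i, j, k in triplets:
        in_a = np.linalg.norm(emb_a[i] - emb_a[j]) < np.linalg.norm(emb_a[i] - emb_a[k])
        in_b = np.linalg.norm(emb_b[i] - emb_b[j]) < np.linalg.norm(emb_b[i] - emb_b[k])
        agree += int(in_a == in_b)
    return agree / len(triplets)
```

An agreement of 1.0 means the two spaces rank all local similarities identically; chance level for random, unrelated spaces is near 0.5.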
  2. We present a game-theoretic model of pragmatics that we call ReCo (for Regularized Conventions). This model formulates pragmatic communication as a game in which players are rewarded for communicating successfully and penalized for deviating from a shared, “default” semantics. As a result, players assign utterances context-dependent meanings that jointly optimize communicative success and naturalness with respect to speakers’ and listeners’ background knowledge of language. By using established game-theoretic tools to compute equilibrium strategies for this game, we obtain principled pragmatic language generation procedures with formal guarantees of communicative success. Across several datasets capturing real and idealized human judgments about pragmatic implicature, ReCo matches, or slightly improves upon, predictions made by Iterated Best Response and Rational Speech Acts models of language understanding. 
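As a point of reference for the comparison above, the Rational Speech Acts (RSA) baseline can be written in a few lines: a literal listener normalizes a truth-conditional lexicon, a pragmatic speaker normalizes over utterances, and a pragmatic listener inverts the speaker. The two-utterance scalar-implicature lexicon below is a standard toy example (with a uniform prior and rationality parameter of 1), not data from the paper.

```python
import numpy as np

# Toy lexicon: rows = utterances ("some", "all"),
# cols = meanings (SOME-but-not-all, ALL); 1.0 = utterance true of meaning.
lexicon = np.array([[1.0, 1.0],
                    [0.0, 1.0]])

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = normalize(lexicon, axis=1)  # literal listener   P(meaning | utterance)
S1 = normalize(L0, axis=0)       # pragmatic speaker  P(utterance | meaning)
L1 = normalize(S1, axis=1)       # pragmatic listener P(meaning | utterance)

# L1 derives the scalar implicature: hearing "some", the pragmatic
# listener favors the SOME-but-not-all meaning (probability 0.75 here).
```

ReCo's regularization toward a shared default semantics would enter this picture as a penalty term in the speaker's objective; the unregularized chain above is the baseline it is evaluated against.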
  3. Despite the growing success of diffusion models in continuous-valued domains (e.g., images), similar efforts for discrete domains such as text have yet to match the performance of autoregressive language models. In this work, we present SSD-LM—a diffusion-based language model with two key design choices. First, SSD-LM is semi-autoregressive, iteratively generating blocks of text, allowing for flexible output length at decoding time while enabling local bidirectional context updates. Second, it is simplex-based, performing diffusion on the natural vocabulary space rather than a learned latent space, allowing us to incorporate classifier guidance and modular control using off-the-shelf classifiers without any adaptation. We evaluate SSD-LM on unconstrained text generation benchmarks, and show that it matches or outperforms strong autoregressive GPT-2 models across standard quality and diversity metrics, while vastly outperforming diffusion-based baselines. On controlled text generation, SSD-LM also outperforms competitive baselines, with an extra advantage in modularity. 
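The semi-autoregressive decoding scheme described above can be sketched abstractly: text is produced block by block, each block generated by an iterative denoiser conditioned on everything decoded so far. The `denoise_block` stub below stands in for SSD-LM's simplex diffusion step; only the control flow is illustrated.

```python
def generate(denoise_block, n_blocks, block_len):
    # Semi-autoregressive loop: blocks are produced left to right
    # (autoregressive across blocks), while within each block the
    # denoiser may revise all positions jointly (bidirectional).
    tokens = []
    for _ in range(n_blocks):
        block = denoise_block(tuple(tokens), block_len)
        tokens.extend(block)
    return tokens

# Stub denoiser: emits consecutive token ids starting after the context.
stub = lambda context, n: list(range(len(context), len(context) + n))
```

Varying `n_blocks` at decoding time is what gives the flexible output length the abstract mentions, without retraining.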
  4. Abstract: Liquid metal (LM) exhibits a distinct combination of high electrical conductivity comparable to that of metals and exceptional deformability derived from its liquid state, and is thus considered a promising material for high-performance soft electronics. However, rapidly patterning LM to achieve a sensory system with high sensitivity remains a challenge, mainly attributable to its poor rheological properties and wettability. Here, we report a rheological modification strategy for LM and strain redistribution mechanics that simultaneously simplify the scalable manufacturing process and significantly enhance the sensitivity of LM sensors. By incorporating SiO2 particles into LM, the modulus, yield stress, and viscosity of the LM-SiO2 composite are drastically enhanced, enabling 3D printability on soft materials for stretchable electronics. Sensors based on the printed LM-SiO2 composite show excellent mechanical flexibility, robustness, and strain- and pressure-sensing performance. Such sensors are integrated onto different locations of the human body for wearable applications. Furthermore, when integrated onto a tactile glove, the synergistic effect of strain and pressure sensing can decode clenching posture and hitting strength in boxing training. Assisted by a deep-learning algorithm, this tactile glove can recognize the technical execution of boxing punches, such as the jab, swing, uppercut, and combination punches, with 90.5% accuracy. This integrated multifunctional sensory system can find wide application in smart sports training, intelligent soft robotics, and human-machine interfaces.
  5. Martelli, Pier Luigi (Ed.)
    Abstract
    Motivation: The identification and understanding of drug–target interactions (DTIs) play a pivotal role in the drug discovery and development process. Sequence representations of drugs and proteins in computational models offer advantages such as widespread availability, easier input quality control, and reduced computational resource requirements. These make them efficient and accessible tools for various computational biology and drug discovery applications. Many sequence-based DTI prediction methods have been developed over the years. Despite advances in methodology, cold start DTI prediction involving an unknown drug or protein remains a challenging task, particularly for sequence-based models. We introduce DTI-LM, a novel framework that leverages advanced pretrained language models, harnessing their exceptional context-capturing abilities along with neighborhood information to predict DTIs. DTI-LM is specifically designed to rely solely on sequence representations of drugs and proteins, aiming to bridge the gap between warm start and cold start predictions.
    Results: Large-scale experiments on four datasets show that DTI-LM can achieve state-of-the-art performance on DTI predictions. Notably, it excels in overcoming the common challenges faced by sequence-based models in cold start predictions for proteins, yielding impressive results. The incorporation of neighborhood information through a graph attention network further enhances prediction accuracy. Nevertheless, a disparity persists between cold start predictions for proteins and for drugs. A detailed examination of DTI-LM reveals that language models exhibit contrasting capabilities in capturing similarities between drugs and between proteins.
    Availability and implementation: Source code is available at: https://github.com/compbiolabucf/DTI-LM.
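The combination of sequence embeddings with neighborhood information via graph attention, as described in the abstract above, can be sketched in miniature. The functions and dimensions below are hypothetical, intended only to show the shape of the computation, not DTI-LM's actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_neighbors(center, neighbors):
    # Single-head attention over graph neighbors: weight each neighbor
    # embedding by its dot-product similarity to the center node, then
    # average the attended neighborhood with the center's own embedding.
    weights = softmax(neighbors @ center)
    pooled = weights @ neighbors
    return (center + pooled) / 2.0

def dti_score(drug_emb, protein_emb):
    # Cosine similarity as a stand-in for the learned interaction head.
    return float(drug_emb @ protein_emb /
                 (np.linalg.norm(drug_emb) * np.linalg.norm(protein_emb)))
```

In a cold start setting, `attend_neighbors` is what lets an unseen entity borrow signal from known neighbors before scoring against a candidate partner.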