Title: Failures and successes to learn a core conceptual distinction from the statistics of language
Generic statements like “tigers are striped” and “cars have radios” communicate information that is, in general, true. However, while the first statement is true *in principle*, the second is true only statistically. People are exquisitely sensitive to this principled-vs-statistical distinction. It has been argued that this ability to distinguish between something being true by virtue of category membership and something being true because of mere statistical regularity is a general property of people’s conceptual machinery and cannot itself be learned. We investigate whether the distinction between principled and statistical properties can be learned from language itself. If so, it raises the possibility that language experience can bootstrap core conceptual distinctions and that it is possible to learn sophisticated causal models directly from language. We find that language models are all sensitive to statistical prevalence but struggle to represent the principled-vs-statistical distinction once prevalence is controlled for; GPT-4 is the first to succeed.
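A minimal Python sketch of the kind of probe described above, assuming the Hugging Face transformers library; GPT-2 is a stand-in for the models actually tested, and the sentence pair is illustrative rather than a prevalence-controlled item from the study:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == inputs, .loss is the mean token cross-entropy.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # mean -> total over predicted tokens

# Principled property (true of the kind) vs. statistical property
# (true only by prevalence).
for s in ("tigers are striped", "cars have radios"):
    print(s, sentence_logprob(s))

The study's actual test additionally controls for prevalence, asking whether models treat the two kinds of truths differently even when the properties are equally common; raw sentence probability, as above, does not make that distinction by itself.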
Award ID(s):
2020969
PAR ID:
10547759
Author(s) / Creator(s):
; ;
Editor(s):
Nölle, J; Raviv, L; Graham, E; Hartmann, S; Jadoul, Y; Josserand, M; Matzinger, T; Mudd, K; Pleyer, M; Slonimska, A; Wacewicz, S; Watson, S
Publisher / Repository:
The Evolution of Language: Proceedings of the 15th International Conference (Evolang XV)
Date Published:
Format(s):
Medium: X
Location:
Madison, WI
Sponsoring Org:
National Science Foundation
More Like this
  1. Satirical news is regularly shared on modern social media because it is entertaining, with smartly embedded humor. However, it can be harmful to society because, due to its deceptive character, it can be mistaken for factual news. We found that in satirical news, the lexical and pragmatic attributes of the context are the key factors in amusing readers. In this work, we propose a method that differentiates satirical news from true news. It takes advantage of satirical writing evidence by leveraging the difference between the prediction losses of two language models, one trained on true news and the other on satirical news, when given a new news article. We compute several statistical metrics of language-model prediction loss as features, which are then used for downstream classification. The proposed method is computationally effective because the language models capture the differences in language usage between satirical and traditional news documents and are sensitive when applied to documents outside their domains.
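    A hedged Python sketch of the loss-difference idea (not the authors' released code): score an article under two causal language models and turn statistics of the per-token losses into classifier features. The GPT-2 checkpoints are placeholders for the true-news and satire LMs.

    import numpy as np
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm_true = AutoModelForCausalLM.from_pretrained("gpt2")    # stand-in: LM trained on true news
    lm_satire = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in: LM trained on satire

    def token_losses(model, text):
        """Per-token cross-entropy of `text` under `model`."""
        ids = tok(text, return_tensors="pt", truncation=True).input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Shift so that each position predicts the following token.
        return torch.nn.functional.cross_entropy(
            logits[0, :-1], ids[0, 1:], reduction="none"
        ).numpy()

    def loss_features(text):
        """Statistical metrics of the two models' losses, as in the abstract."""
        a, b = token_losses(lm_true, text), token_losses(lm_satire, text)
        diff = a - b  # satire should surprise the true-news LM more
        return np.array([a.mean(), b.mean(), diff.mean(), diff.std(), diff.max()])

    # Downstream classification, assuming labeled articles `texts`, `labels`:
    #   X = np.stack([loss_features(t) for t in texts])
    #   clf = sklearn.linear_model.LogisticRegression().fit(X, labels)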
  2. Both historically and in terms of practiced academic organization, one would anticipate a flourishing synergistic interface between statistics and operations research in general, and between spatial statistics/econometrics and spatial optimization in particular. Unfortunately, for the most part, this expectation is false. The purpose of this paper is to address this missing link by focusing on the beneficial contributions of spatial statistics to spatial optimization, via spatial autocorrelation (i.e., dis/similar attribute values tend to cluster together on a map), in order to encourage considerably more future collaboration and interaction between contributors to these two parent bodies of knowledge. The key basic statistical concept in this pursuit is the median in its bivariate form, with special reference to the global spatial median and to sets of regional spatial medians. One-dimensional examples illustrate situations that the narrative then extends to two-dimensional illustrations, which, in turn, connect these treatments to the centrography theme of spatial statistics. Because of computational time constraints (reported results include some from timing experiments), the summarized analysis restricts attention to problems involving one global and two or three regional spatial medians. The fundamental spatial-statistical conceptual tool employed here is spatial autocorrelation: geographically informed sampling designs (which acknowledge a non-random mixture of geographic demand weight values that manifests itself as local, homogeneous spatial clusters of these values) can help spatial optimization techniques determine the spatial optima, at least for location-allocation problems. A valuable discovery of this study is that existing but ignored spatial autocorrelation latent in georeferenced demand point weights undermines spatial optimization algorithms. All in all, this paper should help dissipate the existing isolation between statistics and operations research and hopefully inspire substantially more collaborative work by their professionals in the future.
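    As an illustrative Python sketch (not from the paper), the weighted bivariate spatial median at the heart of this approach can be computed with the classical Weiszfeld algorithm; the two clustered groups of demand weights below mimic the spatially autocorrelated values the abstract describes:

    import numpy as np

    def weiszfeld(points, weights, tol=1e-8, max_iter=1000):
        """Weighted geometric (spatial) median of 2-D points."""
        median = np.average(points, axis=0, weights=weights)  # start at the weighted centroid
        for _ in range(max_iter):
            dists = np.maximum(np.linalg.norm(points - median, axis=1), 1e-12)
            w = weights / dists
            new = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(new - median) < tol:
                return new
            median = new
        return median

    # Spatially autocorrelated demand: two homogeneous clusters of weights.
    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
                     rng.normal([5.0, 5.0], 0.5, (50, 2))])
    wts = np.concatenate([np.full(50, 3.0), np.full(50, 1.0)])
    print("global spatial median:", weiszfeld(pts, wts))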
  3. Unlabeled data is a key component of modern machine learning. In general, the role of unlabeled data is to impose a form of smoothness, usually derived from the similarity information encoded in a base kernel, such as the ε-neighbor kernel or the adjacency matrix of a graph. This work revisits the classical idea of spectrally transformed kernel regression (STKR) and provides a new class of general and scalable STKR estimators able to leverage unlabeled data. Intuitively, via spectral transformation, STKR exploits the data distribution, about which unlabeled data can provide additional information. First, we show that STKR is a principled and general approach by characterizing a universal type of “target smoothness” and proving that any sufficiently smooth function can be learned by STKR. Second, we provide scalable STKR implementations for the inductive setting and a general transformation function, whereas prior work is mostly limited to the transductive setting. Third, we derive statistical guarantees for two scenarios: STKR with a known polynomial transformation, and STKR with kernel PCA when the transformation is unknown. Overall, we believe that this work helps deepen our understanding of how to work with unlabeled data, and that its generality will help inspire new methods.
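    A minimal Python sketch of STKR with a known polynomial spectral transformation, shown in the transductive setting for brevity (the paper's scalable inductive implementations are not attempted here); the RBF base kernel and the polynomial coefficients are illustrative choices:

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """Base similarity kernel between row sets A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def stkr_transductive(X_lab, y_lab, X_unl, coefs=(0.0, 1.0, 0.5), lam=1e-3):
        """Kernel ridge regression with a polynomially transformed spectrum;
        unlabeled points enter through the eigendecomposition of the full kernel."""
        X = np.vstack([X_lab, X_unl])
        s, U = np.linalg.eigh(rbf_kernel(X, X))           # base kernel spectrum
        s_t = sum(c * s**k for k, c in enumerate(coefs))  # p(s) = c0 + c1*s + c2*s^2
        Kt = (U * s_t) @ U.T                              # spectrally transformed kernel
        n = len(X_lab)
        alpha = np.linalg.solve(Kt[:n, :n] + lam * np.eye(n), y_lab)
        return Kt[n:, :n] @ alpha                         # predictions on unlabeled points

    rng = np.random.default_rng(0)
    X_lab = rng.uniform(-1, 1, (20, 1))
    X_unl = rng.uniform(-1, 1, (100, 1))
    print(stkr_transductive(X_lab, np.sin(3 * X_lab[:, 0]), X_unl)[:5])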
  4. We discuss the challenges of principled statistical inference in modern data science. Conditionality principles are argued to be key to achieving valid statistical inference, particularly when inference is performed after selecting a model from the sample data itself.
  5. Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels. 
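    A hedged Python sketch of the real-versus-randomized-labels probe (an illustration of the idea, not the authors' learning-theoretic method); the corpus is a toy stand-in for a real dataset:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    def label_gap(X, y, seed=0):
        """Fit accuracy on real labels minus fit accuracy on shuffled labels:
        a representation compatible with the task separates the two."""
        y_rand = np.random.default_rng(seed).permutation(y)
        real = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
        rand = LogisticRegression(max_iter=1000).fit(X, y_rand).score(X, y_rand)
        return real - rand

    texts = ["great movie", "terrible film", "loved it", "hated it"] * 25
    labels = np.array([1, 0, 1, 0] * 25)
    X_bow = CountVectorizer().fit_transform(texts)  # bag-of-words features
    print("BOW real-vs-random gap:", label_gap(X_bow, labels))
    # The same probe on pre-trained MLM embeddings would swap in those features.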