Testing Causal Models of Word Meaning in LLMs

Musker, Samuel; Pavlick, Ellie

Citation Details

Large Language Models (LLMs) have driven extraordinary improvements in NLP. However, it is unclear how such models represent lexical concepts, i.e., the meanings of the words they use. We evaluate the lexical representations of GPT-4, GPT-3, and Falcon-40B through the lens of HIPE theory, a concept representation theory focused on words describing artifacts (such as “mop”, “pencil”, and “whistle”). The theory posits a causal graph relating the meanings of such words to the form, use, and history of the referred objects. We test LLMs with the stimuli used by Chaigneau et al. (2004) on human subjects, and consider a variety of prompt designs. Our experiments concern judgements about causal outcomes, object function, and object naming. We do not find clear evidence that GPT-3 or Falcon-40B encode HIPE's causal structure, but find evidence that GPT-4 does. The results contribute to a growing body of research characterizing the representational capacity of LLMs.

Award ID(s): 1956221

PAR ID: 10559551

Author(s) / Creator(s): Musker, Samuel; Pavlick, Ellie

Publisher / Repository: Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 46

Date Published: 2024-07-01

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
