Adjudicating LLMs as PropBank Adjudicators

Bonn, Julia; Madabushi, Harish Tayyar; Hwang, Jena D; Bonial, Claire

We evaluate the ability of large language models (LLMs) to provide PropBank semantic role label annotations across different realizations of the same verbs in transitive, intransitive, and middle voice constructions. To assess the meta-linguistic capabilities of LLMs, as well as their ability to glean such capabilities through in-context learning, we evaluate the models in three settings: zero-shot; with three examples of another verb used in transitive, intransitive, and middle voice constructions; and with those examples plus the correct sense and roleset information. We find that zero-shot knowledge of PropBank annotation is almost nonexistent. The largest model evaluated, GPT-4, achieves the best performance in the setting where it is given both examples and the correct roleset in the prompt, demonstrating that larger models can acquire some meta-linguistic capabilities through in-context learning. However, even in this setting, which is simpler than the task a human PropBank annotator faces, the model achieves only 48% accuracy in marking numbered arguments correctly. To ensure transparency and reproducibility, we publicly release our dataset and model responses.
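The three prompt settings and the numbered-argument score described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the setting names, the prompt wording, the bracketed annotation format, the `Demo`/`build_prompt`/`numbered_arg_accuracy` names, and the scoring rule (share of gold ARG0 to ARG5 spans that receive the same label) are all hypothetical assumptions. The released dataset and model responses define the actual prompts and evaluation.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical names for the three evaluation settings described in the abstract.
SETTINGS = ("zero_shot", "few_shot", "few_shot_roleset")

# Numbered (core) PropBank arguments; modifiers such as ARGM-TMP are excluded.
NUMBERED_ARG = re.compile(r"^ARG[0-5]$")


@dataclass
class Demo:
    """One in-context example: a sentence and its (assumed) bracketed SRL annotation."""
    sentence: str
    annotation: str


def build_prompt(
    target: str,
    setting: str,
    demos: Optional[List[Demo]] = None,
    roleset: Optional[str] = None,
) -> str:
    """Assemble a prompt for one of the three hypothetical settings."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    parts = ["Annotate the target sentence with PropBank semantic role labels."]
    if setting != "zero_shot":
        # Both example-based settings show demonstrations of a different verb.
        if not demos:
            raise ValueError("few-shot settings need demonstration examples")
        for demo in demos:
            parts.append(f"Sentence: {demo.sentence}\nAnnotation: {demo.annotation}")
    if setting == "few_shot_roleset":
        # The richest setting also supplies the correct sense/roleset information.
        if roleset is None:
            raise ValueError("the roleset setting needs roleset information")
        parts.append(f"Roleset: {roleset}")
    parts.append(f"Sentence: {target}\nAnnotation:")
    return "\n\n".join(parts)


def numbered_arg_accuracy(gold: Dict[str, str], pred: Dict[str, str]) -> float:
    """Fraction of gold numbered arguments whose span received the same label.

    Both dicts map an argument span (as text) to its role label; this
    simplification assumes each span occurs at most once per sentence.
    """
    targets = {span: label for span, label in gold.items() if NUMBERED_ARG.match(label)}
    if not targets:
        return 0.0
    correct = sum(1 for span, label in targets.items() if pred.get(span) == label)
    return correct / len(targets)


if __name__ == "__main__":
    # Illustrative demonstrations: transitive, intransitive, and middle uses of "break".
    demos = [
        Demo("The chef broke the plate.", "[ARG0 The chef] [V broke] [ARG1 the plate]"),
        Demo("The plate broke.", "[ARG1 The plate] [V broke]"),
        Demo("These plates break easily.", "[ARG1 These plates] [V break] [ARGM-MNR easily]"),
    ]
    print(build_prompt("The ice melted.", "few_shot_roleset", demos,
                       roleset="<roleset information for the target verb>"))
    # A model that labels the melting thing as ARG0 instead of ARG1 scores 0.0 here.
    print(numbered_arg_accuracy({"The ice": "ARG1"}, {"The ice": "ARG0"}))
```

Scoring only numbered arguments, and ignoring ARGM modifiers, mirrors the abstract's focus on "marking numbered arguments correctly"; span matching by exact text is a simplifying assumption that a real evaluation would likely replace with token offsets.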

Award ID(s): 2213805

PAR ID: 10527721

Author(s) / Creator(s): Bonn, Julia; Madabushi, Harish Tayyar; Hwang, Jena D; Bonial, Claire

Editor(s): Bonial, Claire; Bonn, Julia; Hwang, Jena D

Publisher / Repository: ELRA and ICCL

Date Published: 2024-05-01

Location: https://aclanthology.org/2024.dmr-1.12

Sponsoring Org: National Science Foundation

Full text: Accepted Manuscript 1.0 (free, publicly accessible)
Document type: Conference paper
DOI: not currently available
