Perspectives on Large Language Models for Relevance Judgment

Faggioli, Guglielmo; Dietz, Laura; Clarke, Charles L.; Demartini, Gianluca; Hagen, Matthias; Hauff, Claudia; Kando, Noriko; Kanoulas, Evangelos; Potthast, Martin; Stein, Benno; Wachsmuth, Henning

doi:10.1145/3578337.3605136

When asked, large language models (LLMs) such as ChatGPT claim that they can assist with relevance judgments, but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments, along with the concerns and issues that arise. We devise a human–machine collaboration spectrum that categorizes relevance judgment strategies by how much humans rely on machines. For the extreme point of ‘fully automated judgments’, we further include a pilot experiment on whether LLM-based relevance judgments correlate with judgments from trained human assessors. We conclude the paper by providing opposing perspectives for and against the use of LLMs for automatic relevance judgments, and a compromise perspective, informed by our analyses of the literature, our preliminary experimental evidence, and our experience as IR researchers.
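The pilot experiment asks whether LLM-based judgments agree with those of trained human assessors. This record does not state which agreement measure was used, so the following is only a minimal sketch of one common choice, Cohen's kappa (chance-corrected agreement between two assessors labeling the same items). The function name `cohens_kappa` and the ten binary judgments are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two assessors over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items both assessors labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each assessor labeled independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both assessors used one identical label throughout: perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary judgments (1 = relevant) for ten query-document pairs.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
llm = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
print(round(cohens_kappa(human, llm), 2))  # prints 0.6
```

Here the two assessors agree on 8 of 10 pairs, but because each labels half the pairs relevant, 50% agreement would be expected by chance, giving a kappa of 0.6. Kappa is preferred over raw agreement for this reason: it discounts the matches that would occur even if the LLM's labels were unrelated to the human's.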

Award ID(s): 1846017

PAR ID: 10473538

Publisher / Repository: ACM

Date Published: 2023-08-09

ISBN: 9798400700736

Page Range / eLocation ID: 39 to 50

Subject(s) / Keyword(s): large language models, relevance judgments, human–machine collaboration, automatic test collections

Location: Taipei, Taiwan

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3578337.3605136