SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

Zhao, K; Yang, B; Tang, C; Lin, C; Zhan, L

Citation Details

The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem, and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework SLIDE (Small and Large Integrated for Dialogue Evaluation), that leverages both a small, specialized model (SLM), and LLMs for the evaluation of open-domain dialogues. Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that combines embedding cosine distances with similarity learned through neural networks, and (3) A strategy for incorporating the evaluation results from both the SLM and LLMs. Our empirical results demonstrate that our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and additionally, the SLIDE evaluator exhibits a better correlation with human judgments. more »

Award ID(s):: 2045848 2319450

PAR ID:: 10518872

Author(s) / Creator(s):: Zhao, K; Yang, B; Tang, C; Lin, C; Zhan, L

Publisher / Repository:: The 62nd Annual Meeting of the Association for Computational Linguistics

Date Published:: 2024-08-16

Format(s):: Medium: X

Location:: Bangkok, Thailand

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this