CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

Wang, Song; Wang, Peng; Zhou, Tong; Dong, Yushun; Tan, Zhen; Li, Jundong

This content will become publicly available on April 24, 2026

As Large Language Models (LLMs) are increasingly deployed for a wide range of natural language processing (NLP) tasks, concerns have arisen about the potential negative societal impacts of LLM-generated content. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on a single type of bias and employ inconsistent evaluation metrics, which makes comparison across datasets and LLMs difficult. To address these limitations, we collect a variety of datasets designed for the bias evaluation of LLMs and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks. The curation of CEB is based on our newly proposed compositional taxonomy, which characterizes each dataset along three dimensions: bias types, social groups, and tasks. By combining the three dimensions, we develop a comprehensive evaluation strategy for bias in LLMs. Our experiments demonstrate that the levels of bias vary across these dimensions, thereby providing guidance for the development of specific bias mitigation methods.

Award ID(s): 2411248 2223769 2228534 2154962 2144209 2006844

PAR ID: 10612762

Author(s) / Creator(s): Wang, Song; Wang, Peng; Zhou, Tong; Dong, Yushun; Tan, Zhen; Li, Jundong

Publisher / Repository: International Conference on Learning Representations

Date Published: 2025-04-24

Format(s): Medium: X

Location: Singapore

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
