Are Models Biased on Text without Gender-related Language?

Belem, Catarina; Seshadri, Preethi; Razeghi, Yasaman; Singh, Sameer

We introduce UnStereoEval (USE), a novel framework tailored for investigating gender bias in stereotype-free scenarios. USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations. To systematically benchmark the fairness of popular language models in stereotype-free scenarios, we utilize USE to automatically generate benchmarks without any gender-related language. By leveraging USE's sentence-level score, we also repurpose prior gender bias benchmarks (Winobias and Winogender) for non-stereotypical evaluation. Surprisingly, we find low fairness across all 28 evaluated models. Concretely, models demonstrate fair behavior in only 9%-41% of stereotype-free sentences, suggesting that bias does not solely stem from the presence of gender-related words. These results raise important questions about where underlying model biases come from and highlight the need for more systematic and comprehensive bias evaluation.

Award ID(s): 2046873 2040989

PAR ID: 10526344

Author(s) / Creator(s): Belem, Catarina; Seshadri, Preethi; Razeghi, Yasaman; Singh, Sameer

Publisher / Repository: International Conference on Learning Representations (ICLR)

Date Published: 2024-05-01

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
