Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively
more »
« less
The Bias Amplification Paradox in Text-to-Image Generation
Bias amplification is a phenomenon in which models exacerbate biases or stereotypes present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We find that the model appears to amplify gender-occupation biases found in the training data (LAION) considerably. However, we discover that amplification can be largely attributed to discrepancies between training captions and model prompts. For example, an inherent difference is that captions from the training data often contain explicit gender information while our prompts do not, which leads to a distribution shift and consequently inflates bias measures. Once we account for distributional differences between texts used for training and generation when evaluating amplification, we observe that amplification decreases drastically. Our findings illustrate the challenges of comparing biases in models and their training data, as well as evaluation more broadly, and highlight how confounding factors can impact analyses.
more »
« less
- PAR ID:
- 10526005
- Publisher / Repository:
- Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT))
- Date Published:
- Format(s):
- Medium: X
- Location:
- Mexico City, Mexico
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model’s biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but more importantly, these biases can be significantly reduced by up to 40% in 1 epoch of instruction tuning LLaMA 2 following our SCD strategy.more » « less
-
Mental health stigma manifests differently for different genders, often being more associated with women and overlooked with men. Prior work in NLP has shown that gendered mental health stigmas are captured in large language models (LLMs). However, in the last year, LLMs have changed drastically: newer, generative models not only require different methods for measuring bias, but they also have become widely popular in society, interacting with millions of users and increasing the stakes of perpetuating gendered mental health stereotypes. In this paper, we examine gendered mental health stigma in GPT3.5-Turbo, the model that powers OpenAI’s popular ChatGPT. Building off of prior work, we conduct both quantitative and qualitative analyses to measure GPT3.5-Turbo’s bias between binary genders, as well as to explore its behavior around non-binary genders, in conversations about mental health. We find that, though GPT3.5-Turbo refrains from explicitly assuming gender, it still contains implicit gender biases when asked to complete sentences about mental health, consistently preferring female names over male names. Additionally, though GPT3.5-Turbo shows awareness of the nuances of non-binary people’s experiences, it often over-fixates on non-binary gender identities in free-response prompts. Our preliminary results demonstrate that while modern generative LLMs contain safeguards against blatant gender biases and have progressed in their inclusiveness of non-binary identities, they still implicitly encode gendered mental health stigma, and thus risk perpetuating harmful stereotypes in mental health contexts.more » « less
-
Contextual word embeddings such as BERT have achieved state of the art performance in numerous NLP tasks. Since they are optimized to capture the statistical properties of training data, they tend to pick up on and amplify social stereotypes present in the data as well. In this study, we (1) propose a template-based method to quantify bias in BERT; (2) show that this method obtains more consistent results in capturing social biases than the traditional cosine based method; and (3) conduct a case study, evaluating gender bias in a downstream task of Gender Pronoun Resolution. Although our case study focuses on gender bias, the proposed technique is generalizable to unveiling other biases, including in multiclass settings, such as racial and religious biases.more » « less
-
We employ an inversion-based approach to examine CLIP models. Our examination reveals that inverting CLIP models results in the generation of images that exhibit semantic alignment with the specified target prompts. We leverage these inverted images to gain insights into various aspects of CLIP models, such as their ability to blend concepts and inclusion of gender biases. We notably observe instances of NSFW (Not Safe For Work) images during model inversion. This phenomenon occurs even for semantically innocuous prompts, like "a beautiful landscape," as well as for prompts involving the names of celebrities.more » « less
An official website of the United States government

