Title: MISGENDERED: Limits of Large Language Models in Understanding Pronouns
Gender bias in language technologies has been widely studied, but research has mostly been restricted to a binary paradigm of gender. It is essential also to consider non-binary gender identities, as excluding them can cause further harm to an already marginalized group. In this paper, we comprehensively evaluate popular language models for their ability to correctly use English gender-neutral pronouns (e.g., singular they, them) and neo-pronouns (e.g., ze, xe, thon) that are used by individuals whose gender identity is not represented by binary pronouns. We introduce Misgendered, a framework for evaluating large language models’ ability to correctly use preferred pronouns, consisting of (i) instances declaring an individual’s pronoun, followed by a sentence with a missing pronoun, and (ii) an experimental setup for evaluating masked and auto-regressive language models using a unified method. When prompted out-of-the-box, language models perform poorly at correctly predicting neo-pronouns (averaging 7.6% accuracy) and gender-neutral pronouns (averaging 31.0% accuracy). This inability to generalize results from a lack of representation of non-binary pronouns in training data and memorized associations. Few-shot adaptation with explicit examples in the prompt improves the performance but plateaus at only 45.4% for neo-pronouns. We release the full dataset, code, and demo at https://tamannahossainkay.github.io/misgendered/.
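The evaluation setup described in the abstract (a pronoun declaration, a sentence with a missing pronoun, and one scoring procedure shared across model types) can be illustrated with a minimal sketch. This is not the authors' released code. The names `build_instance`, `predict`, `accuracy`, and `toy_score`, the `{mask}` placeholder, the candidate list, and the declaration wording are all hypothetical, and scoring each filled-in sentence is only one plausible reading of "a unified method". The `score` argument stands in for any function that returns a model's preference for a full sentence, so the same loop could in principle wrap a masked or an auto-regressive model.

```python
from typing import Callable, Dict, List

# Hypothetical candidate set: nominative forms only, for illustration.
CANDIDATES: List[str] = ["he", "she", "they", "xe", "ze", "thon"]


def build_instance(name: str, declared: str, template: str) -> Dict[str, str]:
    """Pair a pronoun declaration with a sentence that has a {mask} slot."""
    return {
        "context": f"{name}'s pronouns are {declared}.",
        "template": template,
        # The expected answer is the first (nominative) form in the declaration.
        "answer": declared.split("/")[0],
    }


def predict(instance: Dict[str, str],
            score: Callable[[str], float],
            candidates: List[str]) -> str:
    """Fill the slot with each candidate and keep the highest-scoring one."""
    def filled(pronoun: str) -> str:
        sentence = instance["template"].replace("{mask}", pronoun)
        return f"{instance['context']} {sentence}"
    return max(candidates, key=lambda p: score(filled(p)))


def accuracy(instances: List[Dict[str, str]],
             score: Callable[[str], float],
             candidates: List[str]) -> float:
    """Fraction of instances where the top-scoring pronoun matches the declaration."""
    correct = sum(predict(i, score, candidates) == i["answer"] for i in instances)
    return correct / len(instances)


def toy_score(text: str) -> float:
    """Stand-in for a model score: ignores the declaration and favours 'they'."""
    return float(sum(word == "they" for word in text.lower().split()))


template = "Before going out, {mask} checked the weather."
instances = [
    build_instance("Aamari", "xe/xem/xyr", template),
    build_instance("Aamari", "they/them/their", template),
]
print(accuracy(instances, toy_score, CANDIDATES))  # 0.5 with this toy scorer
```

The toy scorer never reads the declaration, so it gets the they/them instance right and the neo-pronoun instance wrong. To evaluate an actual model, `toy_score` would be replaced with a function that returns the model's likelihood for the filled-in text.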
Award ID(s):
2046873
NSF-PAR ID:
10462496
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Page Range / eLocation ID:
5352 to 5367
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Arnold, J. E., Mayo, H., & Dong, L. (2020). Individual differences (or the lack of them) in comprehension of singular they. Technical Report #3. UNC Language Processing Lab, Department of Psychology & Neuroscience, University of North Carolina – Chapel Hill, Chapel Hill, North Carolina. The pronoun “they” can refer to an individual who identifies as nonbinary, but it also is commonly used as a plural pronoun. How do listeners identify whether “they” is being used in a singular or plural sense? Arnold, Mayo, & Dong (in press) report three experiments that test the role of explicitly introducing gender identity via pronouns, e.g. “This is Alex, and they use they/them pronouns.” Participants read short stories like “Alex went running with Liz and they fell down.” Answers to “Who fell down” indicated whether participants interpreted they as Alex or Alex-and-Liz. Singular interpretations of they were more likely when participants heard an explicit statement that Alex uses they/them pronouns, and in supporting discourse contexts. This paper is a companion to the main article, and reports analyses of individual difference measures. Participants self-reported familiarity with individuals who identify as nonbinary, which was expected to increase singular interpretations, but mostly it did not. In experiment 2 we also measured print exposure, but we found that it did not affect interpretation of singular they. In short, we saw virtually no effects of individual difference predictors.
  2. The pronoun “they” can refer to an individual who identifies as nonbinary, but it also is commonly used as a plural pronoun. How do listeners identify whether “they” is being used in a singular or plural sense? Arnold, Mayo, & Dong (in press) report three experiments that test the role of explicitly introducing gender identity via pronouns, e.g. “This is Alex, and they use they/them pronouns.” Participants read short stories like “Alex went running with Liz and they fell down.” Answers to “Who fell down” indicated whether participants interpreted they as Alex or Alex-and-Liz. Singular interpretations of they were more likely when participants heard an explicit statement that Alex uses they/them pronouns, and in supporting discourse contexts. This paper is a companion to the main article, and reports analyses of individual difference measures. Participants self-reported familiarity with individuals who identify as nonbinary, which was expected to increase singular interpretations, but mostly it did not. In experiment 2 we also measured print exposure, but we found that it did not affect interpretation of singular they. In short, we saw virtually no effects of individual difference predictors.
  3. The pronoun “they” can be either plural or singular, perhaps referring to an individual who identifies as nonbinary. How do listeners identify whether “they” has a singular or plural sense? We test the role of explicitly discussing pronouns (e.g., “Alex uses they/them pronouns”). In three experiments, participants read short stories, like “Alex went running with Liz. They fell down.” Answers to “Who fell down” indicated whether participants interpreted they as Alex or Alex-and-Liz. We found more singular responses in discourse contexts that make Alex more available: when Alex was either the only person in the context or mentioned first. Critically, the singular interpretation was stronger when participants heard explicit instructions that Alex uses they/them pronouns, even though participants in all conditions had ample opportunity to learn this fact through observation. Results show that the social trend to talk about pronouns has a direct impact on how language is understood.
  4. Over the past few years, pronoun lists have become more prevalent in online spaces. Currently, various LGBT+ activists, universities, and corporations encourage people to share their preferred pronouns. Little research exists examining the characteristics of individuals who do publicly share their preferred pronouns. Using Twitter bios from the US between early 2015 and June 30, 2022, we explored users’ expression of preferred pronouns. First, we noted the prevalence of users with pronoun lists within their bio has increased substantially. Second, we observed that certain linguistic tokens systematically co-occurred with pronoun lists. Specifically, tokens associated with left-wing politics, gender or sexual identity, and social media argot co-occurred disproportionately often alongside pronoun lists, while tokens associated with right-wing politics, religion, sports, and finance co-occurred infrequently. Additionally, we discovered clustering among Twitter users with pronouns in their bios. Specifically, we found an above-average proportion of the followers and friends of Twitter users with pronouns in their bio also had pronouns in their bios. Twitter users who did not share their preferred pronouns, on the other hand, were disproportionately unlikely to be connected with Twitter users who did. 
  5. This study investigates pronoun interpretation by second language (L2) learners of English, focusing on whether first language (L1) transfer and/or processing difficulty affect L2 learners’ pronoun resolution. It is hypothesized that L2 learners’ non-target performance in L2-pronoun interpretation is attributable to two sources. The first is the computational complexity required for pronoun resolution, as argued in L1 acquisition by Grodzinsky and Reinhart and L2 acquisition by Slabakova et al. The second is how pronoun interpretation operates in L1. The hypothesis is tested by comparing Korean and Spanish L2-English learners’ interpretation of English pronouns using a Truth Value Judgment Task. Both groups had difficulty rejecting pronouns with local-referential antecedents when their proficiency levels were low. Additionally, Korean speakers showed more non-target responses than Spanish speakers due to their knowledge of pronoun interpretation in Korean. These results indicate that both L1 transfer and processing difficulty may be sources of L2 learners’ non-target pronoun interpretation, supporting the hypothesis of the study.

     