Towards Understanding Gender Bias in Relation Extraction
Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction. While much attention has been dedicated to improvements in accuracy, to our knowledge there have been no attempts in the literature to evaluate the social biases exhibited by NRE systems. In this paper, we create WikiGenderBias, a distantly supervised dataset composed of over 45,000 sentences, with a 10% human-annotated test set, for the purpose of analyzing gender bias in relation extraction systems. We find that when extracting spouse-of and hypernym (i.e., occupation) relations, an NRE system performs differently depending on the gender of the target entity. However, no such disparity appears when extracting relations such as birthDate or birthPlace. We also analyze how existing bias mitigation techniques, such as name anonymization, word embedding debiasing, and data augmentation, affect the NRE system in terms of maintaining test performance and reducing bias. Unfortunately, because NRE models rely heavily on surface-level cues, we find that existing bias mitigation approaches have a negative effect on NRE performance. Our analysis lays the groundwork for future work on quantifying and mitigating bias in NRE.
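To make two of the mitigation strategies named in the abstract concrete, the sketch below shows simplified versions of name anonymization and counterfactual data augmentation applied to raw training sentences. The word-pair list, token/span format, and function names are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of name anonymization and counterfactual data augmentation
# (CDA); the word pairs and helper names below are illustrative assumptions.

# Gendered word pairs used for CDA.
GENDER_PAIRS = [("he", "she"), ("him", "her"), ("his", "her"),
                ("husband", "wife"), ("father", "mother"), ("son", "daughter")]
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a], SWAP[b] = b, a  # note: "her" is ambiguous (him/his); real CDA
                             # implementations disambiguate with POS tags

def anonymize_names(tokens, entity_spans):
    """Name anonymization: replace entity mentions with placeholders so the
    model cannot exploit gendered surface cues in names."""
    out = list(tokens)
    for i, (start, end) in enumerate(entity_spans):
        for j in range(start, end):
            out[j] = f"ENTITY_{i}"
    return out

def counterfactual_augment(tokens):
    """CDA: emit a gender-swapped copy of the sentence; training on both
    copies balances the gendered contexts each relation is seen in."""
    return [SWAP.get(tok.lower(), tok) for tok in tokens]

sentence = "She was married to John and worked as a nurse".split()
print(anonymize_names(sentence, entity_spans=[(4, 5)]))  # masks "John"
print(counterfactual_augment(sentence))                  # "she" -> "he", ...
```

Hard debiasing for word embeddings operates on the embedding matrix rather than on the training text, so it is omitted from this sketch.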
- Award ID(s): 1927554
- PAR ID: 10192194
- Date Published:
- Journal Name: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
- Page Range / eLocation ID: 2943 to 2953
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Knowledge graphs (KGs) are of great importance in various artificial intelligence systems, such as question answering, relation extraction, and recommendation. Nevertheless, most real-world KGs are highly incomplete, with many missing relations between entities. To discover new triples (i.e., head entity, relation, tail entity), many KG completion algorithms have been proposed in recent years. However, a vast majority of existing studies require a large number of training triples for each relation, which contradicts the fact that the frequency distribution of relations in KGs often follows a long-tail distribution, meaning a majority of relations have only very few triples. Meanwhile, since most existing large-scale KGs are constructed automatically by extracting information from crowd-sourced data using heuristic algorithms, plenty of errors can inevitably be incorporated due to the lack of human verification, which greatly reduces performance on KG completion. To tackle the aforementioned issues, in this paper we study a novel problem of error-aware few-shot KG completion and present a principled KG completion framework, REFORM. Specifically, we formulate the problem under the few-shot learning framework (a minimal sketch of this episode construction appears after this list), and our goal is to accumulate meta-knowledge across different meta-tasks and generalize the accumulated knowledge to the meta-test task for error-aware few-shot KG completion. To address the associated challenges resulting from insufficient training samples and inevitable errors, we propose three essential modules, namely a neighbor encoder, cross-relation aggregation, and error mitigation, in each meta-task. Extensive experiments on three widely used KG datasets demonstrate the superiority of the proposed framework REFORM over competitive baseline methods.
- To recognize and mitigate the harms of generative AI systems, it is crucial to consider who is represented in the outputs of generative AI systems and how people are represented. A critical gap emerges when naively improving who is represented, as this does not imply that bias mitigation efforts have been applied to address how people are represented. We critically examined this by investigating gender representation in occupations across state-of-the-art large language models. We first show evidence suggesting that over time there have been interventions to models altering the resulting gender distribution, and we find that women are more represented than men when models are prompted to generate biographies or personas. We then demonstrate that representational biases persist in how different genders are represented by examining statistically significant word differences across genders (a minimal sketch of this word-difference analysis appears after this list). This results in a proliferation of representational harms, stereotypes, and neoliberal ideals that, despite existing interventions to increase female representation, reinforce existing systems of oppression.
- Mixed-initiative visual analytics systems incorporate well-established design principles that improve users' abilities to solve problems. As these systems consider whether to take initiative towards achieving user goals, many current systems address the potential for cognitive bias in human initiatives statically, relying on fixed initiatives they can take instead of identifying, communicating, and addressing the bias as it occurs. We argue that mixed-initiative design principles can and should incorporate cognitive bias mitigation strategies directly, through mitigation techniques embedded in the system to address cognitive biases in situ. We identify domain experts in machine learning adopting visual analytics techniques and systems that incorporate existing mixed-initiative principles, and examine their potential to support bias mitigation strategies. This examination considers the unique perspective these experts bring to visual analytics and is situated in existing user-centered systems that make exemplary use of design principles informed by cognitive theory. We then suggest informed opportunities for domain experts to take initiative toward addressing cognitive biases in light of their existing contributions to the field. Finally, we contribute open questions and research directions for designers seeking to adopt visual analytics techniques that incorporate bias-aware initiatives in future systems.
- Facial attribute classification algorithms frequently manifest demographic biases by obtaining differential performance across gender and racial groups. Existing bias mitigation techniques are mostly in-processing techniques, i.e., implemented during the classifier's training stage, that often lack generalizability, require demographically annotated training sets, and exhibit a trade-off between fairness and classification accuracy. In this paper, we propose a technique to mitigate bias at test time, i.e., during the deployment stage, by harnessing prediction uncertainty and human–machine partnership. To this end, we propose to utilize the small percentage of test samples identified as outliers with the highest prediction uncertainty (a minimal sketch of this selection step appears after this list). These uncertain samples are labeled by human analysts for decision rendering and for subsequently retraining the deep neural network in a continual learning framework. With minimal human involvement and through iterative refinement of the network with human guidance at test time, we seek to enhance the accuracy as well as the fairness of already deployed facial attribute classification algorithms. Extensive experiments are conducted on gender and smile attribute classification tasks using four publicly available datasets, with gender and race as the protected attributes. The obtained outcomes consistently demonstrate improved accuracy by up to 2% and 5% for the gender and smile attribute classification tasks, respectively, using our proposed approaches. Further, demographic bias was significantly reduced, outperforming state-of-the-art (SOTA) bias mitigation and baseline techniques by up to 55% for both classification tasks.
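For the few-shot KG completion setting described in the first entry above, the following is a minimal sketch of how meta-tasks (episodes) over long-tail relations might be constructed. The sampling scheme, support/query sizes, and function names are illustrative assumptions, not REFORM's actual implementation.

```python
# Minimal sketch of few-shot episode construction for KG completion;
# the sampling scheme and sizes are illustrative assumptions.
import random
from collections import defaultdict

def build_relation_index(triples):
    """Group (head, relation, tail) triples by relation; long-tail relations
    end up with only a handful of (head, tail) pairs each."""
    by_rel = defaultdict(list)
    for h, r, t in triples:
        by_rel[r].append((h, t))
    return by_rel

def sample_episode(by_rel, k_support=3, n_query=2):
    """Sample one meta-task: a relation, a small support set to learn from,
    and a query set to evaluate generalization on."""
    eligible = [r for r, pairs in by_rel.items()
                if len(pairs) >= k_support + n_query]
    rel = random.choice(eligible)
    pairs = random.sample(by_rel[rel], k_support + n_query)
    return rel, pairs[:k_support], pairs[k_support:]

triples = [("a1", "spouse", "b1"), ("a2", "spouse", "b2"),
           ("a3", "spouse", "b3"), ("a4", "spouse", "b4"),
           ("a5", "spouse", "b5"), ("c1", "birthPlace", "d1")]
rel, support, query = sample_episode(build_relation_index(triples))
print(rel, support, query)  # "spouse" episode: 3 support pairs, 2 query pairs
```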
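The second entry above surfaces statistically significant word differences across genders in generated biographies. One standard way to rank such differences is a smoothed log-odds ratio over word counts, sketched below; the toy corpora, prior, and ranking are illustrative assumptions rather than that paper's exact statistic.

```python
# Minimal sketch of a smoothed log-odds word-difference analysis;
# the toy corpora and prior are illustrative assumptions.
import math
from collections import Counter

def log_odds(counts_a, counts_b, alpha=0.5):
    """Smoothed log-odds ratio per word: large positive scores mark words
    over-represented in corpus A, large negative in corpus B."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    scores = {}
    for w in vocab:
        pa = (counts_a[w] + alpha) / (total_a + alpha * len(vocab))
        pb = (counts_b[w] + alpha) / (total_b + alpha * len(vocab))
        scores[w] = math.log(pa / (1 - pa)) - math.log(pb / (1 - pb))
    return scores

bios_women = Counter("she worked as a nurse and a teacher".split())
bios_men = Counter("he worked as an engineer and a doctor".split())
ranked = sorted(log_odds(bios_women, bios_men).items(),
                key=lambda x: -abs(x[1]))
for w, s in ranked[:5]:
    print(f"{w:10s} {s:+.2f}")  # most gender-skewed words first
```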
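Finally, the uncertainty-based test-time selection described in the last entry can be sketched as follows. Predictive entropy as the uncertainty measure and a fixed 5% budget are illustrative assumptions, not that paper's exact criterion.

```python
# Minimal sketch of uncertainty-based test-time sample selection;
# entropy and the budget fraction are illustrative assumptions.
import numpy as np

def predictive_entropy(probs):
    """Uncertainty of each prediction from its class-probability vector."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_uncertain(probs, budget=0.05):
    """Return indices of the top `budget` fraction of most uncertain test
    samples, to be routed to human analysts and later used for retraining."""
    entropy = predictive_entropy(probs)
    k = max(1, int(budget * len(probs)))
    return np.argsort(-entropy)[:k]

probs = np.array([[0.98, 0.02], [0.55, 0.45], [0.90, 0.10], [0.51, 0.49]])
print(select_uncertain(probs, budget=0.5))  # indices of the 2 least confident
```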