Recent work in fairness in machine learning has proposed adjusting for fairness by equalizing accuracy metrics across groups and has also studied how datasets affected by historical prejudices may lead to unfair decision policies. We connect these lines of work and study the residual unfairness that arises when a fairness-adjusted predictor is not actually fair on the target population due to systematic censoring of training data by existing biased policies. This scenario is particularly common in the same applications where fairness is a concern. We characterize theoretically the impact of such censoring on standard fairness metrics for binary classifiers and provide criteria for when residual unfairness may or may not appear. We prove that, under certain conditions, fairness-adjusted classifiers will in fact induce residual unfairness that perpetuates the same injustices, against the same groups, that biased the data to begin with, thus showing that even state-of-the-art fair machine learning can have a "bias in, bias out" property. When certain benchmark data is available, we show how sample reweighting can estimate and adjust fairness metrics while accounting for censoring. We use this to study the case of Stop, Question, and Frisk (SQF) and demonstrate that attempting to adjust for fairness perpetuates the same injustices that the policy is infamous for.
This content will become publicly available on June 3, 2025
Recommend Me? Designing Fairness Metrics with Providers
Fairness metrics have become a useful tool to measure how fair or unfair a machine learning system may be for its stakeholders. In the context of recommender systems, previous research has explored how various stakeholders experience algorithmic fairness or unfairness, but it is also important to capture these experiences in the design of fairness metrics. Therefore, we conducted four focus groups with providers (those whose items, content, or profiles are being recommended) of two different domains: content creators and dating app users. We explored how our participants experience unfairness on their associated platforms, and worked with them to co-design fairness goals, definitions, and metrics that might capture these experiences. This work represents an important step towards designing fairness metrics with the stakeholders who will be impacted by their operationalizations. We analyze the efficacy and challenges of enacting these metrics in practice and explore how future work might benefit from this methodology.
- Award ID(s):
- 2107577
- PAR ID:
- 10545004
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400704505
- Page Range / eLocation ID:
- 2389 to 2399
- Format(s):
- Medium: X
- Location:
- Rio de Janeiro, Brazil
- Sponsoring Org:
- National Science Foundation
More Like this
-
Recommender systems have a variety of stakeholders. Applying concepts of fairness in such systems requires attention to stakeholders’ complex and often-conflicting needs. Since fairness is socially constructed, there are numerous definitions, both in the social science and machine learning literatures. Still, it is rare for machine learning researchers to develop their metrics in close consideration of their social context. More often, standard definitions are adopted and assumed to be applicable across contexts and stakeholders. Our research starts with a recommendation context and then seeks to understand the breadth of the fairness considerations of associated stakeholders. In this paper, we report on the results of a semi-structured interview study with 23 employees who work for the Kiva microlending platform. We characterize the many different ways in which they enact and strive toward fairness for microlending recommendations in their own work, uncover the ways in which these different enactments of fairness are in tension with each other, and identify how stakeholders are differentially prioritized. Finally, we reflect on the implications of this study for future research and for the design of multistakeholder recommender systems.
-
Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.
-
For applications where multiple stakeholders provide recommendations, a fair consensus ranking must not only ensure that the preferences of rankers are well represented, but must also mitigate disadvantages among socio-demographic groups in the final result. However, there is little empirical guidance on the value or challenges of visualizing and integrating fairness metrics and algorithms into human-in-the-loop systems to aid decision-makers. In this work, we design a study to analyze the effectiveness of integrating such fairness metrics-based visualization and algorithms. We explore this through a task-based crowdsourced experiment comparing an interactive visualization system for constructing consensus rankings, ConsensusFuse, with a similar system that includes visual encodings of fairness metrics and fair-rank generation algorithms, FairFuse. We analyze the measure of fairness, agreement of rankers’ decisions, and user interactions in constructing the fair consensus ranking across these two systems. In our study with 200 participants, results suggest that providing these fairness-oriented support features nudges users to align their decision with the fairness metrics while minimizing the tedious process of manually having to amend the consensus ranking. We discuss the implications of these results for the design of next-generation fairness-oriented systems, along with emerging directions for future research.
-
To increase participation of students of color in science graduate programs, research has focused on illuminating student experiences to inform ways to improve them. In biology, Black students are vastly underrepresented, and while religion has been shown to be a particularly important form of cultural wealth for Black students, Christianity is stigmatized in biology. Very few studies have explored the intersection of race/ethnicity and Christianity for Black students in biology where there is high documented tension between religion and science. Since graduate school is important for socialization and Black students are likely to experience stigmatization of their racial and religious identity, it is important to understand their experiences and how we might be able to improve them. Thus, we interviewed 13 Black Christian students enrolled in biology graduate programs and explored their experiences using the theoretical lens of stigmatized identities. Through thematic content analysis, we revealed that students negotiated experiences of cultural isolation, devaluation of intelligence, and acts of bias like other racially minoritized students in science. However, by examining these experiences at the intersection of race/ethnicity and religion, we shed light on interactions students have had with faculty and peers within the biology community that cultivated perceptions of mistrust, conflict, and stigma. Our study also revealed ways in which students' religious/spiritual capital has positively supported their navigation through biology graduate school. These results contribute to a deeper understanding of why Black Christian graduate students are more likely to leave or not pursue advanced degrees in biology with implications for research and practice that help facilitate their success.