This content will become publicly available on July 11, 2023

Title: Measuring Fairness in Ranked Results: An Analytical and Empirical Comparison
Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user’s information need. Evaluating these lists for their fairness along with other traditional metrics provides a more complete understanding of an information access system’s behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analysis of these metrics showing their applicability to specific scenarios or real data, conceptual similarities, and differences is still lacking. We aim to bridge the gap between theoretical and practical application of these metrics. In this paper we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their approaches and assumptions, and empirically compare them on the same experimental setup and data sets in the context of three information access tasks. We also provide a sensitivity analysis to assess the impact of the design choices and parameter settings that go into these metrics and point to additional work needed to improve fairness measurement.
Journal Name:
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Sponsoring Org:
National Science Foundation
More Like this
  1. Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.
  2. Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset through analyzing known spatial expression patterns of a small subset of genes known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, an improved balance between precision and robustness is achieved. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as an open-source software with a uniform interface.
  3. Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based definitions of fairness can correctly model some of these situations, but they require specification of the underlying causal models. In this paper, we formalize the situation as a database repair problem, proving sufficient conditions for fair classifiers in terms of admissible variables as opposed to a complete causal model. We show that these conditions correctly capture subtle fairness violations. We then use these conditions as the basis for database repair algorithms that provide provable fairness guarantees about classifiers trained on their training labels. We evaluate our algorithms on real data, demonstrating improvement over the state of the art on multiple fairness metrics proposed in the literature while retaining high utility.
  4. This paper provides detailed information for a poster that will be presented in the National Science Foundation (NSF) Grantees Poster Session during the 2020 ASEE Annual Conference & Exposition. The poster describes the progress and the state of an NSF Scholarships in Science, Technology, Engineering, and Math (S-STEM) project. The objectives of this project are to 1) enhance student learning by providing access to extra- and co-curricular experiences, 2) create a positive student experience through mentorship, and 3) ensure successful student placement in the STEM workforce or graduate school. S-STEM Scholars supported by this program receive financial, academic, professional, and social development via various evidence-based activities integrated throughout their four-year undergraduate degrees beginning during the summer prior to starting at the University. The paper describes the characteristics (demographics, high school GPA, ACT/SAT scores, etc.) of the Scholars supported by the S-STEM grant. The paper also provides information about the completed tasks of the project to date. The completed tasks include a system for recruiting academically talented and economically disadvantaged students, a Summer Bridge Program (SBP), a first semester introductory engineering course, and a system to recruit and maintain faculty mentors. The ongoing tasks include the execution of a service learning project course and a system for recruiting industry mentors. This paper reports detailed assessment and evaluation data about different project tasks and the academic success metrics of the Scholars. It also lists a set of recommendations based on the lessons learned in this S-STEM project.
  5. Dynamic Information Flow Tracking (DIFT), also called Dynamic Taint Analysis (DTA), is a technique for tracking the information as it flows through a program's execution. Specifically, some inputs or data get tainted and then these taint marks (tags) propagate usually at the instruction-level. While DIFT has been a fundamental concept in computer and network security for the past decade, it still faces open challenges that impede its widespread application in practice; one of them being the indirect flow propagation dilemma: should the tags involved in an indirect flow, e.g., in a control or address dependency, be propagated? Propagating all these tags, as is done for direct flows, leads to overtainting (all taintable objects become tainted), while not propagating them leads to undertainting (information flow becomes incomplete). In this paper, we analytically model that decisioning problem for indirect flows, by considering various tradeoffs including undertainting versus overtainting, importance of heterogeneous code semantics and context. Towards tackling this problem, we design MITOS, a distributed-optimization algorithm, that: decides about the propagation of indirect flows by properly weighting all these tradeoffs, is of low-complexity, is scalable, is able to flexibly adapt to different application scenarios and security needs of large distributed systems. Additionally, MITOS is applicable to most DIFT systems that consider an arbitrary number of tag types, and introduces the key properties of fairness and tag-balancing to the DIFT field. To demonstrate MITOS's applicability in practice, we implement and evaluate MITOS on top of an open-source DIFT, and we shed light on the open problem. We also perform a case-study scenario with a real in-memory only attack and show that MITOS improves simultaneously (i) system's spatio-temporal overhead (up to 40%), and (ii) system's fingerprint on suspected bytes (up to 167%) compared to traditional DIFT, even though these metrics usually conflict.