Title: Understanding User Sensemaking in Machine Learning Fairness Assessment Systems
A variety of systems have been proposed to assist users in detecting machine learning (ML) fairness issues. These systems approach bias reduction from a number of perspectives, including recommender systems, exploratory tools, and dashboards. In this paper, we seek to inform the design of these systems by examining how individuals make sense of fairness issues as they use different de-biasing affordances. In particular, we consider the tension between de-biasing recommendations, which are quick but may lack nuance, and “what-if” style exploration, which is time-consuming but may lead to deeper understanding and transferable insights. Using logs, think-aloud data, and semi-structured interviews, we find that exploratory systems promote a rich pattern of hypothesis generation and testing, while recommendations deliver quick answers which satisfy participants at the cost of reduced information exposure. We highlight design requirements and trade-offs in the design of ML fairness systems to promote accurate and explainable assessments.
Award ID(s):
1850195
NSF-PAR ID:
10336361
Journal Name:
WWW '21: Proceedings of the Web Conference 2021
Page Range / eLocation ID:
658 to 668
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process, and collect recommendations to address these challenges. 
  2. Data sets and statistics about groups of individuals are increasingly collected and released, feeding many optimization and learning algorithms. In many cases, the released data contain sensitive information whose privacy is strictly regulated. For example, in the U.S., the census data is regulated under Title 13, which requires that no individual be identified from any data released by the Census Bureau. In Europe, data release is regulated according to the General Data Protection Regulation, which addresses the control and transfer of personal data. Differential privacy has emerged as the de facto standard to protect data privacy. In a nutshell, differentially private algorithms protect an individual’s data by injecting random noise into the output of a computation that involves such data. While this process ensures privacy, it also impacts the quality of data analysis, and, when private data sets are used as inputs to complex machine learning or optimization tasks, they may produce results that are fundamentally different from those obtained on the original data and even raise unintended bias and fairness concerns. In this talk, I will first focus on the challenge of releasing privacy-preserving data sets for complex data analysis tasks. I will introduce the notion of Constrained-based Differential Privacy (C-DP), which allows casting the data release problem as an optimization problem whose goal is to preserve the salient features of the original data. I will review several applications of C-DP in the context of very large hierarchical census data, data streams, energy systems, and in the design of federated data-sharing protocols. Next, I will discuss how errors induced by differential privacy algorithms may propagate within a decision problem, causing biases and fairness issues. This is particularly important as privacy-preserving data is often used for critical decision processes, including the allocation of funds and benefits to states and jurisdictions, which ideally should be fair and unbiased. Finally, I will conclude with a roadmap to future work and some open questions.
  3. Currently, there is a surge of interest in fair Artificial Intelligence (AI) and Machine Learning (ML) research which aims to mitigate discriminatory bias in AI algorithms, e.g., along lines of gender, age, and race. While most research in this domain focuses on developing fair AI algorithms, in this work, we examine the challenges which arise when humans and fair AI interact. Our results show that due to an apparent conflict between human preferences and fairness, a fair AI algorithm on its own may be insufficient to achieve its intended results in the real world. Using college major recommendation as a case study, we build a fair AI recommender by employing gender debiasing machine learning techniques. Our offline evaluation showed that the debiased recommender makes fairer career recommendations without sacrificing its accuracy in prediction. Nevertheless, an online user study of more than 200 college students revealed that participants on average prefer the original biased system over the debiased system. Specifically, we found that perceived gender disparity is a determining factor for the acceptance of a recommendation. In other words, we cannot fully address the gender bias issue in AI recommendations without addressing the gender bias in humans. We conducted a follow-up survey to gain additional insights into the effectiveness of various design options that can help participants to overcome their own biases. Our results suggest that making fair AI explainable is crucial for increasing its adoption in the real world. 
  4. AI plays an increasingly prominent role in society since decisions that were once made by humans are now delegated to automated systems. These systems are currently in charge of deciding bank loans, criminals’ incarceration, and the hiring of new employees, and it’s not difficult to envision that they will in the future underpin most of the decisions in society. Despite the high complexity entailed by this task, there is still not much understanding of basic properties of such systems. For instance, we currently cannot detect (or explain or correct) whether an AI system is operating fairly (i.e., is abiding by the decision-constraints agreed by society) or is reinforcing biases and perpetuating a preceding prejudicial practice. Issues of discrimination have been discussed extensively in legal circles, but there is still little understanding of the formal conditions that an automated system must adhere to in order to be deemed fair. In this paper, we use the language of structural causality (Pearl, 2000) to fill this gap. We start by introducing three new fine-grained measures of transmission of change from stimulus to effect called counterfactual direct (Ctf-DE), indirect (Ctf-IE), and spurious (Ctf-SE) effects. Building on these measures, we derive the causal explanation formula, which allows the AI designer to quantitatively evaluate fairness and explain the total observed disparity of decisions through different discriminatory mechanisms. We apply these results to various discrimination analysis tasks and run extensive simulations, including detection, evaluation, and optimization of decision-making under fairness constraints. We conclude by studying the trade-off between different types of fairness criteria (outcome and procedural), and provide a quantitative approach to policy implementation and the design of fair decision-making systems.
  5. Personal cloud storage systems increasingly offer recommendations to help users retrieve or manage files of interest. For example, Google Drive's Quick Access predicts and surfaces files likely to be accessed. However, when multiple, related recommendations are made, interfaces typically present recommended files and any accompanying explanations individually, burdening users. To improve the usability of ML-driven personal information management systems, we propose a new method for summarizing related file-management recommendations. We generate succinct summaries of groups of related files being recommended. Summaries reference the files' shared characteristics. Through a within-subjects online study in which participants received recommendations for groups of files in their own Google Drive, we compare our summaries to baselines like visualizing a decision tree model or simply listing the files in a group. Compared to the baselines, participants expressed greater understanding and confidence in accepting recommendations when shown our novel recommendation summaries. 