

Title: JupyterLab in Retrograde: Contextual Notifications That Highlight Fairness and Bias Issues for Data Scientists
Current algorithmic fairness tools focus on auditing completed models, neglecting the potential downstream impacts of iterative decisions about cleaning data and training machine learning models. In response, we developed Retrograde, a JupyterLab environment extension for Python that generates real-time, contextual notifications for data scientists about decisions they are making regarding protected classes, proxy variables, missing data, and demographic differences in model performance. Our novel framework uses automated code analysis to trace data provenance in JupyterLab, enabling these notifications. In a between-subjects online experiment, 51 data scientists constructed loan-decision models with Retrograde providing notifications continuously throughout the process, only at the end, or never. Retrograde’s notifications successfully nudged participants to account for missing data, avoid using protected classes as predictors, minimize demographic differences in model performance, and exhibit healthy skepticism about their models.
Award ID(s):
2047827 1939728
PAR ID:
10493968
Publisher / Repository:
Proceedings of the CHI Conference on Human Factors in Computing Systems
Sponsoring Org:
National Science Foundation
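The notification mechanism the abstract describes can be sketched minimally as follows. This is a simplified, hypothetical heuristic using column-name matching against an assumed attribute list; Retrograde's actual detection relies on automated code analysis and provenance tracing, not column names.

```python
import pandas as pd

# Hypothetical list of protected-class column names (illustrative only).
PROTECTED_ATTRIBUTES = {"race", "sex", "gender", "age", "religion"}

def flag_protected_columns(df: pd.DataFrame) -> list[str]:
    """Return column names that look like protected classes."""
    return [c for c in df.columns if c.lower() in PROTECTED_ATTRIBUTES]

def flag_missing_data(df: pd.DataFrame) -> dict[str, float]:
    """Return the fraction of missing values for each column that has any."""
    frac = df.isna().mean()
    return {c: float(f) for c, f in frac.items() if f > 0}

df = pd.DataFrame({
    "income": [50_000, None, 72_000, 61_000],
    "Gender": ["F", "M", "F", "M"],
    "loan_amount": [10_000, 5_000, None, 8_000],
})
print(flag_protected_columns(df))  # ['Gender']
print(flag_missing_data(df))       # {'income': 0.25, 'loan_amount': 0.25}
```

A real implementation would surface these flags as notebook notifications at the moment the data-cleaning or modeling decision is made, rather than as a post-hoc audit.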
More Like this
  1. Abstract: Structured demographic models are among the most common and useful tools in population biology. However, the introduction of integral projection models (IPMs) has caused a profound shift in the way many demographic models are conceptualized. Some researchers have argued that IPMs, by explicitly representing demographic processes as continuous functions of state variables such as size, are more statistically efficient, biologically realistic, and accurate than classic matrix projection models, calling into question the usefulness of the many studies based on matrix models. Here, we evaluate how IPMs and matrix models differ, as well as the extent to which these differences matter for estimation of key model outputs, including population growth rates, sensitivity patterns, and life spans. First, we detail the steps in constructing and using each type of model. Second, we present a review of published demographic models, concentrating on size‐based studies, which shows significant overlap in the way IPMs and matrix models are constructed and analyzed. Third, to assess the impact of various modeling decisions on demographic predictions, we ran a series of simulations based on size‐based demographic data sets for five biologically diverse species. We found little evidence that discrete vital rate estimation is less accurate than continuous functions across a wide range of sample sizes or size classes (equivalently bin numbers or mesh points). Most model outputs quickly converged with modest class numbers (≥10), regardless of most other modeling decisions. Another surprising result was that the most commonly used method to discretize growth rates for IPM analyses can introduce substantial error into model outputs. Finally, we show that empirical sample sizes generally matter more than modeling approach for the accuracy of demographic outputs.
Based on these results, we provide specific recommendations to those constructing and evaluating structured population models. Both our literature review and simulations question the treatment of IPMs as a clearly distinct modeling approach or one that is inherently more accurate than classic matrix models. Importantly, this suggests that matrix models, representing the vast majority of past demographic analyses available for comparative and conservation work, continue to be useful and important sources of demographic information. 
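The discretization step that blurs the line between IPMs and matrix models can be sketched as follows: a midpoint-rule evaluation of a continuous kernel produces an ordinary projection matrix. The kernel below is a toy survival–growth example with made-up parameters (no fecundity term), not one of the reviewed models.

```python
import numpy as np

def discretize_ipm_kernel(kernel, lower, upper, n_bins):
    """Midpoint-rule discretization of an IPM kernel K(z', z) into an
    n_bins x n_bins projection matrix -- the standard construction that
    turns a continuous IPM into a (large) matrix model."""
    h = (upper - lower) / n_bins
    mid = lower + h * (np.arange(n_bins) + 0.5)   # bin midpoints
    zp, z = np.meshgrid(mid, mid, indexing="ij")  # evaluate K at midpoints
    return h * kernel(zp, z)

# Toy kernel (hypothetical): logistic survival times Gaussian growth.
def toy_kernel(z_new, z):
    survival = 0.8 / (1.0 + np.exp(-(z - 2.0)))
    growth = (np.exp(-0.5 * ((z_new - (0.6 * z + 1.0)) / 0.5) ** 2)
              / (0.5 * np.sqrt(2 * np.pi)))
    return survival * growth

K = discretize_ipm_kernel(toy_kernel, lower=0.0, upper=6.0, n_bins=50)
# Dominant eigenvalue approximates the asymptotic population growth rate.
lam = np.max(np.abs(np.linalg.eigvals(K)))
print(K.shape)  # (50, 50)
```

The review's point about convergence can be probed directly with this sketch: re-running it with `n_bins` of 10, 50, and 200 shows how quickly the dominant eigenvalue stabilizes as class number grows.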
  2. Introduction: Machine learning (ML) algorithms have been heralded as promising solutions to the realization of assistive systems in digital healthcare, due to their ability to detect fine-grain patterns that are not easily perceived by humans. Yet, ML algorithms have also been critiqued for treating individuals differently based on their demography, thus propagating existing disparities. This paper explores gender and race bias in speech-based ML algorithms that detect behavioral and mental health outcomes. Methods: This paper examines potential sources of bias in the data used to train the ML, encompassing acoustic features extracted from speech signals and associated labels, as well as in the ML decisions. The paper further examines approaches to reduce existing bias by using the features that are the least informative of one’s demographic information as the ML input, and by transforming the feature space in an adversarial manner to diminish the evidence of the demographic information while retaining information about the focal behavioral and mental health state. Results: Results are presented in two domains, the first pertaining to gender and race bias when estimating levels of anxiety, and the second pertaining to gender bias in detecting depression. Findings indicate the presence of statistically significant differences in both acoustic features and labels among demographic groups, as well as differential ML performance among groups. The statistically significant differences present in the label space are partially preserved in the ML decisions. Although variations in ML performance across demographic groups were noted, results are mixed regarding the models’ ability to accurately estimate healthcare outcomes for the sensitive groups.
Discussion: These findings underscore the necessity for careful and thoughtful design in developing ML models that are capable of maintaining crucial aspects of the data and performing effectively across all populations in digital healthcare applications.
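The disaggregated evaluation behind findings like these can be sketched minimally: compute a metric within each demographic group and compare. This is a generic illustration with made-up labels, not the paper's anxiety or depression data.

```python
from collections import defaultdict

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy within each demographic group -- the kind of disaggregated
    evaluation used to surface differential ML performance."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

# Toy labels and group memberships (illustrative only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]

acc = per_group_accuracy(y_true, y_pred, groups)
gap = max(acc.values()) - min(acc.values())
print(acc, round(gap, 2))  # {'f': 0.75, 'm': 0.5} 0.25
```

An overall accuracy of 0.625 here would hide the 25-point gap between groups, which is exactly why per-group reporting matters.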
  3. The database community has largely focused on providing improved transaction management and query capabilities over records (and generalizations thereof). Yet such capabilities address only a small part of today’s data science tasks, which are often much more focused on discovery, linking, comparative analysis, and collaboration across holistic datasets and data products. Data scientists frequently point to a strong need for data management — with respect to their many datasets and data products. We propose the development of the dataset relationship management system to support five main classes of operations on datasets: reuse of schema, data, curation, and work across many datasets; revelation of provenance, context, and assumptions; rapid revision of data and processing steps; system-assisted retargeting of computation to alternative execution environments; and metrics to reward individuals’ contributions to the broader data ecosystem. We argue that the recent adoption of computational notebooks (particularly JupyterLab and Jupyter Notebook), as a unified interface over data tools, provides an ideal way of gathering detailed information about how data is being used, i.e., of transparently capturing dataset provenance and relationships, and thus such notebooks provide an attractive mechanism for integrating dataset relationship management into the data science ecosystem. We briefly outline our experiences in building towards JuNEAU, the first prototype DRMS. 
  4. A common challenge in developmental research is the amount of incomplete and missing data that occurs from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although multiple imputation is highly effective, it has not been widely adopted by developmental scientists given barriers such as lack of training or misconceptions about imputation methods. Utilizing default methods within statistical software programs like listwise deletion is common but may introduce additional bias. This manuscript is intended to provide practical guidelines for developmental researchers to follow when examining their data for missingness, making decisions about how to handle that missingness and reporting the extent of missing data biases and specific multiple imputation procedures in publications. 
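The pooling step of multiple imputation can be sketched with Rubin's rules: estimate the parameter in each of the m imputed datasets, then combine the point estimates and the within- and between-imputation variances. The coefficient values below are made up for illustration.

```python
import statistics

def pool_estimates(estimates, variances):
    """Pool a parameter estimate across m multiply-imputed datasets using
    Rubin's rules: pooled mean, plus within- and between-imputation
    variance combined into a total variance."""
    m = len(estimates)
    q_bar = sum(estimates) / m                # pooled point estimate
    within = sum(variances) / m               # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return q_bar, total

# Hypothetical regression coefficient and its variance from m = 5 imputations.
ests = [0.52, 0.48, 0.55, 0.50, 0.45]
vars_ = [0.010, 0.012, 0.011, 0.009, 0.013]
q, t = pool_estimates(ests, vars_)
print(round(q, 3), round(t, 4))  # 0.5 0.0127
```

The between-imputation term is what listwise deletion and single imputation silently discard: it propagates the uncertainty caused by the missingness itself into the pooled standard error.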
  5. A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model's constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a "business necessity". Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
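The two criteria in this definition of proxy use can be illustrated with a simplified per-feature heuristic: score each input's contribution for association with the protected attribute and for influence on the model output. The feature names and data below are synthetic, and this per-feature check is far weaker than the paper's method, which detects proxies over arbitrary linear components via a second-order cone program.

```python
import numpy as np

def proxy_scores(X, weights, protected):
    """Score each input of a linear model for proxy use: (1) how strongly its
    contribution w_j * x_j correlates with a protected attribute (association)
    and (2) how much that contribution varies (a rough stand-in for influence).
    Simplified heuristic, not the paper's second-order cone program."""
    scores = []
    for j in range(X.shape[1]):
        component = weights[j] * X[:, j]
        assoc = abs(np.corrcoef(component, protected)[0, 1])
        influence = float(np.std(component))
        scores.append((assoc, influence))
    return scores

# Synthetic data (hypothetical feature names): zip_code is a strong proxy
# for the protected attribute; income is unrelated to it.
rng = np.random.default_rng(0)
protected = rng.integers(0, 2, size=200).astype(float)
zip_code = protected + 0.1 * rng.normal(size=200)
income = rng.normal(size=200)
X = np.column_stack([zip_code, income])

scores = proxy_scores(X, weights=np.array([2.0, 0.5]), protected=protected)
# zip_code's component is both highly correlated with the protected
# attribute and influential in the output; income's component is neither.
```

A feature scoring high on association alone (e.g., one with a near-zero weight) would not count as a proxy under the paper's definition, which is why both criteria are required.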