skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1939728

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Current algorithmic fairness tools focus on auditing completed models, neglecting the potential downstream impacts of iterative decisions about cleaning data and training machine learning models. In response, we developed Retrograde, a JupyterLab environment extension for Python that generates real-time, contextual notifications for data scientists about decisions they are making regarding protected classes, proxy variables, missing data, and demographic differences in model performance. Our novel framework uses automated code analysis to trace data provenance in JupyterLab, enabling these notifications. In a between-subjects online experiment, 51 data scientists constructed loan-decision models with Retrograde providing notifications continuously throughout the process, only at the end, or never. Retrograde’s notifications successfully nudged participants to account for missing data, avoid using protected classes as predictors, minimize demographic differences in model performance, and exhibit healthy skepticism about their models. 
    more » « less
  2. With the increasing prevalence of automatic decision-making systems, concerns regarding the fairness of these systems also arise. Without a universally agreed-upon definition of fairness, given an automated decision-making scenario, researchers often adopt a crowdsourced approach to solicit people’s preferences across multiple fairness definitions. However, it is often found that crowdsourced fairness preferences are highly context-dependent, making it intriguing to explore the driving factors behind these preferences. One plausible hypothesis is that people’s fairness preferences reflect their perceived risk levels for different decision-making mistakes, such that the fairness definition that equalizes across groups the type of mistakes that are perceived as most serious will be preferred. To test this conjecture, we conduct a human-subject study (𝑁 =213) to study people’s fairness perceptions in three societal contexts. In particular, these three societal contexts differ on the expected level of risk associated with different types of decision mistakes, and we elicit both people’s fairness preferences and risk perceptions for each context. Our results show that people can often distinguish between different levels of decision risks across different societal contexts. However, we find that people’s fairness preferences do not vary significantly across the three selected societal contexts, except for within a certain subgroup of people (e.g., people with a certain racial background). As such, we observe minimal evidence suggesting that people’s risk perceptions of decision mistakes correlate with their fairness preference. These results highlight that fairness preferences are highly subjective and nuanced, and they might be primarily affected by factors other than the perceived risks of decision mistakes. 
    more » « less
  3. Biased AI models result in unfair decisions. In response, a number of algorithmic solutions have been engineered to mitigate bias, among which the Synthetic Minority Oversampling Technique (SMOTE) has been studied, to an extent. Although the SMOTE technique and its variants have great potentials to help improve fairness, there is little theoretical justification for its success. In addition, formal error and fairness bounds are not clearly given. This paper attempts to address both issues. We prove and demonstrate that synthetic data generated by oversampling underrepresented groups can mitigate algorithmic bias in AI models, while keeping the predictive errors bounded. We further compare this technique to the existing state-of-the-art fair AI techniques on five datasets using a variety of fairness metrics. We show that this approach can effectively improve fairness even when there is a significant amount of label and selection bias, regardless of the baseline AI algorithm. 
    more » « less
  4. Conventional wisdom holds that discrimination in machine learning is a result of historical discrimination: biased training data leads to biased models. We show that the reality is more nuanced; machine learning can be expected to induce types of bias not found in the training data. In particular, if different groups have different optimal models, and the optimal model for one group has higher accuracy, the optimal accuracy joint model will induce disparate impact even when the training data does not display disparate impact. We argue that due to systemic bias, this is a likely situation, and simply ensuring training data appears unbiased is insufficient to ensure fair machine learning. 
    more » « less
  5. This survey article assesses and compares existing critiques of current fairness-enhancing technical interventions in machine learning (ML) that draw from a range of non-computing disciplines, including philosophy, feminist studies, critical race and ethnic studies, legal studies, anthropology, and science and technology studies. It bridges epistemic divides in order to offer an interdisciplinary understanding of the possibilities and limits of hegemonic computational approaches to ML fairness for producing just outcomes for society’s most marginalized. The article is organized according to nine major themes of critique wherein these different fields intersect: 1) how "fairness" in AI fairness research gets defined; 2) how problems for AI systems to address get formulated; 3) the impacts of abstraction on how AI tools function and its propensity to lead to technological solutionism; 4) how racial classification operates within AI fairness research; 5) the use of AI fairness measures to avoid regulation and engage in ethics washing; 6) an absence of participatory design and democratic deliberation in AI fairness considerations; 7) data collection practices that entrench “bias,” are non-consensual, and lack transparency; 8) the predatory inclusion of marginalized groups into AI systems; and 9) a lack of engagement with AI’s long-term social and ethical outcomes. Drawing from these critiques, the article concludes by imagining future ML fairness research directions that actively disrupt entrenched power dynamics and structural injustices in society. 
    more » « less
  6. null (Ed.)
    Data brokers and advertisers increasingly collect data in one context and use it in another. When users encounter a misuse of their data, do they subsequently disclose less information? We report on human-subjects experiments with 25 in-person and 280 online participants. First, participants provided personal information amidst distractor questions. A week later, while participants completed another survey, they received either a robotext or online banner ad seemingly unrelated to the study. Half of the participants received an ad containing their name, partner's name, preferred cuisine, and location; others received a generic ad. We measured how many of 43 potentially invasive questions participants subsequently chose to answer. Participants reacted negatively to the personalized ad, yet answered nearly all invasive questions accurately. We unpack our results relative to the privacy paradox, contextual integrity, and power dynamics in crowdworker platforms. 
    more » « less
  7. null (Ed.)
    There are many competing definitions of what statistical properties make a machine learning model fair. Unfortunately, research has shown that some key properties are mutually exclusive. Realistic models are thus necessarily imperfect, choosing one side of a trade-off or the other. To gauge perceptions of the fairness of such realistic, imperfect models, we conducted a between-subjects experiment with 502 Mechanical Turk workers. Each participant compared two models for deciding whether to grant bail to criminal defendants. The first model equalized one potentially desirable model property, with the other property varying across racial groups. The second model did the opposite. We tested pairwise trade-offs between the following four properties: accuracy; false positive rate; outcomes; and the consideration of race. We also varied which racial group the model disadvantaged. We observed a preference among participants for equalizing the false positive rate between groups over equalizing accuracy. Nonetheless, no preferences were overwhelming, and both sides of each trade-off we tested were strongly preferred by a non-trivial fraction of participants. We observed nuanced distinctions between participants considering a model "unbiased" and considering it "fair." Furthermore, even when a model within a trade-off pair was seen as fair and unbiased by a majority of participants, we did not observe consensus that a machine learning model was preferable to a human judge. Our results highlight challenges for building machine learning models that are perceived as fair and broadly acceptable in realistic situations. 
    more » « less