skip to main content


This content will become publicly available on March 1, 2025

Title: Survey on Machine Learning Biases and Mitigation Techniques

Machine learning (ML) has become increasingly prevalent in various domains. However, ML algorithms sometimes give unfair outcomes and discrimination against certain groups. Thereby, bias occurs when our results produce a decision that is systematically incorrect. At various phases of the ML pipeline, such as data collection, pre-processing, model selection, and evaluation, these biases appear. Bias reduction methods for ML have been suggested using a variety of techniques. By changing the data or the model itself, adding more fairness constraints, or both, these methods try to lessen bias. The best technique relies on the particular context and application because each technique has advantages and disadvantages. Therefore, in this paper, we present a comprehensive survey of bias mitigation techniques in machine learning (ML) with a focus on in-depth exploration of methods, including adversarial training. We examine the diverse types of bias that can afflict ML systems, elucidate current research trends, and address future challenges. Our discussion encompasses a detailed analysis of pre-processing, in-processing, and post-processing methods, including their respective pros and cons. Moreover, we go beyond qualitative assessments by quantifying the strategies for bias reduction and providing empirical evidence and performance metrics. This paper serves as an invaluable resource for researchers, practitioners, and policymakers seeking to navigate the intricate landscape of bias in ML, offering both a profound understanding of the issue and actionable insights for responsible and effective bias mitigation.

 
more » « less
Award ID(s):
2306109
PAR ID:
10501414
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Digital
Volume:
4
Issue:
1
ISSN:
2673-6470
Page Range / eLocation ID:
1 to 68
Subject(s) / Keyword(s):
machine learning bias mitigation techniques fairness constraints pre-processing in-processing post-processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents an algorithm for restoring AC power flow feasibility from solutions to simplified optimal power flow (OPF) problems, including convex relaxations, power flow approximations, and machine learning (ML) models. The proposed algorithm employs a state estimation-based post-processing technique in which voltage phasors, power injections, and line flows from solutions to relaxed, approximated, or ML-based OPF problems are treated similarly to noisy measurements in a state estimation algorithm. The algorithm leverages information from various quantities to obtain feasible voltage phasors and power injections that satisfy the AC power flow equations. Weight and bias parameters are computed offline using an adaptive stochastic gradient descent method. By automatically learning the trustworthiness of various outputs from simplified OPF problems, these parameters inform the online computations of the state estimation-based algorithm to both recover feasible solutions and characterize the performance of power flow approximations, relaxations, and ML models. Furthermore, the proposed algorithm can simultaneously utilize combined solutions from different relaxations, approximations, and ML models to enhance performance. Case studies demonstrate the effectiveness and scalability of the proposed algorithm, with solutions that are both AC power flow feasible and much closer to the true AC OPF solutions than alternative methods, often by several orders of magnitude in the squared two-norm loss function. 
    more » « less
  2. Machine learning (ML) methods already permeate environmental decision-making, from processing high-dimensional data on earth systems to monitoring compliance with environmental regulations. Of the ML techniques available to address pressing environmental problems (e.g., climate change, biodiversity loss), Reinforcement Learning (RL) may both hold the greatest promise and present the most pressing perils. This paper explores how RL-driven policy refracts existing power relations in the environmental domain while also creating unique challenges to ensuring equitable and accountable environmental decision processes. We leverage examples from RL applications to climate change mitigation and fisheries management to explore how RL technologies shift the distribution of power between resource users, governing bodies, and private industry. 
    more » « less
  3. Discrimination-aware classification methods remedy socioeconomic disparities exacerbated by machine learning systems. In this paper, we propose a novel data pre-processing technique that assigns weights to training instances in order to reduce discrimination without changing any of the inputs or labels. While the existing reweighing approach only looks into sensitive attributes, we refine the weights by utilizing both sensitive and insensitive ones. We formulate our weight assignment as a linear programming problem. The weights can be directly used in any classification model into which they are incorporated. We demonstrate three advantages of our approach on synthetic and benchmark datasets. First, discrimination reduction comes at a small cost in accuracy. Second, our method is more scalable than most other pre-processing methods. Third, the trade-off between fairness and accuracy can be explicitly monitored by model users. Code is available athttps://github.com/frnliang/refined_reweighing.

     
    more » « less
  4. This paper synthesizes multiple methods for machine learning (ML) model interpretation and visualization (MIV) focusing on meteorological applications. ML has recently exploded in popularity in many fields, including meteorology. Although ML has been successful in meteorology, it has not been as widely accepted, primarily due to the perception that ML models are “black boxes,” meaning the ML methods are thought to take inputs and provide outputs but not to yield physically interpretable information to the user. This paper introduces and demonstrates multiple MIV techniques for both traditional ML and deep learning, to enable meteorologists to understand what ML models have learned. We discuss permutation-based predictor importance, forward and backward selection, saliency maps, class-activation maps, backward optimization, and novelty detection. We apply these methods at multiple spatiotemporal scales to tornado, hail, winter precipitation type, and convective-storm mode. By analyzing such a wide variety of applications, we intend for this work to demystify the black box of ML, offer insight in applying MIV techniques, and serve as a MIV toolbox for meteorologists and other physical scientists. 
    more » « less
  5. Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment, we provide a conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB manifests empirically when a trained ML model produces different predicted score levels for different subgroups (e.g., race, gender) despite them having the same ground-truth levels for the underlying construct of interest (e.g., personality) and/or when the model yields differential predictive accuracies across the subgroups. Because the development of ML models involves both data and algorithms, both biased data and algorithm-training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm-training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, including recognizing that new statistical and algorithmic procedures need to be developed. We also discuss how this framework clarifies MLMB but does not reduce the complexity of the issue. 
    more » « less