Title: AI fairness in practice: Paradigm, challenges, and prospects
Abstract: Understanding and correcting algorithmic bias in artificial intelligence (AI) has become increasingly important, leading to a surge in research on AI fairness within both the AI community and broader society. Traditionally, this research operates within the constrained supervised learning paradigm, assuming the presence of class labels, independent and identically distributed (IID) data, and batch‐based learning necessitating the simultaneous availability of all training data. However, in practice, class labels may be absent due to censoring, data is often represented using non‐IID graph structures that capture connections among individual units, and data can arrive and evolve over time. These prevalent real‐world data representations limit the applicability of existing fairness literature, which typically addresses fairness in static and tabular supervised learning settings. This paper reviews recent advances in AI fairness aimed at bridging these gaps for practical deployment in real‐world scenarios. Additionally, opportunities are envisioned by highlighting the limitations and significant potential for real applications.
Award ID(s):
2404039
PAR ID:
10543672
Author(s) / Creator(s):
 
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
AI Magazine
Volume:
45
Issue:
3
ISSN:
0738-4602
Format(s):
Medium: X Size: p. 386-395
Sponsoring Org:
National Science Foundation
More Like this
  1. The widespread use of Artificial Intelligence (AI) based decision-making systems has raised many concerns regarding potential discrimination, particularly in domains with high societal impact. Most existing research on mitigating bias relies heavily on the presence of class labels, an assumption that often fails in real-world scenarios, where censored data are ubiquitous. Further, existing works regard group fairness and individual fairness as two disparate goals, overlooking their inherent interconnection, i.e., addressing one can degrade the other. This paper proposes a novel unified method that aims to mitigate group unfairness under censorship while curbing the amplification of individual unfairness when enforcing group fairness constraints. Specifically, the proposed ranking algorithm optimizes individual fairness within the bounds of group fairness, uniquely accounting for censored information. Evaluation across four benchmark tasks confirms the effectiveness of the method in quantifying and mitigating both fairness dimensions in the face of censored data.
  2. There has been increasing concern within the machine learning community and beyond that Artificial Intelligence (AI) faces a bias and discrimination crisis that urgently calls for AI fairness. Although many have begun to work on this problem, most existing work depends on the availability of class labels for the given fairness definition and algorithm, which may not align with real-world usage. In this work, we study an AI fairness problem that stems from the gap between the design of a fair model in the lab and its deployment in the real world. Specifically, we consider defining and mitigating individual unfairness amidst censorship, where the availability of class labels is not always guaranteed, a setting that is broadly applicable in a diversity of real-world socially sensitive applications. We show that our method is able to quantify and mitigate individual unfairness in the presence of censorship across three benchmark tasks, which provides the first known results on individual fairness guarantees in the analysis of censored data.
  3. In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy.
  4. Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) for decentralized sensor data processing for learning-based applications while preserving privacy and ensuring secured information transfer. On the other hand, applying supervised learning to large data samples, like high-resolution images, requires intensive human labor to label different parts of a data sample. Multiple Instance Learning (MIL) alleviates this challenge by operating over labels assigned to the 'bag' of instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL). This framework applies federated learning to boost the training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are not typically Independently and Identically Distributed (IID), making client selection imperative to collectively represent the entire dataset with minimal clients. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel to select clients with the most diverse datasets. In the majority of non-IID cases, it achieves better performance than both random selection and current DPP-based client selection methods, even with less data utilization. This offers a significant advantage for deployment on edge devices with limited computational resources, providing a reliable solution for training AI models in massive smart sensor networks.
  5. Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is unsupervised, which is defined purely on input texts without using any human-provided labels. Training a model using an SSL task can prevent the model from being overfitted to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.