Title: Fair and Robust Classification Under Sample Selection Bias
To address sample selection bias between training and test data, previous work focuses on reweighing the biased training data to match the test data and then building classification models on the reweighed training data. However, how to achieve fairness in the resulting classification models is under-explored. In this paper, we propose a framework for robust and fair learning under sample selection bias. Our framework adopts the reweighing estimation approach for bias correction and the minimax robust estimation approach for achieving robustness in prediction accuracy. Moreover, during the minimax optimization, fairness is enforced under the worst case, which guarantees the model's fairness on test data. We further develop two algorithms to handle sample selection bias when test data is available and when it is unavailable.
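The reweighing idea in the abstract can be sketched with a numpy-only toy: a synthetic 1-D setting where the training data is selection-biased and the densities are known, so the importance weights can be computed exactly. This is an illustrative sketch with invented variable names, not the paper's estimator, which must estimate the density ratio from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D example: training data is selection-biased toward
# small x, while test data follows the target population N(0, 1).
train = rng.normal(-1.0, 1.0, size=5000)   # biased sampling distribution
test = rng.normal(0.0, 1.0, size=5000)     # target (test) distribution

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights w(x) = p_test(x) / p_train(x); in practice both
# densities are unknown and the ratio is estimated, e.g. with a
# probabilistic classifier that distinguishes training from test samples.
w = gauss_pdf(train, 0.0, 1.0) / gauss_pdf(train, -1.0, 1.0)

naive_mean = train.mean()                     # ignores the selection bias
weighted_mean = np.average(train, weights=w)  # bias-corrected estimate
```

The same weights, applied per-sample inside a classifier's training loss, give the reweighed training objective the abstract describes; the minimax step then optimizes accuracy and fairness against the worst case over plausible weightings.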
Award ID(s):
1946391 1920920 1940093 2137335
PAR ID:
10321734
Author(s) / Creator(s):
Date Published:
Journal Name:
30th ACM International Conference on Information & Knowledge Management
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Facial attribute classification algorithms frequently manifest demographic biases, exhibiting differential performance across gender and racial groups. Existing bias mitigation techniques are mostly in-processing techniques, i.e., applied during the classifier's training stage; they often lack generalizability, require demographically annotated training sets, and exhibit a trade-off between fairness and classification accuracy. In this paper, we propose a technique to mitigate bias at test time, i.e., during the deployment stage, by harnessing prediction uncertainty and human–machine partnership. To this end, we utilize the small percentage of test samples identified as outliers with the highest prediction uncertainty. These uncertain test-time samples are labeled by human analysts for decision rendering and for subsequently retraining the deep neural network in a continual learning framework. With minimal human involvement and through iterative refinement of the network with human guidance at test time, we seek to enhance both the accuracy and the fairness of already deployed facial attribute classification algorithms. Extensive experiments are conducted on gender and smile attribute classification tasks using four publicly available datasets, with gender and race as the protected attributes. The obtained outcomes consistently demonstrate accuracy improvements of up to 2% and 5% for the gender and smile attribute classification tasks, respectively, using our proposed approaches. Further, demographic bias is significantly reduced, outperforming state-of-the-art (SOTA) bias mitigation and baseline techniques by up to 55% for both classification tasks.
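The test-time selection step described above can be sketched as follows, assuming predictive entropy as the uncertainty score and a synthetic batch of softmax outputs; the paper's actual uncertainty estimator, percentage threshold, and retraining loop may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated softmax outputs of a deployed binary classifier on a batch
# of test images (shape: n_samples x n_classes, rows sum to 1).
probs = rng.dirichlet([2.0, 2.0], size=1000)

# Predictive entropy as the uncertainty score: high entropy means the
# model is unsure, so the sample is routed to a human analyst.
entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

# Select the top 5% most uncertain samples for human labeling and
# subsequent continual retraining of the network.
k = int(0.05 * len(probs))
uncertain_idx = np.argsort(entropy)[-k:]
```

The human-provided labels for `uncertain_idx` would then serve both for immediate decision rendering and as new supervision when the deployed network is iteratively refined.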
  2. While conventional ranking systems focus solely on maximizing the utility of the ranked items to users, fairness-aware ranking systems additionally try to balance the exposure based on different protected attributes such as gender or race. To achieve this type of group fairness for ranking, we derive a new ranking system from the first principles of distributional robustness. We formulate a minimax game between a player choosing a distribution over rankings to maximize utility while satisfying fairness constraints against an adversary seeking to minimize utility while matching statistics of the training data. Rather than maximizing utility and fairness for the specific training data, this approach efficiently produces robust utility and fairness for a much broader family of distributions of rankings that include the training data. We show that our approach provides better utility for highly fair rankings than existing baseline methods. 
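The group-exposure statistic that such fairness constraints balance can be illustrated with the standard position-bias discount. This is a generic sketch (hypothetical ranking, DCG-style discount), not the paper's exact formulation of the minimax game:

```python
import numpy as np

# Position-based exposure model: the item at rank j (1-indexed) receives
# exposure 1 / log2(j + 1), the standard DCG-style discount.
def exposure(ranking_groups):
    n = len(ranking_groups)
    v = 1.0 / np.log2(np.arange(1, n + 1) + 1)
    groups = np.asarray(ranking_groups)
    return {g: v[groups == g].sum() for g in np.unique(groups)}

# Hypothetical ranking of 6 items labeled by protected group 'A'/'B'.
exp_by_group = exposure(['A', 'B', 'A', 'B', 'B', 'A'])

# The fairness constraint in a group-fair ranker bounds this gap,
# possibly in expectation over a distribution of rankings.
gap = abs(exp_by_group['A'] - exp_by_group['B'])
```

In the distributionally robust formulation above, the player's distribution over rankings must keep such exposure gaps small while the adversary perturbs the utility statistics.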
  4. There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking and topology rewiring) or fair training strategies (e.g., regularization, adversarial debiasing, and fair contrastive learning). How to achieve fairness in graphs from the model architecture perspective is less explored. More importantly, GNNs exhibit worse fairness than multilayer perceptrons because their model architecture (i.e., neighbor aggregation) amplifies biases. To this end, we aim to achieve fairness via a new GNN architecture. We propose Fair Message Passing (FMP), designed within a unified optimization framework for GNNs. Notably, FMP explicitly renders sensitive attribute usage in forward propagation for the node classification task using cross-entropy loss, without data pre-processing. In FMP, aggregation is first adopted to utilize neighbors' information, and then a bias mitigation step explicitly pushes demographic group node representation centers together. In this way, the FMP scheme can aggregate useful information from neighbors while mitigating bias, achieving a better fairness-accuracy tradeoff. Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets. The code is available at https://github.com/zhimengj0326/FMP.
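The two-step structure described above, aggregate first, then pull demographic group representation centers together, can be sketched on a toy graph. This is a hedged numpy illustration of the idea (invented adjacency, mean aggregation, centroid alignment), not the FMP implementation in the linked repository:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy graph: 6 nodes with self-loops, 4-dim features, and a binary
# sensitive attribute per node.
A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)
X = rng.normal(size=(6, 4))
s = np.array([0, 0, 0, 1, 1, 1])  # sensitive attribute

# Step 1: neighbor aggregation (row-normalized mean, GCN-style).
H = (A / A.sum(axis=1, keepdims=True)) @ X

# Step 2: bias mitigation -- shift each demographic group's
# representation center onto the shared overall center, shrinking the
# distance between group centroids.
overall = H.mean(axis=0)
H_fair = H.copy()
for g in (0, 1):
    H_fair[s == g] += overall - H[s == g].mean(axis=0)

gap_before = np.linalg.norm(H[s == 0].mean(0) - H[s == 1].mean(0))
gap_after = np.linalg.norm(H_fair[s == 0].mean(0) - H_fair[s == 1].mean(0))
```

Fully collapsing the centroids, as this toy does, is the extreme case; a tunable mitigation step instead trades a smaller centroid gap against the neighbor information preserved in `H`.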
  5. We address the problem of adaptive minimax density estimation on $\mathbb{R}^{d}$ with $L_{p}$ loss functions under Huber's contamination model. To investigate the contamination effect on the optimal estimation of the density, we first establish the minimax rate with the assumption that the density is in an anisotropic Nikol'skii class. We then develop a data-driven bandwidth selection procedure for kernel estimators, which can be viewed as a robust generalization of the Goldenshluger-Lepski method. We show that the proposed bandwidth selection rule can lead to the estimator being minimax adaptive to either the smoothness parameter or the contamination proportion. When both of them are unknown, we prove that finding any minimax-rate adaptive method is impossible. Extensions to smooth contamination cases are also discussed.
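The setting above can be made concrete with a small sketch: data drawn from Huber's contamination model and a Gaussian kernel density estimate whose bandwidth is the quantity the adaptive procedure must choose. This is a numpy-only illustration with invented parameters, not the bandwidth selection rule of the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

# Huber's contamination model: each observation is drawn from the
# target density f with probability 1 - eps and from an arbitrary
# contamination density g with probability eps.
eps = 0.1
n = 2000
clean = rng.normal(0.0, 1.0, size=n)        # f = N(0, 1)
contam = rng.uniform(-10.0, 10.0, size=n)   # g = Uniform(-10, 10)
mask = rng.random(n) < eps
x = np.where(mask, contam, clean)

# Gaussian kernel density estimate with bandwidth h; choosing h in a
# data-driven, contamination-robust way is the crux of the adaptive
# problem the abstract studies.
def kde(query, data, h):
    u = (query[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

est_at_zero = kde(np.array([0.0]), x, h=0.3)[0]
true_f0 = 1.0 / np.sqrt(2 * np.pi)  # N(0, 1) density at 0
```

With a fixed 10% contamination proportion, the estimate at the origin is pulled slightly below the clean-density value; a bandwidth adapted to both the smoothness and the contamination, when that is possible, controls this distortion at the minimax rate.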