Towards Debiasing DNN Models from Spurious Feature Influence

Du, Mengnan; Tang, Ruixiang; Fu, Weijie; Hu, Xia

doi:10.1609/aaai.v36i9.21185

Recent studies indicate that deep neural networks (DNNs) are prone to discriminating against certain demographic groups. We observe that algorithmic discrimination can be explained by the models' high reliance on fairness-sensitive features. Motivated by this observation, we propose to achieve fairness by preventing DNN models from capturing the spurious correlation between those fairness-sensitive features and the underlying task. Specifically, we first train a bias-only teacher model, which is explicitly encouraged to maximally employ fairness-sensitive features for prediction. The teacher model then counter-teaches a debiased student model so that the interpretation of the student model is orthogonal to the interpretation of the teacher model. The key idea is that, since the teacher model relies explicitly on fairness-sensitive features for prediction, the orthogonal interpretation loss forces the student network to reduce its reliance on sensitive features and instead capture more task-relevant features for prediction. Experimental analysis indicates that our framework substantially reduces the model's attention on fairness-sensitive features. Experimental results on four datasets further validate that our framework consistently improves fairness with respect to three group fairness metrics, with comparable or even better accuracy.
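The orthogonal interpretation idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything here is an assumption made for illustration: interpretations are treated as per-feature attribution vectors, orthogonality is measured by squared cosine similarity, and the names `orthogonality_penalty`, `total_loss`, and `weight` are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length attribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero attribution has no direction; treat it as orthogonal.
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonality_penalty(student_attr, teacher_attr):
    """Assumed penalty: 0 when the attributions are orthogonal, 1 when parallel."""
    return cosine_similarity(student_attr, teacher_attr) ** 2


def total_loss(task_loss, student_attr, teacher_attr, weight=1.0):
    """Task loss plus a weighted penalty (the weighting scheme is an assumption)."""
    return task_loss + weight * orthogonality_penalty(student_attr, teacher_attr)


# Hypothetical attributions over three input features; index 2 is the
# fairness-sensitive feature that the bias-only teacher relies on.
teacher_attr = [0.0, 0.0, 1.0]
biased_student = [0.0, 0.0, 2.0]    # leans on the sensitive feature
debiased_student = [1.0, 1.0, 0.0]  # leans on task-relevant features only

penalty_biased = orthogonality_penalty(biased_student, teacher_attr)
penalty_debiased = orthogonality_penalty(debiased_student, teacher_attr)
```

Under these assumptions, a student whose attribution mass sits on the same feature as the teacher is penalized most heavily, while a student that ignores that feature incurs no penalty, which matches the intuition stated in the abstract.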

Award ID(s): 1939716

PAR ID: 10397777

Author(s) / Creator(s): Du, Mengnan; Tang, Ruixiang; Fu, Weijie; Hu, Xia

Date Published: 2022-06-30

Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 36

Issue: 9

ISSN: 2159-5399

Page Range / eLocation ID: 9521-9528

Sponsoring Org: National Science Foundation
