Analyzing the impact of missing values and selection bias on fairness

Wang, Yanchen; Singh, Lisa

doi:10.1007/s41060-021-00259-z

Abstract Algorithmic decision making is becoming more prevalent, increasingly impacting people’s daily lives. Recently, discussions have been emerging about the fairness of decisions made by machines. Researchers have proposed different approaches for improving the fairness of these algorithms. While these approaches can help machines make fairer decisions, they have been developed and validated on fairly clean data sets. Unfortunately, most real-world data have complexities that make them more dirty . This work considers two of these complexities by analyzing the impact of two real-world data issues on fairness—missing values and selection bias—for categorical data. After formulating this problem and showing its existence, we propose fixing algorithms for data sets containing missing values and/or selection bias that use different forms of reweighting and resampling based upon the missing value generation process. We conduct an extensive empirical evaluation on both real-world and synthetic data using various fairness metrics, and demonstrate how different missing values generated from different mechanisms and selection bias impact prediction fairness, even when prediction accuracy remains fairly constant.

More Like this