Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread

Züfle, Andreas; Salim, Flora; Anderson, Taylor; Scotch, Matthew; Xiong, Li; Sokol, Kacper; Xue, Hao; Kong, Ruochen; Heslop, David; Paik, Hye-Young; MacIntyre, C Raina

doi:10.1145/3660631

The spread of infectious diseases is a highly complex spatiotemporal process that is difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to understand how the AI disease modeling pipeline can mitigate bias in the input data, in in-processing models, and in the outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset (which may not reflect the real world), we can sample and inject different types of bias. By using the sampled data in which the bias is known (as it is given as a simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
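
The sample-and-inject idea in the abstract can be illustrated with a toy sketch. This is not the authors' simulation: the two groups, their infection rates, the observation probabilities, and every function name below are hypothetical and chosen only for illustration. Inverse-probability weighting stands in here as one simple, well-known correction; the abstract does not name a specific method.

```python
import random

def simulate_population(n, seed=0):
    """Toy ground truth: each person has a group label and an infection status."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        # Assumed per-group infection rates (illustrative values only).
        rate = 0.10 if group == "A" else 0.30
        people.append((group, rng.random() < rate))
    return people

def biased_sample(people, keep_prob, seed=1):
    """Keep each person with a group-specific probability (the injected, known bias)."""
    rng = random.Random(seed)
    return [p for p in people if rng.random() < keep_prob[p[0]]]

def prevalence(sample, weights=None):
    """Weighted share of infected individuals; unweighted if weights is None."""
    total = 0.0
    infected = 0.0
    for group, is_infected in sample:
        w = 1.0 if weights is None else weights[group]
        total += w
        infected += w * is_infected
    return infected / total

population = simulate_population(100_000)
keep_prob = {"A": 0.9, "B": 0.3}  # assumption: group B is under-observed
sample = biased_sample(population, keep_prob)

truth = prevalence(population)   # known only because the data is simulated
naive = prevalence(sample)       # estimate that ignores the sampling bias
# The bias is a known simulation parameter, so inverse-probability
# weights (1 / keep probability) can be used to correct for it.
ipw_weights = {g: 1.0 / p for g, p in keep_prob.items()}
corrected = prevalence(sample, ipw_weights)

print(f"truth={truth:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Because the ground truth and the injected bias are both known, the gap between the naive and corrected estimates can be measured directly, which is the kind of evaluation the simulation-based approach is meant to enable.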