The spread of infectious diseases is a highly complex spatiotemporal process that is difficult to understand, predict, and respond to effectively. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models, which typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to understand how the AI disease modeling pipeline can mitigate bias in its input data, in-processing models, and outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset (which may not reflect the real world), we can sample and inject different types of bias. By using the sampled data, in which the bias is known because it is set as a simulation parameter, we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
                            An Infectious Disease Spread Simulation to Control Data Bias
                        
                    
    
            The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in input data such as clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), and to specify population and infectious disease parameters as well as the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. We hope that infectious disease modelers will use these datasets to investigate how robust their models are to data bias and whether their models are overfit to biased data.
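
The idea of collecting case data with a known bias can be illustrated with a minimal sketch. It is not the system described in the abstract: the function names, the `(agent_id, group)` record layout, and the per-group reporting probabilities are all assumptions made for illustration. Ground-truth infections are thinned by a group-specific reporting probability, so the degree of over- or underrepresentation is known by construction.

```python
import random
from collections import Counter


def sample_reported_cases(infections, reporting_prob, seed=0):
    """Keep each true infection with its group's reporting probability.

    infections: list of (agent_id, group) tuples (hypothetical layout).
    reporting_prob: dict mapping group -> probability in [0, 1] (assumed knob).
    """
    rng = random.Random(seed)
    return [
        (agent_id, group)
        for agent_id, group in infections
        if rng.random() < reporting_prob[group]
    ]


def representation_ratio(infections, reported):
    """Share of each group among reported cases divided by its true share.

    A value above 1 means the group is overrepresented in the case data,
    a value below 1 means it is underrepresented.
    """
    true_counts = Counter(group for _, group in infections)
    reported_counts = Counter(group for _, group in reported)
    total_true = sum(true_counts.values())
    total_reported = sum(reported_counts.values())
    if total_reported == 0:
        return {group: 0.0 for group in true_counts}
    return {
        group: (reported_counts[group] / total_reported)
        / (true_counts[group] / total_true)
        for group in true_counts
    }


if __name__ == "__main__":
    # Hypothetical ground truth: 1000 infections split evenly over two groups.
    truth = [(i, "A" if i % 2 == 0 else "B") for i in range(1000)]
    cases = sample_reported_cases(truth, {"A": 0.8, "B": 0.4})
    print(representation_ratio(truth, cases))
```

Because the reporting probabilities are inputs, a downstream forecasting model can be evaluated against both the biased sample and the full ground truth.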
        
    
    
                            - PAR ID:
- 10578757
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400711077
- Page Range / eLocation ID:
- 681 to 684
- Subject(s) / Keyword(s):
- Data Simulation, Infectious Disease Data, Data Bias, Bias Simulation
- Format(s):
- Medium: X
- Location:
- Atlanta GA USA
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Current modeling practices for environmentally and sociologically modulated infectious diseases remain inadequate to forecast the risk of outbreak(s) in human populations, partly due to a lack of integration of disciplinary knowledge, limited availability of disease surveillance datasets, and overreliance on compartmental epidemiological modeling methods. Drawing on data on the transmission (aerosols) and detection (wastewater) of SARS-CoV-2, a heuristic score-based environmental predictive intelligence system was developed that calculates the risk of COVID-19 in the human population. Seasonal validation of the algorithm was uniquely associated with wastewater surveillance of the virus, providing a lead time of 7–14 days before a county-level outbreak. Using county-scale disease prevalence data from the United States, the algorithm could predict COVID-19 risk with an overall accuracy ranging between 81% and 98%. Similarly, using wastewater surveillance data from Illinois and Maryland, the SARS-CoV-2 detection rate was greater than 80% for 75% of the locations during the same time the risk was predicted to be high. Results suggest the importance of a holistic approach across disciplinary boundaries that can potentially allow anticipatory decision-making policies for saving lives and maximizing the use of available capacity and resources.
- 
            The COVID-19 pandemic has mainstreamed human mobility data into the public domain, with research focused on understanding the impact of mobility reduction policies as well as on regional COVID-19 case prediction models. Nevertheless, current research on COVID-19 case prediction tends to focus on performance improvements, masking relevant insights about when mobility data does not help and, more importantly, why, so that it can adequately inform local decision making. In this article, we carry out a systematic analysis to reveal the conditions under which human mobility data does or does not provide an enhancement over individual regional COVID-19 case prediction models that do not use mobility as a source of information. Our analysis, focused on U.S. county-based COVID-19 case prediction models, shows that (1) at most, 60% of counties improve their performance after adding mobility data; (2) the performance improvements are modest, with median correlation improvements of approximately 0.13; (3) improvements were lower for counties with higher Black, Hispanic, and other non-White populations as well as low-income and rural populations, pointing to potential bias in the mobility data negatively impacting predictive performance; and (4) different mobility datasets, predictive models, and training approaches bring about diverse performance improvements.
- 
            Abstract. Background: Coronavirus Disease 2019 (COVID-19) led to a pandemic that affected almost all countries in the world. Many countries have implemented border restrictions as a public health measure to limit local outbreaks. However, there is inadequate scientific data to support such a practice, especially in the presence of established local transmission of the disease. Objective: To apply a metapopulation Susceptible-Exposed-Infectious-Recovered (SEIR) model with inspected migration to investigate the effect of border restriction as a public health measure to limit outbreaks of COVID-19. Methods: We apply a modified metapopulation SEIR model with inspected migration that simulates population migration and incorporates parameters such as the efficiency of customs inspection in blocking infected travelers. The population sizes were retrieved from government reports, while the numbers of COVID-19 patients were retrieved from Hong Kong Department of Health and China Centre for Disease Control (CDC) data. The R0 was obtained from previous clinical studies. Results: Complete border closure can help to reduce the cumulative COVID-19 case number and mortality in Hong Kong by 13.99% and 13.98%, respectively. To prevent full occupancy of isolation facilities in Hong Kong, effective public health measures to reduce the local R0 to below 1.6 are necessary in addition to complete border closure. Conclusions: Early complete travel restriction is effective in reducing cumulative cases and mortality. However, additional anti-COVID-19 measures to reduce the local R0 to below 1.6 are necessary to prevent COVID-19 cases from overwhelming hospital isolation facilities.
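
A metapopulation SEIR model with inspected migration, as named in the last abstract, can be sketched roughly as follows. This is an illustrative two-patch version under stated assumptions, not the cited study's model: the forward-Euler daily update, the rule that inspection blocks only infectious (I) travelers, and all parameter values are hypothetical.

```python
def seir_step(state, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of a standard SEIR model for a single patch."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E
    new_infectious = sigma * e * dt       # E -> I
    new_recovered = gamma * i * dt        # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def migrate(src, dst, rate, inspection_eff):
    """Move a fraction `rate` of each compartment from src to dst.

    Assumption: inspection blocks a fraction `inspection_eff` of infectious
    travelers only; blocked travelers stay in the source patch.
    """
    s, e, i, r = src
    moved = (s * rate, e * rate, i * rate * (1.0 - inspection_eff), r * rate)
    new_src = tuple(x - m for x, m in zip(src, moved))
    new_dst = tuple(x + m for x, m in zip(dst, moved))
    return new_src, new_dst


def simulate(days, src, dst, beta, sigma, gamma, rate, inspection_eff):
    """Alternate local SEIR dynamics and one-way migration for `days` steps."""
    for _ in range(days):
        src = seir_step(src, beta, sigma, gamma)
        dst = seir_step(dst, beta, sigma, gamma)
        src, dst = migrate(src, dst, rate, inspection_eff)
    return src, dst


if __name__ == "__main__":
    # Hypothetical parameters: R0 = beta / gamma = 2 in both patches.
    params = dict(beta=0.5, sigma=0.2, gamma=0.25)
    source = (9990.0, 0.0, 10.0, 0.0)
    destination = (10000.0, 0.0, 0.0, 0.0)
    _, closed = simulate(60, source, destination, rate=0.0, inspection_eff=0.0, **params)
    _, open_ = simulate(60, source, destination, rate=0.01, inspection_eff=0.5, **params)
    print("closed border, ever infected:", sum(closed[1:]))
    print("open border, ever infected:", sum(open_[1:]))
```

Setting `rate=0.0` stands in for complete border closure. Even with `inspection_eff=1.0`, exposed travelers still pass in this sketch, which is one way imported cases can seed an outbreak despite inspection.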
 An official website of the United States government