skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2302970

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available March 21, 2026
  2. The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in such input data like clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load either a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), specify population and infectious disease parameters, and the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. Wehope that infectious disease modelers will use these datasets to investigate how well their models are robust to data bias or whether their model is overfit to biased data. 
    more » « less
  3. The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset—which may not reflect the real world—we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions. 
    more » « less