NSF PAR Search | NSF Public Access Repository

Simulated Infectious Diseases Datasets with Controlled Data Bias

https://doi.org/10.1145/3711896.3737401

Kong, Ruochen; Anderson, Taylor; Scotch, Matthew; Heslop, David J; Khaokaew, Yonchanok; Xue, Hao; Xiong, Li; MacIntyre, Chandini Raina; Salim, Flora D; Züfle, Andreas (August 2025, ACM)

Free, publicly-accessible full text available August 3, 2026
An Infectious Disease Spread Simulation to Control Data Bias

https://doi.org/10.1145/3678717.3691293

Kong, Ruochen; Anderson, Taylor; Heslop, David; Züfle, Andreas (October 2024, ACM)

The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in input data such as clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), and to specify population and infectious disease parameters as well as the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. We hope that infectious disease modelers will use these datasets to investigate how robust their models are to data bias or whether their models are overfit to biased data.
Full Text Available
In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper)

https://doi.org/10.1145/3672557

Züfle, Andreas; Pfoser, Dieter; Wenk, Carola; Crooks, Andrew; Kavak, Hamdi; Anderson, Taylor; Kim, Joon-Seok; Holt, Nathan; Diantonio, Andrew (June 2024, ACM Transactions on Spatial Algorithms and Systems)

Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals, while datasets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since massive, representative, individual-level human mobility data is hard to come by due to privacy considerations, the vision of this work is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by real data and by social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.
Full Text Available
Vaccine Attitudes and Uptake Among Latino Residents of a Former COVID-19 Hotspot

https://doi.org/10.1353/hpu.2024.a919821

Cleaveland, Carol; Anderson, Taylor; McNally, Kimberly; Roess, Amira A. (February 2024, Journal of Health Care for the Poor and Underserved)

Full Text Available
Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread

https://doi.org/10.1145/3660631

Züfle, Andreas; Salim, Flora; Anderson, Taylor; Scotch, Matthew; Xiong, Li; Sokol, Kacper; Xue, Hao; Kong, Ruochen; Heslop, David; Paik, Hye-Young; et al (June 2024, ACM Transactions on Spatial Algorithms and Systems)

The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset—which may not reflect the real world—we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
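The idea of sampling from complete ground-truth data with a known bias can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' simulation: the function name `sample_reported_cases`, the `group` field, and the per-group reporting probabilities are all hypothetical and chosen for the example.

```python
import random


def sample_reported_cases(true_cases, report_prob, seed=0):
    """Return the subset of ground-truth cases that gets 'reported'.

    true_cases  : list of dicts, each with a "group" key (hypothetical schema)
    report_prob : dict mapping group -> probability that a case is reported;
                  this is the known, user-controlled bias parameter
    seed        : seed for reproducible sampling
    """
    rng = random.Random(seed)
    # Keep each case independently with its group's reporting probability.
    return [c for c in true_cases if rng.random() < report_prob[c["group"]]]


# Hypothetical ground truth: 1000 cases in each of two population groups.
ground_truth = [{"id": i, "group": "A" if i % 2 == 0 else "B"} for i in range(2000)]

# Group B is underrepresented: only 30% of its cases are reported.
reported = sample_reported_cases(ground_truth, {"A": 0.9, "B": 0.3})

for g in ("A", "B"):
    n_true = sum(c["group"] == g for c in ground_truth)
    n_rep = sum(c["group"] == g for c in reported)
    print(g, n_rep / n_true)  # observed reporting rate per group
```

Because the reporting probabilities are inputs, the gap between the reported and true case counts is known exactly, which is what makes such data usable for testing bias-mitigation methods.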
Full Text Available
Synthetic Geosocial Network Generation

https://doi.org/10.1145/3615896.3628345

Gallagher, Ketevan; Anderson, Taylor; Crooks, Andrew; Züfle, Andreas (November 2023, Proceedings of the 7th ACM SIGSPATIAL Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising (LocalRec'23))
A Framework for Simulating Emergent Health Behaviors in Spatial Agent-Based Models of Disease Spread

https://doi.org/10.1145/3615891.3628010

Von Hoene, Emma; Roess, Amira; Achuthan, Shivani; Anderson, Taylor (November 2023, Proceedings of the 6th ACM SIGSPATIAL International Workshop on GeoSpatial Simulation (GeoSim'23))
GeoAI for Public Health

https://doi.org/10.1201/9781003308423-15

Züfle, Andreas; Anderson, Taylor; Kavak, Hamdi; Pfoser, Dieter; Kim, Joon-Seok; Roess, Amira (December 2023, CRC Press)

Infectious disease spread within the human population can be conceptualized as a complex system composed of individuals who interact and transmit viruses through spatio-temporal processes that manifest across and between scales. The complexity of this system ultimately means that the spread of infectious diseases is difficult to understand, predict, and respond to effectively. Research interest in GeoAI for public health has been fueled by the increased availability of rich data sources such as human mobility data, OpenStreetMap data, contact tracing data, symptomatic online surveys, retail and commerce data, genomics data, and more. This data availability has resulted in a wide variety of data-driven solutions for infectious disease spread prediction which show potential in enhancing our forecasting capabilities. This book chapter (1) motivates the need for AI-based solutions in public health by showing the heterogeneity of human behavior related to health, (2) provides a brief survey of current state-of-the-art solutions using AI for infectious disease spread prediction, (3) describes a use-case of using large-scale human mobility data to inform AI models for the prediction of infectious disease spread in a city, and (4) provides future research directions and ideas.
Full Text Available
A Data-Driven Decision-Making Framework for Spatial Agent-Based Models of Infectious Disease Spread

https://doi.org/10.4230/LIPIcs.GIScience.2023.76

Von Hoene, Emma; Roess, Amira; Anderson, Taylor (January 2023, Proceedings of 12th International Conference on Geographic Information Science (GIScience 2023))
Beecham, Roger; Long, Jed A.; Smith, Dianna; Zhao, Qunshan; Wise, Sarah (Ed.)
Agent-based models (ABMs) are powerful tools used for better understanding, predicting, and responding to diseases. ABMs are well-suited to represent human health behaviors, a key driver of disease spread. However, many existing ABMs of infectious respiratory disease spread oversimplify or ignore behavioral aspects due to limited data and the variety of behavioral theories available. Therefore, this study aims to develop and implement a data-driven framework for agent decision-making related to health behaviors in geospatial ABMs of infectious disease spread. The agent decision-making framework uses a logistic regression model expressed in the form of odds ratios to calculate the probability of adopting a behavior. The framework is integrated into a geospatial ABM that simulates the spread of COVID-19 and mask usage among the student population at George Mason University in Fall 2021. The framework leverages odds ratios, which can be derived from surveys or open data, and can be modified to incorporate variables identified by behavioral theories. This advancement will offer the public and decision-makers greater insight into disease transmission, accurate predictions on disease outcomes, and preparation for future infectious disease outbreaks.
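As a rough illustration of how odds ratios can be turned into an adoption probability, the sketch below applies the standard logistic-regression identity: log-odds are the baseline log-odds plus, for each variable, its value times the log of its odds ratio. It is not the paper's implementation; the function name `adoption_probability`, the variable names, and the numeric values are assumptions made for the example.

```python
import math


def adoption_probability(baseline_odds, odds_ratios, features):
    """Probability that an agent adopts a behavior (e.g., wearing a mask).

    baseline_odds : odds of adoption when all features are 0 (hypothetical)
    odds_ratios   : dict mapping feature name -> odds ratio
    features      : dict mapping feature name -> feature value (e.g., 0 or 1)
    """
    # In logistic regression, each coefficient is the log of an odds ratio.
    log_odds = math.log(baseline_odds)
    for name, value in features.items():
        log_odds += value * math.log(odds_ratios[name])
    odds = math.exp(log_odds)
    # Convert odds to a probability.
    return odds / (1.0 + odds)


# Hypothetical agent: baseline odds of 1.0 (a 50% chance), and one binary
# attribute whose odds ratio of 3.0 triples the odds of adoption.
p = adoption_probability(1.0, {"perceives_high_risk": 3.0}, {"perceives_high_risk": 1})
print(p)  # approximately 0.75
```

In an agent-based model, each agent could draw a uniform random number and adopt the behavior when the draw falls below this probability.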
Predicting building types using OpenStreetMap

https://doi.org/10.1038/s41598-022-24263-w

Atwal, Kuldip Singh; Anderson, Taylor; Pfoser, Dieter; Züfle, Andreas (December 2022, Scientific Reports)

Having accurate building information is paramount for a plethora of applications, including humanitarian efforts, city planning, scientific studies, and navigation systems. While volunteered geographic information from sources such as OpenStreetMap (OSM) has good building geometry coverage, descriptive attributes such as the type of a building are sparse. To fill this gap, this study proposes a supervised learning-based approach to provide meaningful, semantic information for OSM data without manual intervention. We present a basic demonstration of our approach that classifies buildings into either residential or non-residential types for three study areas: Fairfax County in Virginia (VA), Mecklenburg County in North Carolina (NC), and the City of Boulder in Colorado (CO). The model leverages (i) available OSM tags capturing non-spatial attributes and (ii) geometric and topological properties of the building footprints, including adjacent types of roads, proximity to parking lots, and building size. The model is trained and tested using ground truth data available for the three study areas. The results show that our approach achieves high accuracy in predicting building types for the selected areas. Additionally, a trained model is transferable with high accuracy to other regions where ground truth data is unavailable. The OSM and data science community are invited to build upon our approach to further enrich the volunteered geographic information in an automated manner.
Full Text Available
