Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available December 2, 2025
-
Free, publicly-accessible full text available December 2, 2025
-
The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in such input data like clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load either a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), specify population and infectious disease parameters, and the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. Wehope that infectious disease modelers will use these datasets to investigate how well their models are robust to data bias or whether their model is overfit to biased data.more » « lessFree, publicly-accessible full text available October 29, 2025
-
Human mobility anomaly detection based on location is essential in areas such as public health, safety, welfare, and urban planning. Developing models and approaches for location-based anomaly detection requires a comprehensive dataset. However, privacy concerns and the absence of ground truth hinder the availability of publicly available datasets. With this paper, we provide extensive simulated human mobility datasets featuring various anomaly types created using an existing Urban Patterns of Life Simulation. To create these datasets, we inject changes in the logic of individual agents to change their behavior. Specifically, we create four of anomalous agent behavior by (1) changing the agents’ appetite (causing agents to have meals more frequently), (2) changing their group of interest (causing agents to interact with different agents from another group). (3) changing their social place selection (causing agents to visit different recreational places) and (4) changing their work schedule (causing agents to skip work), For each type of anomaly, we use three degrees of behavioral change to tune the difficulty of detecting the anomalous agents. To select agents to inject anomalous behavior into, we employ three methods: (1) Random selection using a centralized manipulation mechanism, (2) Spread based selection using an infectious disease model, and (3) through exposure of agents to a specific location. All datasets are split into normal and anomalous phases. The normal phase, which can be used for training models of normalcy, exhibits no anomalous behavior. The anomalous phase, which can be used for testing for anomalous detection algorithm, includes ground truth labels that indicate, for each five-minute simulation step, which agents are anomalous at that time. Datasets are generated using the maps (roads and buildings) for Atlanta and Berlin having 1k agents in each simulation. All datasets are openly available at https://osf.io/dg6t3/. Additionally, we provide instructions to regenerate the data for other locations and numbers of agents.more » « lessFree, publicly-accessible full text available October 29, 2025
-
Free, publicly-accessible full text available October 28, 2025
-
Free, publicly-accessible full text available October 21, 2025
-
Free, publicly-accessible full text available October 21, 2025
-
Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key difficulties inherent in learning human mobility data under differential privacy. In response to these challenges, HRNet integrates three components: a hierarchical location encoding mechanism, multi-task learning across multiple resolutions, and private pre-training. These elements collectively enhance the model's ability under the constraints of differential privacy. Through extensive comparative experiments utilizing a real-world dataset, HRNet demonstrates a marked improvement over existing methods in balancing the utility-privacy trade-off.more » « lessFree, publicly-accessible full text available July 1, 2025
-
Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present Uldp-FL, a novel FL framework designed to guarantee user-level DP in cross-silo FL where a single user's data may belong to multiple silos. Our proposed algorithm directly ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. We provide a theoretical analysis of the algorithm's privacy and utility. Additionally, we improve the utility of the proposed algorithm with an enhanced weighting strategy based on user record distribution and design a novel private protocol that ensures no additional information is revealed to the silos and the server. Experiments on real-world datasets show substantial improvements in our methods in privacy-utility trade-offs under user-level DP compared to baseline methods. To the best of our knowledge, our work is the first FL framework that effectively provides user-level DP in the general cross-silo FL setting.more » « lessFree, publicly-accessible full text available July 1, 2025
-
The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset—which may not reflect the real world—we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.more » « lessFree, publicly-accessible full text available June 30, 2025