skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 12 until 2:00 AM ET on Friday, June 13 due to maintenance. We apologize for the inconvenience.


Search for: All records

Award ID contains: 2109647

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available March 21, 2026
  2. Free, publicly-accessible full text available January 1, 2026
  3. The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in such input data like clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load either a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), specify population and infectious disease parameters, and the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. Wehope that infectious disease modelers will use these datasets to investigate how well their models are robust to data bias or whether their model is overfit to biased data. 
    more » « less
    Free, publicly-accessible full text available October 29, 2025
  4. Human mobility anomaly detection based on location is essential in areas such as public health, safety, welfare, and urban planning. Developing models and approaches for location-based anomaly detection requires a comprehensive dataset. However, privacy concerns and the absence of ground truth hinder the availability of publicly available datasets. With this paper, we provide extensive simulated human mobility datasets featuring various anomaly types created using an existing Urban Patterns of Life Simulation. To create these datasets, we inject changes in the logic of individual agents to change their behavior. Specifically, we create four of anomalous agent behavior by (1) changing the agents’ appetite (causing agents to have meals more frequently), (2) changing their group of interest (causing agents to interact with different agents from another group). (3) changing their social place selection (causing agents to visit different recreational places) and (4) changing their work schedule (causing agents to skip work), For each type of anomaly, we use three degrees of behavioral change to tune the difficulty of detecting the anomalous agents. To select agents to inject anomalous behavior into, we employ three methods: (1) Random selection using a centralized manipulation mechanism, (2) Spread based selection using an infectious disease model, and (3) through exposure of agents to a specific location. All datasets are split into normal and anomalous phases. The normal phase, which can be used for training models of normalcy, exhibits no anomalous behavior. The anomalous phase, which can be used for testing for anomalous detection algorithm, includes ground truth labels that indicate, for each five-minute simulation step, which agents are anomalous at that time. Datasets are generated using the maps (roads and buildings) for Atlanta and Berlin having 1k agents in each simulation. All datasets are openly available at https://osf.io/dg6t3/. Additionally, we provide instructions to regenerate the data for other locations and numbers of agents. 
    more » « less
    Free, publicly-accessible full text available October 29, 2025
  5. Identifying anomalous human spatial trajectory patterns can indicate dynamic changes in mobility behavior with applications in domains like infectious disease monitoring and elderly care. Recent advancements in large language models (LLMs) have demonstrated their ability to reason in a manner akin to humans. This presents significant potential for analyzing temporal patterns in human mobility. In this paper, we conduct empirical studies to assess the capabilities of leading LLMs like GPT-4 and Claude-2 in detecting anomalous behaviors from mobility data, by comparing to specialized methods. Our key findings demonstrate that LLMs can attain reasonable anomaly detection performance even without any specific cues. In addition, providing contextual clues about potential irregularities could further enhances their prediction efficacy. Moreover, LLMs can provide reasonable explanations for their judgments, thereby improving transparency. Our work provides insights on the strengths and limitations of LLMs for human spatial trajectory analysis. 
    more » « less
    Free, publicly-accessible full text available October 29, 2025
  6. Free, publicly-accessible full text available October 29, 2025
  7. Analyzing individual human trajectory data helps our understanding of human mobility and finds many commercial and academic applications. There are two main approaches to accessing trajectory data for research: one involves using real-world datasets like GeoLife, while the other employs simulations to synthesize data. Real-world data provides insights from real human activities, but such data is generally sparse due to voluntary participation. Conversely, simulated data can be more comprehensive but may capture unrealistic human behavior. In this Data and Resource paper, we combine the benefit of both by leveraging the statistical features of real-world data and the comprehensiveness of simulated data. Specifically, we extract features from the real-world GeoLife dataset such as the average number of individual daily trips, average radius of gyration, and maximum and minimum trip distances. We calibrate the Pattern of Life Simulation, a realistic simulation of human mobility, to reproduce these features. Therefore, we use a genetic algorithm to calibrate the parameters of the simulation to mimic the GeoLife features. For this calibration, we simulated numerous random simulation settings, measured the similarity of generated trajectories to GeoLife, and iteratively (over many generations) combined parameter settings of trajectory datasets most similar to GeoLife. Using the calibrated simulation, we simulate large trajectory datasets that we call GeoLife+, where + denotes the Kleene Plus, indicating unlimited replication with at least one occurrence. We provide simulated GeoLife+ data with 182, 1k, and 5k over 5 years, 10k, and 50k over a year and 100k users over 6 months of simulation lifetime. 
    more » « less
    Free, publicly-accessible full text available October 29, 2025
  8. Transformer-based models are popular for time series forecasting and spatiotemporal prediction due to their ability to infer semantic correlations in long sequences. However, for human mobility prediction, temporal correlations, such as location patterns at the same time on previous days or weeks, are essential. While positional encodings help retain order, the self-attention mechanism causes a loss of temporal detail. To validate this claim, we used a simple approach in the 2nd ACM SIGSPATIAL Human Mobility Prediction Challenge, predicting locations based on past patterns weighted by reliability scores for missing data. Our simple approach was among the top 10 competitors and significantly outperformed the Transformer-based model that won the 2023 challenge. 
    more » « less
    Free, publicly-accessible full text available October 29, 2025
  9. Human mobility data science using trajectories or check-ins of individuals has many applications. Recently, we have seen a plethora of research efforts that tackle these applications. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals, while datasets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear if findings from the human mobility data science community would generalize to large populations. Since obtaining massive, representative, and individual-level human mobility data is hard to come by due to privacy considerations, the vision of this work is to embrace the use of data generated by large-scale socially realistic microsimulations. Informed by both real data and leveraging social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science. 
    more » « less
    Free, publicly-accessible full text available June 30, 2025