Title: In Silico Human Mobility Data Science: Leveraging Massive Simulated Mobility Data (Vision Paper)
Human mobility data science using trajectories or check-ins of individuals has many applications, and a plethora of recent research efforts tackle them. However, research progress in this field is limited by a lack of large and representative datasets. The largest and most commonly used dataset of individual human trajectories captures fewer than 200 individuals, while datasets of individual human check-ins capture fewer than 100 check-ins per city per day. Thus, it is not clear whether findings from the human mobility data science community would generalize to large populations. Since massive, representative, individual-level human mobility data is hard to obtain due to privacy considerations, the vision of this work is to embrace the use of data generated by large-scale, socially realistic microsimulations. Informed by real data and by social and behavioral theories, massive spatially explicit microsimulations may allow us to simulate entire megacities at the person level. The simulated worlds, which do not capture any identifiable personal information, allow us to perform “in silico” experiments using the simulated world as a sandbox in which we have perfect information and perfect control without jeopardizing the privacy of any actual individual. In silico experiments have become commonplace in other scientific domains such as chemistry and biology, permitting experiments that foster the understanding of concepts without any harm to individuals. This work describes challenges and opportunities for leveraging massive and realistic simulated alternate worlds for in silico human mobility data science.
Award ID(s):
2109647
PAR ID:
10582628
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Spatial Algorithms and Systems
Volume:
10
Issue:
2
ISSN:
2374-0353
Page Range / eLocation ID:
1 to 27
Subject(s) / Keyword(s):
Spatial Simulation, Mobility Data Science, Trajectory Data, Location Based Social Network Data, In Silico
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding human mobility has become an important aspect of location-based services in tasks such as personalized recommendation and individual moving pattern recognition, enabled by the large volumes of data from geo-tagged social media (GTSM). Prior studies mainly analyze historical footprints collected from GTSM and assume that the data are truthful, an assumption that need not hold when some users are unwilling to share their real footprints due to privacy concerns, which undermines the reliability and authenticity of the data. In this study, we address the problem of Inferring Real Mobility (IRMo) of users from their unreliable historical traces. Tackling IRMo is a non-trivial task due to (1) the sparsity of check-in data; (2) suspicious counterfeit check-in behaviors; and (3) unobserved dependencies in human trajectories. To address these issues, we develop a novel Graph-enhanced Attention model called IRMoGA, which attempts to capture underlying mobility patterns and check-in correlations by exploiting the unreliable spatio-temporal data. Specifically, we incorporate the attention mechanism (rather than solely relying on traditional recursive models) to understand the regularity of human mobility, while employing a graph neural network to understand the mutual interactions among historical check-ins and leveraging prior knowledge to alleviate inference bias. Our experiments conducted on four real-world datasets demonstrate the superior performance of IRMoGA over several state-of-the-art baselines, e.g., up to 39.16% improvement in Recall on Foursquare.
  2. Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key difficulties inherent in learning human mobility data under differential privacy. In response to these challenges, HRNet integrates three components: a hierarchical location encoding mechanism, multi-task learning across multiple resolutions, and private pre-training. These elements collectively enhance the model's ability under the constraints of differential privacy. Through extensive comparative experiments utilizing a real-world dataset, HRNet demonstrates a marked improvement over existing methods in balancing the utility-privacy trade-off. 
  3. Analyzing individual human trajectory data helps our understanding of human mobility and finds many commercial and academic applications. There are two main approaches to accessing trajectory data for research: one uses real-world datasets like GeoLife, while the other employs simulations to synthesize data. Real-world data provides insights from real human activities, but such data is generally sparse due to voluntary participation. Conversely, simulated data can be more comprehensive but may capture unrealistic human behavior. In this Data and Resource paper, we combine the benefits of both by leveraging the statistical features of real-world data and the comprehensiveness of simulated data. Specifically, we extract features from the real-world GeoLife dataset such as the average number of individual daily trips, the average radius of gyration, and the maximum and minimum trip distances. We calibrate the Pattern of Life Simulation, a realistic simulation of human mobility, to reproduce these features, using a genetic algorithm to tune the simulation parameters. For this calibration, we simulated numerous random simulation settings, measured the similarity of the generated trajectories to GeoLife, and iteratively (over many generations) combined the parameter settings of the trajectory datasets most similar to GeoLife. Using the calibrated simulation, we simulate large trajectory datasets that we call GeoLife+, where + denotes the Kleene Plus, indicating unlimited replication with at least one occurrence. We provide simulated GeoLife+ data with 182, 1k, and 5k users over 5 years; 10k and 50k users over one year; and 100k users over 6 months of simulation lifetime.
  4. We conceptualize and measure upward mobility over income or wealth. At the core of our exercise is the Growth Progressivity Axiom: transfers of instantaneous growth rates from relatively rich to poor individuals increase upward mobility. This axiom, along with mild auxiliary restrictions, identifies an “upward mobility kernel” with a single free parameter, in which mobility is linear in individual growth rates, with geometrically declining weights on baseline incomes. We extend this kernel to trajectories over intervals. The analysis delivers an upward mobility index that does not rely on panel data. That significantly expands our analytical scope to data-poor settings. (JEL D31, D63, I32, O15, O40)
  5. Location-based social networks (LBSNs) have been studied extensively in recent years. However, real-world LBSN data sets have several weaknesses: they are sparse and small, they raise privacy concerns, and they lack authoritative ground truth. To overcome these weaknesses, we leverage a large-scale LBSN simulation to create a framework that simulates human behavior and generates synthetic but realistic LBSN data based on human patterns of life. Such data captures not only the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. In addition, we provide a series of simulated benchmark LBSN data sets using different synthetic towns and real-world urban environments obtained from OpenStreetMap. The simulation software and data sets, which comprise gigabytes of spatio-temporal and temporal social network data, are made available to the research community.
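The calibration loop described in item 3 (simulate many parameter settings, score them against a real-data feature, and recombine the best settings over generations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_simulation` is a hypothetical 2-D random walk standing in for a real mobility simulator (it is not the Pattern of Life Simulation), the single tuned parameter `step_scale` is invented, and the averaging crossover and Gaussian mutation are one generic choice of genetic operators, not necessarily the ones the authors used. Radius of gyration is computed in its common form, the root-mean-square distance of visited points from their centroid.

```python
import math
import random


def radius_of_gyration(points):
    """Root-mean-square distance of the visited points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def toy_simulation(step_scale, n_steps=200, seed=0):
    """Hypothetical stand-in for a mobility simulator: a 2-D Gaussian random walk."""
    rng = random.Random(seed)
    x = y = 0.0
    points = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, step_scale)
        y += rng.gauss(0.0, step_scale)
        points.append((x, y))
    return points


def fitness(step_scale, target_rog):
    """Higher is better: negative gap between simulated and target feature."""
    simulated = radius_of_gyration(toy_simulation(step_scale))
    return -abs(simulated - target_rog)


def calibrate(target_rog, generations=30, pop_size=20, seed=42):
    """Evolve step_scale so the simulated radius of gyration matches the target."""
    rng = random.Random(seed)
    # Start from random parameter settings.
    population = [rng.uniform(0.1, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the settings whose output is most similar to the target.
        ranked = sorted(population, key=lambda s: fitness(s, target_rog), reverse=True)
        parents = ranked[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Averaging crossover plus a small Gaussian mutation.
            child = 0.5 * (a + b) + rng.gauss(0.0, 0.1)
            children.append(max(child, 1e-3))  # keep the parameter positive
        population = parents + children
    return max(population, key=lambda s: fitness(s, target_rog))


if __name__ == "__main__":
    # Pretend the "real" data was produced with step_scale = 2.0.
    target = radius_of_gyration(toy_simulation(2.0))
    print("calibrated step_scale:", calibrate(target))
```

A real calibration would score several features at once (for example daily trip counts and trip-distance extremes alongside radius of gyration) and tune many simulation parameters rather than one.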