This content will become publicly available on December 1, 2025

Title: A Large-Scale Geographically Explicit Synthetic Population with Social Networks for the United States
Within the geo-simulation research domain, micro-simulation and agent-based modeling often require the creation of synthetic populations. Creating such data is time-consuming, and the resulting populations often lack social networks, which are crucial for studying human interactions (e.g., disease spread, disaster response) and which also influence decision-making. We address these challenges by introducing a Python-based method that uses open data, including 2020 U.S. Census data, to generate a large-scale, realistic, geographically explicit synthetic population for America’s 50 states and Washington D.C., along with stylized social networks (e.g., home, work and schools). The resulting synthetic population can be utilized within various geo-simulation approaches (e.g., agent-based modeling) to explore the emergence of complex phenomena through human interactions and to further foster the study of urban digital twins.
Award ID(s):
2200173
PAR ID:
10630446
Author(s) / Creator(s):
Publisher / Repository:
Nature
Date Published:
Journal Name:
Scientific Data
Volume:
11
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the last two decades, there has been a growth in the applications of geographically-explicit agent-based models. One thing such models have in common is the creation of synthetic populations to initialize the artificial worlds that the agents inhabit. One challenge such models face is that it is often difficult to create reusable geographically-explicit synthetic populations with social networks. In this paper, we introduce a Python-based method that generates a reusable geographically-explicit synthetic population dataset along with its social networks. In addition, we present a pipeline for using the population datasets for model initialization. With this pipeline, we present geographically-explicit agent-based models at multiple spatial and temporal scales, focusing on Western New York. Such models demonstrate not only the utility of our synthetic population for modeling commuting patterns but also how social networks can impact the simulation of disease spread and vaccination uptake. By doing so, this pipeline could benefit any modeler wishing to reuse synthetic populations with realistic geographic locations and social networks.
  2. Location-based social networks (LBSNs) have been studied extensively in recent years. However, real-world LBSN data sets have several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large-scale LBSN simulation to create a framework that simulates human behavior and creates synthetic but realistic LBSN data based on human patterns of life. Such data not only captures the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. In addition, we provide a series of simulated benchmark LBSN data sets using different synthetic towns and real-world urban environments obtained from OpenStreetMap. The simulation software and data sets, which comprise gigabytes of spatio-temporal and temporal social network data, are made available to the research community.
  3. Data generators have been heavily used in creating massive trajectory datasets to address common challenges of real-world datasets, including privacy, cost of data collection, and data quality. However, such generators often overlook social and physiological characteristics of individuals, and as such their results are often limited to simple movement patterns. To address these shortcomings, we propose an agent-based simulation framework that facilitates the development of behavioral models in which agents correspond to individuals that act based on personal preferences, goals, and needs within a realistic geographical environment. Researchers can use a drag-and-drop interface to design and control their own world, including its geospatial and social (i.e., geo-social) properties. The framework is capable of generating and streaming very large datasets that capture the basic patterns of life in urban areas. Streaming data from the simulation can be accessed in real time through a dedicated API.
  4. Social media are increasingly used for disaster management (DM) because they provide real-time data on a broad scale. For example, some smartphone applications (e.g., Disaster Alert and the Federal Emergency Management Agency (FEMA) App) can be used to increase the efficiency of prepositioning supplies and to enhance the effectiveness of disaster relief efforts. To maximize the utility of these apps, it is imperative to have robust models of human behavior in social networks that express in detail both individual decision-making processes and the interactions among people. In this paper, we introduce a hierarchical human behavior model by associating extended Decision Field Theory (e-DFT) with opinion formation and innovation diffusion models. In particular, its expressiveness and validity are addressed in three ways. First, we estimate individuals’ choice patterns in social networks by deriving people’s asymptotic choice probabilities within e-DFT. Second, by analyzing opinion formation models and innovation diffusion models in different types of social networks, we demonstrate the effects of neighbors’ opinions on people and their interactions. Finally, an agent-based simulation is used to trace agents’ dynamic behaviors in different scenarios. The simulated results reveal that the proposed models can be used to establish better disaster management strategies in natural disasters.
  5. Location-based social networks (LBSNs) have been studied extensively in recent years. However, utilizing real-world LBSN datasets in such studies has severe weaknesses: sparse and small datasets, privacy concerns, and a lack of authoritative ground-truth. Our vision is to create a large-scale geo-simulation framework to simulate human behavior and to create synthetic but realistic LBSN data that captures the location of users over time as well as social interactions of users in a social network. While existing LBSN datasets are trivially small, such a framework would provide the first source of massive LBSN benchmark data that closely mimics the real world, containing high-fidelity information on the locations and social connections of millions of simulated agents over several years of simulated time. Therefore, it would serve the research community by revitalizing and reshaping research on LBSNs by allowing researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. These evaluations will guide future research, enabling us to develop solutions to improve LBSN applications such as user-location recommendation, friend recommendation, location prediction, and location privacy.