Title: Multiscale dynamic human mobility flow dataset in the U.S. during the COVID-19 epidemic

Understanding dynamic human mobility changes and spatial interaction patterns at different geographic scales is crucial for assessing the impacts of non-pharmaceutical interventions (such as stay-at-home orders) during the COVID-19 pandemic. In this data descriptor, we introduce a regularly updated multiscale dynamic human mobility flow dataset across the United States, with data starting from March 1st, 2020. By analysing SafeGraph data on millions of anonymous mobile phone users’ visits to various places, we compute, aggregate, and infer daily and weekly dynamic origin-to-destination (O-D) population flows at three geographic scales: census tract, county, and state. Our mobility flow dataset is highly correlated with openly available data sources, which indicates the reliability of the produced data. Such a multiscale human mobility flow dataset with high spatiotemporal resolution may help monitor epidemic spreading dynamics, inform public health policy, and deepen our understanding of changes in human behaviour during this unprecedented public health crisis. These up-to-date open O-D flow data can also support many other social sensing and transportation applications.
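A minimal Python sketch of the multiscale aggregation step, assuming tract-level O-D flows are already available as (origin GEOID, destination GEOID, flow) records. It relies only on the standard structure of census GEOIDs (2-digit state FIPS code + 3-digit county FIPS code + 6-digit tract code), so county and state flows are obtained by truncating the tract GEOIDs and summing. The functions `aggregate_flows` and `compare_sources`, the sample tracts, and all flow values are hypothetical. The sketch omits the step that infers population flows from device visits, and the Pearson check only illustrates the kind of correlation comparison the abstract mentions; it is not the paper's validation procedure.

```python
from collections import defaultdict
from math import sqrt

# Prefix lengths of census GEOIDs: an 11-digit tract GEOID is the 2-digit
# state FIPS code + 3-digit county FIPS code + 6-digit tract code.
PREFIX_LENGTH = {"tract": 11, "county": 5, "state": 2}


def aggregate_flows(tract_flows, scale):
    """Sum tract-to-tract O-D flows into flows at the requested scale (hypothetical helper).

    tract_flows: iterable of (origin_geoid, destination_geoid, flow) tuples.
    Returns a dict mapping (origin, destination) at that scale to total flow.
    """
    n = PREFIX_LENGTH[scale]
    totals = defaultdict(float)
    for origin, destination, flow in tract_flows:
        # Truncating both GEOIDs maps each tract to its county or state.
        totals[(origin[:n], destination[:n])] += flow
    return dict(totals)


def compare_sources(ours, reference):
    """Pearson correlation of two O-D flow dicts over their shared pairs (hypothetical helper)."""
    pairs = sorted(set(ours) & set(reference))
    if len(pairs) < 2:
        raise ValueError("need at least two shared O-D pairs")
    xs = [ours[p] for p in pairs]
    ys = [reference[p] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant flows")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical tract-level flows (not real SafeGraph-derived values).
    flows = [
        ("55025000100", "55025000200", 120.0),
        ("55025000100", "55079000100", 30.0),
        ("55025000200", "55079000100", 20.0),
        ("55079000100", "17031010100", 15.0),
    ]
    county = aggregate_flows(flows, "county")
    print(county)  # {('55025', '55025'): 120.0, ('55025', '55079'): 50.0, ('55079', '17031'): 15.0}
    print(aggregate_flows(flows, "state"))  # {('55', '55'): 170.0, ('55', '17'): 15.0}
    # Hypothetical county-level flows from an independent reference source.
    reference = {("55025", "55025"): 100.0, ("55025", "55079"): 45.0,
                 ("55079", "17031"): 10.0}
    print(compare_sources(county, reference))
```

Truncating GEOIDs keeps the aggregation a single pass over the tract records, and the same function serves all three scales.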

Journal Name: Scientific Data (Nature Publishing Group)
Sponsoring Org: National Science Foundation
More like this
  1. Background: Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent systematic monitoring and analysis of human movement at different spatial scales (from local to global). Objective: Big data from social media, such as geotagged tweets, have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data to study human movement at different geographic scales (e.g., from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach that uses big data from Twitter, coupled with other human mobility data sources and artificial intelligence, to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods: We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results: This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project. Conclusions: Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID): DERR1-10.2196/24432
  2. Non-pharmacologic interventions (NPIs) promote protective actions that lessen exposure risk to COVID-19 by reducing mobility. However, there is limited understanding of the underlying mechanisms associated with reduced mobility, especially for socially vulnerable populations. The research examines two datasets at a granular scale for five urban locations. Through exploratory network, statistical, and spatial clustering analyses, the research extensively investigates the reduction in exposure risk among socially vulnerable populations, specifically lower-income and non-white populations, after NPIs were implemented. The mobility dataset tracks population movement across ZIP codes for an origin–destination (O–D) network analysis. The population activity dataset uses visits from census block groups (CBGs) to points of interest (POIs) for a network analysis of population-facility interactions. The mobility dataset originates from a collaboration with StreetLight Data, a company focusing on transportation analytics, whereas the population activity dataset originates from a collaboration with SafeGraph, a company focusing on POI data. Both datasets indicated that low-income and non-white populations faced higher exposure risk. These findings can help emergency planners and public health officials understand how different populations are able to implement protective actions, and they can inform more equitable and data-driven NPI policies for future epidemics.
  3. Understanding the space-time dynamics of human activities is essential for studying human security issues such as climate change impacts, pandemic spreading, and urban sustainability. Geotagged social media posts provide an open, space-time continuous data source with user locations, which is convenient for studying human movement. However, the reliability of Chinese geotagged social media data for representing human mobility remains unclear. This study compares human movement data derived from posts on Sina Weibo, one of the largest social media platforms in China, with data from Baidu Qianxi, a high-resolution human movement dataset from ‘Baidu Map’, a popular location-based service in China with 1.3 billion users. Correlation analysis was conducted across multiple dimensions: time periods (weekly and monthly), geographic scales (cities and provinces), and flow directions (inflow and outflow); a case study on COVID-19 transmission was further explored with such data. The results show that Sina Weibo data can reveal patterns similar to those of Baidu Qianxi, and that the correlation is higher at the provincial level than at the city level and higher at the monthly scale than at the weekly scale. The study also revealed spatial variations in the degree of similarity between the two sources. Findings from this study reveal the value, properties, and spatiotemporal heterogeneity of human mobility data extracted from Weibo posts, providing a reference for the proper use of social media posts as data sources for human mobility studies.
  4. Abstract

    Since the first case of the novel coronavirus disease (COVID-19) was confirmed in Wuhan, China, social distancing has been promoted worldwide, including in the United States, as a major community mitigation strategy. However, our understanding remains limited of how people would react to such control measures, as well as how people would resume their normal behaviours when those orders were relaxed. We utilize an integrated dataset of real-time mobile device location data involving 100 million devices in the contiguous United States (plus Alaska and Hawaii) from February 2, 2020 to May 30, 2020. Building on common human mobility metrics, we construct a Social Distancing Index (SDI) to evaluate changes in people’s mobility patterns along with the spread of COVID-19 at different geographic levels. We find that both government orders and local outbreak severity significantly contribute to the strength of social distancing. Because people tend to practice less social distancing immediately after they observe a sign of local mitigation, we identify several states and counties with higher risks of continuous community transmission and a second outbreak. Our proposed index could help policymakers and researchers monitor people’s real-time mobility behaviours, understand the influence of government orders, and evaluate the risk of local outbreaks.
  5. Background: Understanding how study design and monitoring strategies shape inference within, and synthesis across, studies is critical across biological disciplines. Many biological and field studies are short term and limited in scope. Monitoring studies are critical for informing public health about potential vectors of concern, such as Ixodes scapularis (black-legged ticks). Black-legged ticks are a taxon of ecological and human health concern due to their status as primary vectors of Borrelia burgdorferi, the bacterium that causes Lyme disease. However, variation in black-legged tick monitoring, and gaps in data, are currently considered major barriers to understanding population trends and, in turn, predicting Lyme disease risk. To understand how variable methodology in black-legged tick studies may influence which population patterns researchers find, we conducted a data synthesis experiment. Materials and Methods: We searched for publicly available black-legged tick abundance datasets that had at least 9 years of data, using keywords about ticks in internet search engines, literature databases, data repositories, and public health websites. Our analysis included 289 datasets from seven surveys from locations in the US, ranging in length from 9 to 24 years. We used a moving window analysis, a non-random resampling approach, to investigate the temporal stability of black-legged tick population trajectories across the US. We then used t-tests to assess differences in stability time across different study parameters. Results: All of our sampled datasets required 4 or more years to reach stability. We also found that several study factors can affect the likelihood of a study reaching stability and of the data leading to misleading results if the study does not reach stability. Specifically, datasets collected via dragging reached stability significantly faster than data collected via opportunistic sampling. Datasets that sampled larvae reached stability significantly later than those that sampled adults or nymphs. Additionally, datasets collected at the broadest spatial scale (county) reached stability fastest. Conclusion: We used 289 datasets from seven long-term black-legged tick studies to conduct a non-random data resampling experiment, revealing that sampling design does shape inferences about black-legged tick population trajectories and how many years it takes to find stable patterns. Specifically, our results show the importance of study length, sampling technique, life stage, and geographic scope in understanding black-legged tick populations, in the absence of standardized surveillance methods. Current public health efforts based on existing black-legged tick datasets must take monitoring study parameters into account to better understand if and how to use monitoring data to inform decision-making. We also advocate that potential future forecasting initiatives consider these parameters when projecting future black-legged tick population trends.
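The moving window analysis described in item 5 can be illustrated with a minimal Python sketch. The sketch fits a least-squares trend to every contiguous window of each length and finds the shortest window length from which the windows consistently agree with the long-term trend. The stability rule used here, the function names (`ols_slope`, `window_agreement`, `stability_time`), and the example counts are assumptions for illustration, not the study's published criterion or data. Under that rule, the share of windows whose slope has the same sign as the full-series slope must reach a hypothetical threshold of 0.7 for a given length and for every longer one.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (closed form; the sign is exact for integer input)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def window_agreement(years, counts, width, overall_sign):
    """Share of contiguous windows of `width` years whose slope sign matches the full series."""
    slopes = [ols_slope(years[i:i + width], counts[i:i + width])
              for i in range(len(years) - width + 1)]
    return sum(sign(s) == overall_sign for s in slopes) / len(slopes)


def stability_time(years, counts, min_width=3, threshold=0.7):
    """Smallest window length from which every longer length meets the (assumed) threshold."""
    if min_width < 2 or not 0 < threshold <= 1:
        raise ValueError("need min_width >= 2 and threshold in (0, 1]")
    overall = sign(ols_slope(years, counts))
    n = len(years)
    agreement = {w: window_agreement(years, counts, w, overall)
                 for w in range(min_width, n + 1)}
    for width in range(min_width, n + 1):
        if all(agreement[w] >= threshold for w in range(width, n + 1)):
            return width
    return None  # only reached when min_width exceeds the series length


if __name__ == "__main__":
    years = list(range(2000, 2010))
    # Hypothetical counts: an early decline followed by a longer increase.
    counts = [10, 8, 6, 4, 6, 8, 10, 12, 14, 16]
    print(stability_time(years, counts))  # 6
```

Using the closed-form slope with integer inputs keeps the sign of each slope exact, so a perfectly symmetric window yields a slope of exactly zero and is counted as not agreeing with the long-term trend.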