
Title: Forecasting influenza activity using machine-learned mobility map

Human mobility is a primary driver of infectious disease spread. However, existing mobility data are limited in availability, coverage, granularity, and timeliness. Data-driven forecasts of disease dynamics are crucial for decision-making by health officials and private citizens alike. In this work, we focus on a machine-learned anonymized mobility map (hereafter AMM) aggregated over hundreds of millions of smartphones and evaluate its utility in forecasting epidemics. We factor AMM into a metapopulation model to retrospectively forecast influenza in the USA and Australia. We show that the AMM model performs on par with models based on commuter surveys, which are sparsely available and expensive. We also compare it with gravity- and radiation-based models of mobility, and find that the radiation model’s performance is quite similar to that of AMM and commuter flows. Additionally, we demonstrate our model’s ability to predict disease spread even across state boundaries. Our work contributes towards developing timely infectious disease forecasting at a global scale using human mobility datasets, expanding their applications in the area of infectious disease epidemiology.
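
The abstract compares AMM against gravity and radiation models of mobility. As a rough illustration of how those two baselines turn populations and distances into flows, here is a minimal sketch using their standard textbook forms. It is not the authors' implementation: the function names, the Euclidean distance stand-in, the exponents `alpha`, `beta`, `gamma`, the constant `k`, the `commuter_fraction`, and the three-location toy data are all assumptions made for this example.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points; a stand-in for real geography."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def gravity_flows(pop, coords, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Gravity model: T_ij = k * m_i^alpha * n_j^beta / d_ij^gamma (parameters assumed)."""
    n = len(pop)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = distance(coords[i], coords[j])
                flows[i][j] = k * pop[i] ** alpha * pop[j] ** beta / d ** gamma
    return flows


def radiation_flows(pop, coords, commuter_fraction=0.1):
    """Radiation model: T_ij = T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)).

    s_ij is the population within distance d_ij of origin i, excluding the
    origin and the destination. T_i is taken here as an assumed fixed
    fraction of the origin population.
    """
    n = len(pop)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        t_i = commuter_fraction * pop[i]
        for j in range(n):
            if i == j:
                continue
            r_ij = distance(coords[i], coords[j])
            s_ij = sum(
                pop[k]
                for k in range(n)
                if k != i and k != j and distance(coords[i], coords[k]) <= r_ij
            )
            m_i, n_j = pop[i], pop[j]
            flows[i][j] = t_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
    return flows


# Hypothetical toy data: three locations on a line.
populations = [1000, 500, 2000]
locations = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
gravity = gravity_flows(populations, locations)
radiation = radiation_flows(populations, locations)
```

One difference visible in the sketch: the gravity form needs fitted parameters (`k` and the exponents), whereas the radiation form depends only on populations and their spatial ordering, which is part of why it is often used where calibration data are scarce.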

Award ID(s):
1443054 1745207 1633028
Publication Date:
Journal Name:
Nature Communications
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Human mobility plays an important role in the dynamics of infectious disease spread. Evidence from the initial nationwide lockdowns for COVID-19 indicates that restricting human mobility is an effective strategy to contain the spread. While a direct correlation was observed early on, it is not known how mobility impacted COVID-19 infection growth rates once lockdowns were lifted, primarily due to modulation by other factors such as face masks, social distancing, and the non-linear patterns of both mobility and infection growth. This paper introduces a piece-wise approach to better explore the phase-wise association between state-level COVID-19 incidence data and anonymized mobile phone data for various states in the United States. Prior literature analyzed the linear correlation between mobility and the number of cases during the early stages of the pandemic. However, it is important to capture the non-linear dynamics of case growth and mobility to be usable for both tracking and forecasting COVID-19 infections, which is accomplished by the piece-wise approach. The associations between mobility and case growth rate varied widely across phases of the epidemic curve after the stay-at-home orders were lifted. The mobility growth patterns had a strong positive association of 0.7 with the growth in the number of cases, with a lag of 5 to 7 weeks, for the fast-growth phase of the pandemic, but only for the 20 states that had a peak between July 1 and September 30, 2020. Overall though, mobility cannot be used to predict the rise in the number of cases after initial lockdowns have been lifted. Our analysis explores the gradually diminishing value of mobility associations in the later stage of the outbreak. It indicates that the relationship between mobility and the increase in the number of cases, once lockdowns have been lifted, is tenuous at best and that there is no strong relationship between these signals. However, we identify the remnants of the last associations in specific phases of the growth curve.
  2. Background: Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). Objective: Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (e.g., from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods: We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results: This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project. Conclusions: Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID): DERR1-10.2196/24432
  3. Understanding the dynamics of the spread of COVID-19 between connected communities is fundamental in planning appropriate mitigation measures. To that end, we propose and analyze a novel metapopulation network model, particularly suitable for modeling commuter traffic patterns, that takes into account the connectivity between a heterogeneous set of communities, each with its own infection dynamics. In the novel metapopulation model that we propose here, transport schemes developed in optimal transport theory provide an efficient and easily implementable way of describing the temporary population redistribution due to traffic, such as the daily commuter traffic between work and residence. Locally, infection dynamics in individual communities are described in terms of a susceptible-exposed-infected-recovered (SEIR) compartment model, modified to account for the specific features of COVID-19, most notably its spread by asymptomatic and presymptomatic infected individuals. The mathematical foundation of our metapopulation network model is akin to a transport scheme between two population distributions, namely the residential distribution and the workplace distribution, whose interface can be inferred from commuter mobility data made available by the US Census Bureau. We use the proposed metapopulation model to test the dynamics of the spread of COVID-19 on two networks, a smaller one comprising 7 counties in the Greater Cleveland area in Ohio, and a larger one consisting of 74 counties in the Pittsburgh–Cleveland–Detroit corridor following Lake Erie’s American coastline. The model simulations indicate that densely populated regions effectively act as amplifiers of the infection for the surrounding, less densely populated areas, in agreement with the pattern of infections observed in the course of the COVID-19 pandemic. Computed examples show that the model can also be used to test different mitigation strategies, including one based on state-level travel restrictions, another on county-level triggered social distancing, as well as a combination of the two.
  4. The early detection of the coronavirus disease 2019 (COVID-19) outbreak is important to save people’s lives and restart the economy quickly and safely. People’s social behavior, reflected in their mobility data, plays a major role in spreading the disease. Therefore, we used daily mobility data aggregated at the county level, alongside COVID-19 statistics and demographic information, for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks. A significant average correlation was achieved (r = 0.83, p = 0.005) between the model-predicted and actual accumulated cases in the interval from August 1, 2020 until January 22, 2021. The model predictions had r > 0.7 for 87% of the counties across the United States. A lower correlation was reported for the counties with total cases of <1000 during the test interval. The average mean absolute error (MAE) was 605.4 and decreased with a decrease in the total number of cases during the testing interval. The model was able to capture the effect of government responses on COVID-19 cases. It was also able to capture the effect of age demographics on the COVID-19 spread: the average daily cases decreased with a decrease in the retiree percentage and increased with an increase in the young percentage. Lessons learned from this study can help not only with managing the COVID-19 pandemic but also with early and effective management of possible future pandemics. The code used for this study was made publicly available on
  5. Background: Ensemble modeling aims to boost the forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple-yet-powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations with applications to infectious disease spread. Methods: We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, the performance of the method is demonstrated using a diversity of epidemic datasets, including scenario outbreak data of the Ebola Forecasting Challenge and real-world epidemic data from outbreaks including influenza, plague, Zika, and COVID-19. Results: We found that the ensemble method that randomly selects a model from the set of individual models for each time point of the trajectory of the epidemic frequently outcompeted the individual models as well as an alternative ensemble method based on the weighted combination of the individual models. It also yielded broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets. Conclusion: Our new methodology for ensemble forecasting outcompetes component models and an alternative ensemble model that differs in how the variance is evaluated for the generation of the prediction intervals of the forecasts.
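
Item 5 describes an ensemble that, at each time point of the trajectory, randomly selects one of the individual growth models. The selection step alone can be sketched as below. This is only an illustration under stated assumptions, not the method from that paper: it skips model fitting and parametric bootstrapping entirely, uses the closed-form Richards and Gompertz curves, and every parameter value and function name is hypothetical.

```python
import math
import random


def richards(t, r, K, a, c0):
    """Closed-form solution of dC/dt = r*C*(1 - (C/K)**a) with C(0) = c0."""
    q = (K / c0) ** a - 1.0
    return K / (1.0 + q * math.exp(-a * r * t)) ** (1.0 / a)


def gompertz(t, r, b, c0):
    """Closed-form solution of dC/dt = r*C*exp(-b*t) with C(0) = c0."""
    return c0 * math.exp((r / b) * (1.0 - math.exp(-b * t)))


def random_selection_ensemble(model_curves, n_draws, rng):
    """Build n_draws trajectories; at each time point take the value of a
    randomly chosen model (chosen independently per draw and per time point)."""
    n_times = len(model_curves[0])
    return [
        [rng.choice(model_curves)[k] for k in range(n_times)]
        for _ in range(n_draws)
    ]


# Hypothetical parameters; in practice each model would be fitted to data first.
times = list(range(30))
curves = [
    [richards(t, r=0.3, K=1000.0, a=1.0, c0=5.0) for t in times],  # a = 1 is logistic growth
    [richards(t, r=0.3, K=1000.0, a=0.5, c0=5.0) for t in times],
    [gompertz(t, r=0.5, b=0.1, c0=5.0) for t in times],
]
ensemble = random_selection_ensemble(curves, n_draws=200, rng=random.Random(0))

# Per-time-point spread of the ensemble, a crude stand-in for a prediction band.
lower = [min(draw[k] for draw in ensemble) for k in range(len(times))]
upper = [max(draw[k] for draw in ensemble) for k in range(len(times))]
```

Because the models disagree more as the forecast horizon grows, mixing them per time point widens the band between `lower` and `upper`, which gives some intuition for the broader uncertainty bounds reported in that abstract.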