skip to main content

This content will become publicly available on June 10, 2022

Title: Evaluating the utility of high-resolution proximity metrics in predicting the spread of COVID-19
High resolution mobility datasets have become increasingly available in the past few years and have enabled detailed models for infectious disease spread including those for COVID-19. However, there are open questions on how such a mobility data can be used effectively within epidemic models and for which tasks they are best suited. In this paper, we extract a number of graph-based proximity metrics from high resolution cellphone trace data from X-Mode and use it to study COVID-19 epidemic spread in 50 land grant university counties in the US. We present an approach to estimate the effect of mobility on cases by fitting an ODE based model and performing multivariate linear regression to explain the estimated time varying transmissibility. We find that, while mobility plays a significant role, the contribution is heterogeneous across the counties, as exemplified by a subsequent correlation analysis. We subsequently evaluate the metrics’ utility for case surge prediction defined as a supervised classification problem, and show that the learnt model can predict surges with 95% accuracy and 87% F1-score.
; ; ; ;
Award ID(s):
1633028 1916805 1918656 2028004 2027541
Publication Date:
Journal Name:
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The objective of this study is to examine the transmission risk of COVID-19 based on cross-county population co-location data from Facebook. The rapid spread of COVID-19 in the United States has imposed a major threat to public health, the real economy, and human well-being. With the absence of effective vaccines, the preventive actions of social distancing, travel reduction and stay-at-home orders are recognized as essential non-pharmacologic approaches to control the infection and spatial spread of COVID-19. Prior studies demonstrated that human movement and mobility drove the spatiotemporal distribution of COVID-19 in China. Little is known, however, about the patternsmore »and effects of co-location reduction on cross-county transmission risk of COVID-19. This study utilizes Facebook co-location data for all counties in the United States from March to early May 2020 for conducting spatial network analysis where nodes represent counties and edge weights are associated with the co-location probability of populations of the counties. The analysis examines the synchronicity and time lag between travel reduction and pandemic growth trajectory to evaluate the efficacy of social distancing in ceasing the population co-location probabilities, and subsequently the growth in weekly new cases across counties. The results show that the mitigation effects of co-location reduction appear in the growth of weekly new confirmed cases with one week of delay. The analysis categorizes counties based on the number of confirmed COVID-19 cases and examines co-location patterns within and across groups. Significant segregation is found among different county groups. The results suggest that within-group co-location probabilities (e.g., co-location probabilities among counties with high numbers of cases) remain stable, and social distancing policies primarily resulted in reduced cross-group co-location probabilities (due to travel reduction from counties with large number of cases to counties with low numbers of cases). These findings could have important practical implications for local governments to inform their intervention measures for monitoring and reducing the spread of COVID-19, as well as for adoption in future pandemics. Public policy, economic forecasting, and epidemic modeling need to account for population co-location patterns in evaluating transmission risk of COVID-19 across counties.« less
  2. Reopening of colleges and universities for the Fall semester of 2020 across the United States has caused signi ficant COVID-19 case spikes, requiring reactive responses such as temporary closures and switching to online learning. Until sufficient levels of immunity are reached through vaccination, Institutions of Higher Education will need to balance academic operations with COVID-19 spread risk within and outside the student community. In this work, we study the impact of proximity statistics obtained from high resolution mobility traces in predicting case rate surges in university counties. We focus on 50 land-grant university counties (LGUCs) across the country and showmore »high correlation (PCC > 0.6) between proximity statistics and COVID-19 case rates for several LGUCs during the period around Fall 2020 reopenings. These observations provide a lead time of up to 3 weeks in preparing resources and planning containment efforts. We also show how features such as total population, population affiliated with university, median income and case rate intensity could explain some of the observed high correlation. We believe these easily explainable mobility metrics along with other disease surveillance indicators can help universities be better prepared for the Spring 2021 semester.« less
  3. The COVID-19 pandemic is a global threat presenting health, economic, and social challenges that continue to escalate. Metapopulation epidemic modeling studies in the susceptible–exposed–infectious–removed (SEIR) style have played important roles in informing public health policy making to mitigate the spread of COVID-19. These models typically rely on a key assumption on the homogeneity of the population. This assumption certainly cannot be expected to hold true in real situations; various geographic, socioeconomic, and cultural environments affect the behaviors that drive the spread of COVID-19 in different communities. What’s more, variation of intracounty environments creates spatial heterogeneity of transmission in different regions.more »To address this issue, we develop a human mobility flow-augmented stochastic SEIR-style epidemic modeling framework with the ability to distinguish different regions and their corresponding behaviors. This modeling framework is then combined with data assimilation and machine learning techniques to reconstruct the historical growth trajectories of COVID-19 confirmed cases in two counties in Wisconsin. The associations between the spread of COVID-19 and business foot traffic, race and ethnicity, and age structure are then investigated. The results reveal that, in a college town (Dane County), the most important heterogeneity is age structure, while, in a large city area (Milwaukee County), racial and ethnic heterogeneity becomes more apparent. Scenario studies further indicate a strong response of the spread rate to various reopening policies, which suggests that policy makers may need to take these heterogeneities into account very carefully when designing policies for mitigating the ongoing spread of COVID-19 and reopening.

    « less
  4. The COVID-19 pandemic represents the most significant public health disaster since the 1918 influenza pandemic. During pandemics such as COVID-19, timely and reliable spatiotemporal forecasting of epidemic dynamics is crucial. Deep learning-based time series models for forecasting have recently gained popularity and have been successfully used for epidemic forecasting. Here we focus on the design and analysis of deep learning-based models for COVID-19 forecasting. We implement multiple recurrent neural network-based deep learning models and combine them using the stacking ensemble technique. In order to incorporate the effects of multiple factors in COVID-19 spread, we consider multiple sources such as COVID-19more »confirmed and death case count data and testing data for better predictions. To overcome the sparsity of training data and to address the dynamic correlation of the disease, we propose clustering-based training for high-resolution forecasting. The methods help us to identify the similar trends of certain groups of regions due to various spatio-temporal effects. We examine the proposed method for forecasting weekly COVID-19 new confirmed cases at county-, state-, and country-level. A comprehensive comparison between different time series models in COVID-19 context is conducted and analyzed. The results show that simple deep learning models can achieve comparable or better performance when compared with more complicated models. We are currently integrating our methods as a part of our weekly forecasts that we provide state and federal authorities.« less
  5. Abstract The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2787 counties in the United States using data-driven machine learning models. Existing mathematical models of disease spread usually focused on the case prediction with different infection rates without incorporating multiple heterogeneous features that could impact the spatial and temporal trajectory of COVID-19. Recognizing this, we trained a data-driven model using 23 features representing six key influencing factors affecting the pandemic spread: social demographics of counties, population activities, mobility within the counties, movement across counties, diseasemore »attributes, and social network structure. Also, we categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aimed to answer two research questions: (1) The extent to which the importance of heterogeneous features evolved at different stages; (2) The extent to which the importance of heterogeneous features varied across counties with different characteristics. We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) Social demographic features, such as gross domestic product, population density, and minority status maintained high-importance features throughout stages of COVID-19 across 2787 studied counties; (2) Within-county mobility features had the highest importance in counties with higher population densities; (3) The feature reflecting the social network structure (Facebook, social connectedness index), had higher importance for counties with higher population densities. The results showed that the data-driven machine learning models could provide important insights to inform policymakers regarding feature importance for counties with various population densities and at different stages of a pandemic life cycle.« less