skip to main content


Title: Graph Frequency Analysis of COVID-19 Incidence to Identify County-Level Contagion Patterns in the United States
The COVID-19 pandemic severely changed the way of life in the United States (US). From early scattered regional outbreaks to current country-wide spread, and from rural areas to highly populated cities, the contagion exhibits diverse patterns at various timescales and locations. We thus conduct a graph frequency analysis to inves- tigate the spread patterns of COVID-19 in different US counties. The commute flows between all 3142 US counties were used to construct a graph capturing the population mobility. The numbers of daily confirmed COVID-19 cases per county were collected and represented as graph signals, which were then mapped into the frequency domain via the graph Fourier transform. The concept of graph frequency in Graph Signal Processing (GSP) enables the decomposition of graph signals (i.e., daily confirmed cases) into modes with smooth or rapid variations with respect to the underlying mobility graph. These different modes of variability are shown to relate to COVID-19 spread patterns within and across counties. Changes in the nature of spread within geographical regions are also revealed by graph frequency analysis at finer temporal scales. Overall, our GSP-based approach leverages case count and mobility data to unveil spatio-temporal contagion patterns of COVID-19 incidence for each US county. Results here support the promising prospect of using GSP tools for epidemiology knowledge discovery on graphs.  more » « less
Award ID(s):
1809356 1750428 1934962
NSF-PAR ID:
10243495
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
3230 - 3234
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development a model of Corona spread by using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread, we will obtain new results, and help in reducing the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with the informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis up to global-level and down to the county-level. It also provides statistical analysis for each level such as new cases per 100,000 population. The primary analysis such as Contagion Risk and Infection State is based on causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper. 
    more » « less
  2. null (Ed.)
    Abstract The objective of this study is to examine the transmission risk of COVID-19 based on cross-county population co-location data from Facebook. The rapid spread of COVID-19 in the United States has imposed a major threat to public health, the real economy, and human well-being. With the absence of effective vaccines, the preventive actions of social distancing, travel reduction and stay-at-home orders are recognized as essential non-pharmacologic approaches to control the infection and spatial spread of COVID-19. Prior studies demonstrated that human movement and mobility drove the spatiotemporal distribution of COVID-19 in China. Little is known, however, about the patterns and effects of co-location reduction on cross-county transmission risk of COVID-19. This study utilizes Facebook co-location data for all counties in the United States from March to early May 2020 for conducting spatial network analysis where nodes represent counties and edge weights are associated with the co-location probability of populations of the counties. The analysis examines the synchronicity and time lag between travel reduction and pandemic growth trajectory to evaluate the efficacy of social distancing in ceasing the population co-location probabilities, and subsequently the growth in weekly new cases across counties. The results show that the mitigation effects of co-location reduction appear in the growth of weekly new confirmed cases with one week of delay. The analysis categorizes counties based on the number of confirmed COVID-19 cases and examines co-location patterns within and across groups. Significant segregation is found among different county groups. The results suggest that within-group co-location probabilities (e.g., co-location probabilities among counties with high numbers of cases) remain stable, and social distancing policies primarily resulted in reduced cross-group co-location probabilities (due to travel reduction from counties with large number of cases to counties with low numbers of cases). These findings could have important practical implications for local governments to inform their intervention measures for monitoring and reducing the spread of COVID-19, as well as for adoption in future pandemics. Public policy, economic forecasting, and epidemic modeling need to account for population co-location patterns in evaluating transmission risk of COVID-19 across counties. 
    more » « less
  3. null (Ed.)
    Background Population mobility is closely associated with COVID-19 transmission, and it could be used as a proximal indicator to predict future outbreaks, which could inform proactive nonpharmaceutical interventions for disease control. South Carolina is one of the US states that reopened early, following which it experienced a sharp increase in COVID-19 cases. Objective The aims of this study are to examine the spatial-temporal relationship between population mobility and COVID-19 outbreaks and use population mobility data to predict daily new cases at both the state and county level in South Carolina. Methods This longitudinal study used disease surveillance data and Twitter-based population mobility data from March 6 to November 11, 2020, in South Carolina and its five counties with the largest number of cumulative confirmed COVID-19 cases. Population mobility was assessed based on the number of Twitter users with a travel distance greater than 0.5 miles. A Poisson count time series model was employed for COVID-19 forecasting. Results Population mobility was positively associated with state-level daily COVID-19 incidence as well as incidence in the top five counties (ie, Charleston, Greenville, Horry, Spartanburg, and Richland). At the state level, the final model with a time window within the last 7 days had the smallest prediction error, and the prediction accuracy was as high as 98.7%, 90.9%, and 81.6% for the next 3, 7, and 14 days, respectively. Among Charleston, Greenville, Horry, Spartanburg, and Richland counties, the best predictive models were established based on their observations in the last 9, 14, 28, 20, and 9 days, respectively. The 14-day prediction accuracy ranged from 60.3%-74.5%. Conclusions Using Twitter-based population mobility data could provide acceptable predictions of COVID-19 daily new cases at both the state and county level in South Carolina. Population mobility measured via social media data could inform proactive measures and resource relocations to curb disease outbreaks and their negative influences. 
    more » « less
  4. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
    more » « less
  5. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
    more » « less