Title: The forecast of COVID-19 spread risk at the county level
Abstract The early detection of coronavirus disease 2019 (COVID-19) outbreaks is important for saving people’s lives and restarting the economy quickly and safely. People’s social behavior, reflected in their mobility data, plays a major role in spreading the disease. Therefore, we used daily mobility data aggregated at the county level, alongside COVID-19 statistics and demographic information, for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks. A significant average correlation (r = 0.83, p = 0.005) was achieved between the model-predicted and actual accumulated cases in the interval from August 1, 2020 until January 22, 2021. The model predictions had r > 0.7 for 87% of the counties across the United States. A lower correlation was reported for counties with fewer than 1000 total cases during the test interval. The average mean absolute error (MAE) was 605.4 and decreased with a decrease in the total number of cases during the testing interval. The model was able to capture the effect of government responses on COVID-19 cases. It was also able to capture the effect of age demographics on COVID-19 spread: the average daily cases decreased with a decrease in the percentage of retirees and increased with an increase in the percentage of young people. Lessons learned from this study can help not only with managing the COVID-19 pandemic but also with the early and effective management of possible future pandemics. The code used for this study was made publicly available at https://github.com/Murtadha44/covid-19-spread-risk.
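The two evaluation measures named in the abstract, the per-county Pearson correlation (r) and the mean absolute error (MAE) between predicted and actual accumulated cases, can be illustrated with a minimal sketch. This is not the authors' code (that is in the repository linked above); the function names, county labels, and case counts below are hypothetical and chosen only to show how the per-county scores and their averages might be computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: (actual, predicted) accumulated cases per county.
counties = {
    "county_a": ([100, 150, 210, 280], [110, 140, 220, 270]),
    "county_b": ([10, 12, 15, 19], [9, 13, 14, 20]),
}

# Score each county separately, then average across counties.
scores = {
    name: (pearson_r(actual, pred), mean_absolute_error(actual, pred))
    for name, (actual, pred) in counties.items()
}
avg_r = sum(r for r, _ in scores.values()) / len(scores)
avg_mae = sum(mae for _, mae in scores.values()) / len(scores)

for name, (r, mae) in scores.items():
    print(f"{name}: r={r:.3f}, MAE={mae:.1f}")
print(f"average r={avg_r:.3f}, average MAE={avg_mae:.1f}")
```

Because MAE is expressed in raw case counts, it grows with the size of the outbreak in a county, which is consistent with the abstract's observation that MAE decreased as total cases decreased.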
Award ID(s):
1936586
NSF-PAR ID:
10343168
Journal Name:
Journal of Big Data
Volume:
8
Issue:
1
ISSN:
2196-1115
Sponsoring Org:
National Science Foundation
More Like this
  1. The spread of pandemics such as COVID-19 is strongly linked to human activities. The objective of this article is to specify and examine early indicators of disease spread risk in cities during the initial stages of an outbreak, based on patterns of human activities obtained from digital trace data. In this study, the Venables distance (D_v) and the activity density (D_a) are used to quantify and evaluate human activities for 193 United States counties whose cumulative number of confirmed cases was greater than 100 as of March 31, 2020. Venables distance measures the agglomeration of human activities based on the average distance between activities across a city or a county (a smaller distance could lead to a greater contact risk). Activity density measures the overall level of activity in a county or a city (more activity could lead to a greater risk). Pearson correlation analysis is used to examine the relationship between the two human activity indicators and the basic reproduction number in the following weeks. The results show statistically significant correlations between the indicators of human activities and the basic reproduction number in all counties, as well as a significant leader-follower relationship (time lag) between them. The results also show a lag of one to two weeks between the change in activity indicators and the decrease in the basic reproduction number. This implies that the human activity indicators are effective early indicators of the spread risk of the pandemic during the early stages of the outbreak. Hence, authorities could use the results to proactively assess the risk of disease spread by monitoring the daily Venables distance and activity density.
  2. Abstract During the Coronavirus Disease 2019 (COVID-19) epidemic, many health professionals used social media to promote preventative health behaviors. We conducted a randomized controlled trial of the effect of a Facebook advertising campaign consisting of short videos recorded by doctors and nurses to encourage users to stay at home for the Thanksgiving and Christmas holidays (NCT04644328 and AEARCTR-0006821). We randomly assigned counties to high intensity (n = 410 (386) at Thanksgiving (Christmas)) or low intensity (n = 410 (381)). The intervention was delivered to a large fraction of Facebook subscribers in 75% and 25% of randomly assigned zip codes in high- and low-intensity counties, respectively. In total, 6,998 (6,716) zip codes were included, and 11,954,109 (23,302,290) users were reached at Thanksgiving (Christmas). The first two primary outcomes were holiday travel and fraction leaving home, both measured using mobile phone location data of Facebook users. Average distance traveled in high-intensity counties changed by −0.993 percentage points (95% confidence interval (CI): −1.616, −0.371; P = 0.002) for the 3 days before each holiday compared to low-intensity counties. The fraction of people who left home on the holiday was not significantly affected (adjusted difference: 0.030; 95% CI: −0.361, 0.420; P = 0.881). The third primary outcome was COVID-19 infections recorded at the zip code level in the 2-week period starting 5 days after the holiday. Infections declined by 3.5% (adjusted 95% CI: −6.2%, −0.7%; P = 0.013) in intervention compared to control zip codes. Social media messages recorded by health professionals before the winter holidays in the United States led to a significant reduction in holiday travel and subsequent COVID-19 infections.
  3. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. 
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model Corona spread, with the aim of obtaining new results and helping to reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a timely, multi-level view of the pandemic with informative virus spreading indicators. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems platform, which is briefly described in the paper.
  4. The COVID-19 pandemic severely changed the way of life in the United States (US). From early scattered regional outbreaks to current country-wide spread, and from rural areas to highly populated cities, the contagion exhibits diverse patterns at various timescales and locations. We thus conduct a graph frequency analysis to investigate the spread patterns of COVID-19 in different US counties. The commute flows between all 3142 US counties were used to construct a graph capturing population mobility. The numbers of daily confirmed COVID-19 cases per county were collected and represented as graph signals, which were then mapped into the frequency domain via the graph Fourier transform. The concept of graph frequency in Graph Signal Processing (GSP) enables the decomposition of graph signals (i.e., daily confirmed cases) into modes with smooth or rapid variations with respect to the underlying mobility graph. These different modes of variability are shown to relate to COVID-19 spread patterns within and across counties. Changes in the nature of spread within geographical regions are also revealed by graph frequency analysis at finer temporal scales. Overall, our GSP-based approach leverages case count and mobility data to unveil spatio-temporal contagion patterns of COVID-19 incidence for each US county. The results support the promising prospect of using GSP tools for epidemiology knowledge discovery on graphs.
  5. Abstract Since the first case of the novel coronavirus disease (COVID-19) was confirmed in Wuhan, China, social distancing has been promoted worldwide, including in the United States, as a major community mitigation strategy. However, our understanding of how people react to such control measures, and of how they resume their normal behaviours when those orders are relaxed, remains limited. We utilize an integrated dataset of real-time mobile device location data involving 100 million devices in the contiguous United States (plus Alaska and Hawaii) from February 2, 2020 to May 30, 2020. Building upon common human mobility metrics, we construct a Social Distancing Index (SDI) to evaluate changes in people’s mobility patterns along with the spread of COVID-19 at different geographic levels. We find that both government orders and local outbreak severity significantly contribute to the strength of social distancing. As people tend to practice less social distancing immediately after they observe a sign of local mitigation, we identify several states and counties with higher risks of continuous community transmission and a second outbreak. Our proposed index could help policymakers and researchers monitor people’s real-time mobility behaviours, understand the influence of government orders, and evaluate the risk of local outbreaks.