
Title: Modelling the COVID-19 Infection Trajectory: A Piecewise Linear Quantile Trend Model
Abstract

We propose a piecewise linear quantile trend model to analyse the trajectory of the COVID-19 daily new cases (i.e. the infection curve) simultaneously across multiple quantiles. The model is intuitive, interpretable and naturally captures the phase transitions of the epidemic growth rate via change-points. Unlike the mean trend model and least squares estimation, our quantile-based approach is robust to outliers, captures heteroscedasticity (commonly exhibited by COVID-19 infection curves) and automatically delivers both point and interval forecasts with minimal assumptions. Building on a self-normalized (SN) test statistic, this paper proposes a novel segmentation algorithm for multiple change-point estimation. Theoretical guarantees such as segmentation consistency are established under mild and verifiable assumptions. Using the proposed method, we analyse the COVID-19 infection curves in 35 major countries and discover patterns with potentially relevant implications for the effectiveness of the pandemic responses by different countries. A simple change-adaptive two-stage forecasting scheme is further designed to generate short-term predictions of COVID-19 cumulative new cases and is shown to deliver accurate forecasts valuable to public health decision-making.
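
To make the two ingredients named in the abstract concrete, the following minimal Python sketch evaluates a piecewise linear trend and the standard quantile (check) loss used in quantile regression. It is an illustration only, not the authors' implementation: the function names are hypothetical, the trend is assumed to be continuous at the change-points (the abstract does not say whether it is), the change-points are taken as given, and the SN-based segmentation algorithm is not reproduced.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals * (tau - (residuals < 0))


def piecewise_linear_trend(t, intercept, slope, change_points, slope_changes):
    """Piecewise linear trend (assumed continuous for this sketch).

    The slope shifts by slope_changes[k] for all t beyond change_points[k].
    """
    t = np.asarray(t, dtype=float)
    trend = intercept + slope * t
    for cp, delta in zip(change_points, slope_changes):
        # Hinge term: zero before the change-point, linear after it.
        trend = trend + delta * np.maximum(t - cp, 0.0)
    return trend


def total_check_loss(y, fitted, tau):
    """Sum of check losses of the residuals y - fitted at quantile level tau."""
    return float(np.sum(check_loss(np.asarray(y, dtype=float) - fitted, tau)))


# Toy trend: growth at slope 2 until t = 4, then decline at slope -1.
t = np.arange(10)
trend = piecewise_linear_trend(
    t, intercept=1.0, slope=2.0, change_points=[4], slope_changes=[-3.0]
)
```

Minimising `total_check_loss` over the trend parameters at several levels (for example tau = 0.1, 0.5, 0.9) would give one fitted trend per quantile. The outer pair can then serve as an interval around the median trend, which is the sense in which a quantile model yields point and interval summaries together.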

Authors:
; ;
Publication Date:
NSF-PAR ID:
10400999
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
84
Issue:
5
Page Range or eLocation-ID:
p. 1589-1607
ISSN:
1369-7412
Publisher:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms, a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected.
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread by using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System.
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread, obtain new results, and help reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level such as new cases per 100,000 population. The primary analysis such as Contagion Risk and Infection State is based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.
  2. This work quantifies mobility changes observed during the different phases of the pandemic world-wide at multiple resolutions (county, state, country) using an anonymized aggregate mobility map that captures population flows between geographic cells of size 5 km². As we overlay the global mobility map with epidemic incidence curves and dates of government interventions, we observe that as case counts rose, mobility fell and has since then seen a slow but steady increase in flows. Further, in order to understand mixing within a region, we propose a new metric to quantify the effect of social distancing on the basis of mobility. Taking two very different countries sampled from the global spectrum, we analyze in detail the mobility patterns of the United States (US) and India. We then carry out a counterfactual analysis of delaying the lockdown and show that a one week delay would have doubled the reported number of cases in the US and India. Finally, we quantify the effect of college students returning back to school for the fall semester on COVID-19 dynamics in the surrounding community. We employ the data from a recent university outbreak (reported on August 16, 2020) to infer possible Re values and mobility flows combined with daily prevalence data and census data to obtain an estimate of new cases that might arrive on a college campus. We find that maintaining social distancing at existing levels would be effective in mitigating the extra seeding of cases. However, potential behavioral change and increased social interaction amongst students (30% increase in Re) along with extra seeding can increase the number of cases by 20% over a period of one month in the encompassing county. To our knowledge, this work is the first to model in near real-time, the interplay of human mobility, epidemic dynamics and public policies across multiple spatial resolutions and at a global scale.
  3. COVID-19 is a respiratory disease caused by a recently discovered, novel coronavirus, SARS-COV-2. The disease has led to over 81 million confirmed cases of COVID-19, with close to two million deaths. In the current social climate, the risk of COVID-19 infection is driven by individual and public perception of risk and sentiments. A number of factors influence public perception, including an individual’s belief system, prior knowledge about a disease and information about a disease. In this article, we develop a model for COVID-19 using a system of ordinary differential equations following the natural history of the infection. The model uniquely incorporates social behavioral aspects such as quarantine and quarantine violation. The model is further driven by people’s sentiments (positive and negative) which accounts for the influence of disinformation. People’s sentiments were obtained by parsing through and analyzing COVID-19 related tweets from Twitter, a social media platform, across six countries. Our results show that our model incorporating public sentiments is able to capture the trend in the trajectory of the epidemic curve of the reported cases. Furthermore, our results show that positive public sentiments reduce disease burden in the community. Our results also show that quarantine violation and early discharge of the infected population amplify the disease burden on the community. Hence, it is important to account for public sentiment and individual social behavior in epidemic models developed to study diseases like COVID-19.
  4. Abstract

    The spatial distribution of population affects disease transmission, especially when shelter in place orders restrict mobility for a large fraction of the population. The spatial network structure of settlements therefore imposes a fundamental constraint on the spatial distribution of the population through which a communicable disease can spread. In this analysis we use the spatial network structure of lighted development as a proxy for the distribution of ambient population to compare the spatiotemporal evolution of COVID-19 confirmed cases in the USA and China. The Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band sensor on the NASA/NOAA Suomi satellite has been imaging night light at ~ 700 m resolution globally since 2012. Comparisons with sub-kilometer resolution census observations in different countries across different levels of development indicate that night light luminance scales with population density over ~ 3 orders of magnitude. However, VIIRS’ constant ~ 700 m resolution can provide a more detailed representation of population distribution in peri-urban and rural areas where aggregated census blocks lack comparable spatial detail. By varying the low luminance threshold of VIIRS-derived night light, we depict spatial networks of lighted development of varying degrees of connectivity within which populations are distributed. The resulting size distributions of spatial network components (connected clusters of nodes) vary with degree of connectivity, but maintain consistent scaling over a wide range (5 × to 10 × in area & number) of network sizes. At continental scales, spatial network rank-size distributions obtained from VIIRS night light brightness are well-described by power laws with exponents near −2 (slopes near −1) for a wide range of low luminance thresholds.
The largest components (10⁴ to 10⁵ km²) represent spatially contiguous agglomerations of urban, suburban and periurban development, while the smallest components represent isolated rural settlements. Projecting county and city-level numbers of confirmed cases of COVID-19 for the USA and China (respectively) onto the corresponding spatial networks of lighted development allows the spatiotemporal evolution of the epidemic (infection and detection) to be quantified as propagation within networks of varying connectivity. Results for China show rapid nucleation and diffusion in January 2020 followed by rapid decreases in new cases in February. While most of the largest cities in China showed new confirmed cases approaching zero before the end of February, most of these cities also showed distinct second waves of cases in March or April. Whereas new cases in Wuhan did not approach zero until mid-March, as of December 2020 it has not yet experienced a second wave of cases. In contrast, the results for the USA show a wide range of trajectories, with an abrupt transition from slow increases in confirmed cases in a small number of network components in January and February, to rapid geographic dispersion to a larger number of components shortly before mobility reductions occurred in March. Results indicate that while most of the upper tail of the network had been exposed by the end of March, the lower tail of the component size distribution has only shown steep increases since mid-June.

  5. Abstract

    Most COVID-19 studies report figures of the overall infection at a state or county level. This aggregation tends to miss fine details of virus propagation. In this paper, we analyze a high-resolution COVID-19 dataset in Cali, Colombia, that records the precise time and location of every confirmed case. We develop a non-stationary spatio-temporal point process equipped with a neural network-based kernel to capture the heterogeneous correlations among COVID-19 cases. The kernel is carefully crafted to enhance expressiveness while maintaining model interpretability. We also incorporate some exogenous influences imposed by city landmarks. Our approach outperforms the state-of-the-art in forecasting new COVID-19 cases, and it can offer vital insights into the spatio-temporal interaction between individuals concerning the disease spread in a metropolis.