Title: Using test positivity and reported case rates to estimate state-level COVID-19 prevalence and seroprevalence in the United States
Accurate estimates of infection prevalence and seroprevalence are essential for evaluating and informing public health responses and vaccination coverage needed to address the ongoing spread of COVID-19 in each United States (U.S.) state. However, reliable, timely data based on representative population sampling are unavailable, and reported case and test positivity rates are highly biased. A simple data-driven Bayesian semi-empirical modeling framework was developed and used to evaluate state-level prevalence and seroprevalence of COVID-19 using daily reported cases and test positivity ratios. The model was calibrated to and validated using published state-wide seroprevalence data, and further compared against two independent data-driven mathematical models. The prevalence of undiagnosed COVID-19 infections is found to be well-approximated by a geometrically weighted average of the positivity rate and the reported case rate. Our model accurately fits state-level seroprevalence data from across the U.S. Prevalence estimates of our semi-empirical model compare favorably to those from two data-driven epidemiological models. As of December 31, 2020, we estimate nation-wide a prevalence of 1.4% [Credible Interval (CrI): 1.0%-1.9%] and a seroprevalence of 13.2% [CrI: 12.3%-14.2%], with state-level prevalence ranging from 0.2% [CrI: 0.1%-0.3%] in Hawaii to 2.8% [CrI: 1.8%-4.1%] in Tennessee, and seroprevalence from 1.5% [CrI: 1.2%-2.0%] in Vermont to 23% [CrI: 20%-28%] in New York. Cumulatively, reported cases correspond to only one third of actual infections. The use of this simple and easy-to-communicate approach to estimating COVID-19 prevalence and seroprevalence will improve the ability to make public health decisions that effectively respond to the ongoing COVID-19 pandemic.
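As a rough illustration of the geometrically weighted average mentioned in the abstract, the sketch below combines a test positivity rate with a per-capita reported case rate. The function name and the weight `w` are hypothetical placeholders: the abstract does not state the fitted weights, and this is not the calibrated Bayesian model itself.

```python
def geometric_weighted_average(positivity_rate, case_rate, w):
    """Weighted geometric mean: positivity_rate**w * case_rate**(1 - w).

    positivity_rate: fraction of tests that are positive (0..1)
    case_rate:       reported cases per person over the same window (0..1)
    w:               hypothetical weight in [0, 1]; NOT a value from the paper
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if positivity_rate < 0.0 or case_rate < 0.0:
        raise ValueError("rates must be non-negative")
    return positivity_rate ** w * case_rate ** (1.0 - w)


# Illustrative inputs only: 10% positivity, 0.1% reported case rate, equal weights.
estimate = geometric_weighted_average(0.10, 0.001, w=0.5)
```

With equal weights the result is the ordinary geometric mean of the two rates, which always lies between them; moving `w` toward 1 pulls the estimate toward the positivity rate.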
Award ID(s):
2028632
NSF-PAR ID:
10293886
Author(s) / Creator(s):
;
Editor(s):
Althouse, Benjamin Muir
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
9
ISSN:
1553-7358
Page Range / eLocation ID:
e1009374
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background The COVID-19 outbreak in Wuhan started in December 2019 and was brought under control by the end of March 2020, with a total of 50,006 confirmed cases, through the implementation of a series of nonpharmaceutical interventions (NPIs) including an unprecedented lockdown of the city. This study analyzes the complete outbreak data from Wuhan, assesses the impact of these public health interventions, and estimates the asymptomatic, undetected and total cases for the COVID-19 outbreak in Wuhan. Methods By taking different stages of the outbreak into account, we developed a time-dependent compartmental model to describe the dynamics of disease transmission and case detection and reporting. Model coefficients were parameterized using the reported cases and following key events and escalated control strategies. The model was then calibrated to the complete outbreak data using the Markov chain Monte Carlo (MCMC) method. Finally, we used the model to estimate asymptomatic and undetected cases and approximate the overall antibody prevalence level. Results We found that the transmission rate between Jan 24 and Feb 1, 2020, was twice as large as that before the lockdown on Jan 23, and 67.6% (95% CI [0.584, 0.759]) of detectable infections occurred during this period. Based on the reported estimates that around 20% of infections were asymptomatic and their transmission ability was about 70% of symptomatic ones, we estimated that there were about 14,448 asymptomatic and undetected cases (95% CI [12,364, 23,254]), which yields an estimate of a total of 64,454 infected cases (95% CI [62,370, 73,260]), and the overall antibody prevalence level in the population of Wuhan was 0.745% (95% CI [0.693%, 0.814%]) by March 31, 2020.
Conclusions We conclude that the control of the COVID-19 outbreak in Wuhan was achieved via the enforcement of a combination of multiple NPIs: the lockdown on Jan 23, the stay-at-home order on Feb 2, the massive isolation of all symptomatic individuals via newly constructed special shelter hospitals on Feb 6, and the large-scale screening process on Feb 18. Our results indicate that the population in Wuhan is far from establishing herd immunity and provide insights for other affected countries and regions in designing control strategies and planning vaccination programs.
  2. Abstract Summary

    A major challenge in understanding the spread of certain newly emerging viruses is the presence of asymptomatic cases. Their prevalence is hard to measure in the absence of testing tools, and yet the information is critical for tracking disease spread and shaping public health policies. Here, we introduce a framework that combines classic compartmental models with travel networks and we use it to estimate asymptomatic rates. Our platform, traSIR (“tracer”), is an augmented susceptible-infectious-recovered (SIR) model that incorporates multiple locations and the flow of people between them; it has a compartment model for each location and estimates of commuting traffic between compartments. TraSIR models both asymptomatic and symptomatic infections, as well as the dampening effect symptomatic infections have on traffic between locations. We derive analytical formulae to express the asymptomatic rate as a function of other key model parameters. Next, we use simulations to show that empirical data fitting yields excellent agreement with actual asymptomatic rates using only information about the number of symptomatic infections over time and compartments. Finally, we apply our model to COVID-19 data consisting of reported daily infections in the New York metropolitan area and estimate asymptomatic rates of COVID-19 to be ∼34%, which is within the 30–40% interval derived from widespread testing. Overall, our work demonstrates that traSIR is a powerful approach to express viral propagation dynamics over geographical networks and estimate key parameters relevant to virus transmission.

    Availability and implementation

    No public repository.

  3. The ongoing highly contagious coronavirus disease 2019 (COVID-19) pandemic, which started in Wuhan, China, in December 2019, has now become a global public health problem. Using publicly available data from the COVID-19 data repository of Our World in Data, we aimed to investigate the influences of spatial socio-economic vulnerabilities and neighbourliness on the COVID-19 burden in African countries. We analyzed the first wave (January–September 2020) and second wave (October 2020 to May 2021) of the COVID-19 pandemic using spatial statistics regression models. As of 31 May 2021, there was a total of 4,748,948 confirmed COVID-19 cases, with an average, median, and range per country of 101,041, 26,963, and 2191 to 1,665,617, respectively. We found that COVID-19 prevalence in an African country was highly dependent on that of neighbouring African countries as well as its economic wealth, transparency, and proportion of the population aged 65 or older (p-value < 0.05). Our finding regarding the high COVID-19 burden in countries with better transparency and higher economic wealth is surprising and counterintuitive. We believe this reflects differences in COVID-19 testing capacity, which is mostly higher in more developed countries, or data modification by less transparent governments. Country-wide integrated COVID suppression strategies such as limiting human mobility from more urbanized to less urbanized countries, as well as an understanding of a country's socio-economic characteristics, could prepare a country to promptly and effectively respond to future outbreaks of highly contagious viral infections such as COVID-19.
  4. Abstract Background

    New York City (NYC) has been one of the hotspots of the COVID‐19 pandemic in the United States. By the end of April 2020, close to 165 000 cases and 13 000 deaths were reported in the city with considerable variability across the city's ZIP codes.

    Objectives

    In this study, we examine: (a) the extent to which the variability in ZIP code‐level case positivity can be explained by aggregate markers of socioeconomic status (SES) and daily change in mobility; and (b) the extent to which daily change in mobility independently predicts case positivity.

    Methods

    COVID‐19 case positivity by ZIP code was modeled using multivariable linear regression with generalized estimating equations to account for within‐ZIP clustering. Daily case positivity was obtained from NYC Department of Health and Mental Hygiene and measures of SES were based on data from the American Community Survey. Changes in human mobility were estimated using anonymized aggregated mobile phone location systems.

    Results

    Our analysis indicates that the socioeconomic markers considered together explained 56% of the variability in case positivity through April 1 and their explanatory power decreased to 18% by April 30. Changes in mobility during this time period are not likely to be acting as a mediator of the relationship between ZIP‐level SES and case positivity. During the middle of April, increases in mobility were independently associated with decreased case positivity.

    Conclusions

    Together, these findings present evidence that heterogeneity in COVID‐19 case positivity during NYC’s spring outbreak was largely driven by residents’ SES.

  5. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread by using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. 
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model Corona spread, with the aim of obtaining new results and helping to reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analysis, such as Contagion Risk and Infection State, is based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a large volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.
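The two simplest statistics named in the last abstract, new cases per 100,000 population and a seven-day sliding window, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, a plain trailing mean stands in for whatever the window feeds, and the tracker's own indicators (Contagion Risk, Infection State) are not defined in the abstract and are not reproduced here.

```python
def cases_per_100k(new_cases, population):
    """Scale a raw case count to cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases * 100_000 / population


def sliding_window_mean(daily_values, window=7):
    """Trailing mean over each full window of consecutive daily values.

    Returns one value per complete window, so the result has
    len(daily_values) - window + 1 entries (or none if the series is shorter).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for end in range(window, len(daily_values) + 1):
        chunk = daily_values[end - window:end]
        means.append(sum(chunk) / window)
    return means


# Made-up daily counts for a hypothetical region of 1,000,000 people.
daily_new_cases = [7, 7, 7, 7, 7, 7, 7, 14]
smoothed = sliding_window_mean(daily_new_cases)
smoothed_per_100k = [cases_per_100k(v, 1_000_000) for v in smoothed]
```

Averaging over seven days damps the day-of-week reporting pattern, and normalizing by population makes regions of different sizes comparable.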