Title: Using test positivity and reported case rates to estimate state-level COVID-19 prevalence and seroprevalence in the United States
Accurate estimates of infection prevalence and seroprevalence are essential for evaluating and informing public health responses and vaccination coverage needed to address the ongoing spread of COVID-19 in each United States (U.S.) state. However, reliable, timely data based on representative population sampling are unavailable, and reported case and test positivity rates are highly biased. A simple data-driven Bayesian semi-empirical modeling framework was developed and used to evaluate state-level prevalence and seroprevalence of COVID-19 using daily reported cases and test positivity ratios. The model was calibrated to and validated using published state-wide seroprevalence data, and further compared against two independent data-driven mathematical models. The prevalence of undiagnosed COVID-19 infections is found to be well-approximated by a geometrically weighted average of the positivity rate and the reported case rate. Our model accurately fits state-level seroprevalence data from across the U.S. Prevalence estimates of our semi-empirical model compare favorably to those from two data-driven epidemiological models. As of December 31, 2020, we estimate a nationwide prevalence of 1.4% [Credible Interval (CrI): 1.0%-1.9%] and a seroprevalence of 13.2% [CrI: 12.3%-14.2%], with state-level prevalence ranging from 0.2% [CrI: 0.1%-0.3%] in Hawaii to 2.8% [CrI: 1.8%-4.1%] in Tennessee, and seroprevalence from 1.5% [CrI: 1.2%-2.0%] in Vermont to 23% [CrI: 20%-28%] in New York. Cumulatively, reported cases correspond to only one third of actual infections. The use of this simple and easy-to-communicate approach to estimating COVID-19 prevalence and seroprevalence will improve the ability to make public health decisions that effectively respond to the ongoing COVID-19 pandemic.
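The geometrically weighted average mentioned in the abstract can be illustrated with a minimal sketch. The function name, the exponent `weight`, and the input values below are all hypothetical; the abstract does not state the calibrated weight or the exact units of the case rate, so this shows only the general form positivity^w × case_rate^(1 − w), not the authors' fitted model.

```python
def geometric_weighted_prevalence(positivity, case_rate, weight):
    """Geometrically weighted average of two rates.

    Returns positivity**weight * case_rate**(1 - weight).
    `weight` is an illustrative parameter, not a value from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if positivity < 0 or case_rate < 0:
        raise ValueError("rates must be non-negative")
    return positivity ** weight * case_rate ** (1.0 - weight)


# Hypothetical inputs: 10% test positivity and a reported case rate of
# 0.1% of the population (both expressed as fractions).
positivity = 0.10
case_rate = 0.001

# With an assumed equal weighting, the result is the ordinary geometric mean.
estimate = geometric_weighted_prevalence(positivity, case_rate, weight=0.5)
print(f"Estimated undiagnosed prevalence: {estimate:.4f}")
```

With a weight of 1 the estimate reduces to the positivity rate alone, and with a weight of 0 to the reported case rate alone; intermediate weights interpolate between the two on a logarithmic scale.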
Award ID(s):
2028632
PAR ID:
10293886
Author(s) / Creator(s):
;
Editor(s):
Althouse, Benjamin Muir
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
9
ISSN:
1553-7358
Page Range / eLocation ID:
e1009374
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Importance: Identifying and tracking new infections during an emerging pandemic is crucial to design and deploy interventions to protect populations and mitigate the pandemic’s effects, yet it remains a challenging task. Objective: To characterize the ability of nonprobability online surveys to longitudinally estimate the number of COVID-19 infections in the population both in the presence and absence of institutionalized testing. Design, Setting, and Participants: Internet-based online nonprobability surveys were conducted among residents aged 18 years or older across 50 US states and the District of Columbia, using the PureSpectrum survey vendor, approximately every 6 weeks between June 1, 2020, and January 31, 2023, for a multiuniversity consortium—the COVID States Project. Surveys collected information on COVID-19 infections with representative state-level quotas applied to balance age, sex, race and ethnicity, and geographic distribution. Main Outcomes and Measures: The main outcomes were (1) survey-weighted estimates of new monthly confirmed COVID-19 cases in the US from January 2020 to January 2023 and (2) estimates of uncounted test-confirmed cases from February 1, 2022, to January 1, 2023. These estimates were compared with institutionally reported COVID-19 infections collected by Johns Hopkins University and wastewater viral concentrations for SARS-CoV-2 from Biobot Analytics. Results: The survey spanned 17 waves deployed from June 1, 2020, to January 31, 2023, with a total of 408 515 responses from 306 799 respondents (mean [SD] age, 42.8 [13.0] years; 202 416 women [66.0%]). Overall, 64 946 respondents (15.9%) self-reported a test-confirmed COVID-19 infection. National survey-weighted test-confirmed COVID-19 estimates were strongly correlated with institutionally reported COVID-19 infections (Pearson correlation, r = 0.96; P < .001) from April 2020 to January 2022 (50-state correlation mean [SD] value, r = 0.88 [0.07]). This was before the government-led mass distribution of at-home rapid tests. After January 2022, correlation was diminished and no longer statistically significant (r = 0.55; P = .08; 50-state correlation mean [SD] value, r = 0.48 [0.23]). In contrast, survey COVID-19 estimates correlated highly with SARS-CoV-2 viral concentrations in wastewater both before (r = 0.92; P < .001) and after (r = 0.89; P < .001) January 2022. Institutionally reported COVID-19 cases correlated (r = 0.79; P < .001) with wastewater viral concentrations before January 2022, but poorly (r = 0.31; P = .35) after, suggesting that both survey and wastewater estimates may have better captured test-confirmed COVID-19 infections after January 2022. Consistent correlation patterns were observed at the state level. Based on national-level survey estimates, approximately 54 million COVID-19 cases were likely unaccounted for in official records between January 2022 and January 2023. Conclusions and Relevance: This study suggests that nonprobability survey data can be used to estimate the temporal evolution of test-confirmed infections during an emerging disease outbreak. Self-reporting tools may enable government and health care officials to implement accessible and affordable at-home testing for efficient infection monitoring in the future.
  2. Background: The COVID-19 outbreak in Wuhan started in December 2019 and was under control by the end of March 2020, with a total of 50,006 confirmed cases, following the implementation of a series of nonpharmaceutical interventions (NPIs) including an unprecedented lockdown of the city. This study analyzes the complete outbreak data from Wuhan, assesses the impact of these public health interventions, and estimates the asymptomatic, undetected and total cases for the COVID-19 outbreak in Wuhan. Methods: By taking different stages of the outbreak into account, we developed a time-dependent compartmental model to describe the dynamics of disease transmission and case detection and reporting. Model coefficients were parameterized by using the reported cases and following key events and escalated control strategies. The model was then calibrated to the complete outbreak data using the Markov chain Monte Carlo (MCMC) method. Finally, we used the model to estimate asymptomatic and undetected cases and approximate the overall antibody prevalence level. Results: We found that the transmission rate between Jan 24 and Feb 1, 2020, was twice as large as that before the lockdown on Jan 23, and 67.6% (95% CI [0.584, 0.759]) of detectable infections occurred during this period. Based on the reported estimates that around 20% of infections were asymptomatic and their transmission ability was about 70% of symptomatic ones, we estimated that there were about 14,448 asymptomatic and undetected cases (95% CI [12,364, 23,254]), which yields an estimate of a total of 64,454 infected cases (95% CI [62,370, 73,260]), and the overall antibody prevalence level in the population of Wuhan was 0.745% (95% CI [0.693%, 0.814%]) by March 31, 2020. Conclusions: We conclude that the control of the COVID-19 outbreak in Wuhan was achieved via the enforcement of a combination of multiple NPIs: the lockdown on Jan 23, the stay-at-home order on Feb 2, the massive isolation of all symptomatic individuals via newly constructed special shelter hospitals on Feb 6, and the large-scale screening process on Feb 18. Our results indicate that the population in Wuhan is far from establishing herd immunity and provide insights for other affected countries and regions in designing control strategies and planning vaccination programs.
  3. The U.S. has merely 4% of the world population, but contains 25% of the world’s COVID-19 cases. Since the COVID-19 outbreak in the U.S., Massachusetts has been leading other states in the total number of COVID-19 cases. Racial residential segregation is a fundamental cause of racial disparities in health. Moreover, disparities of access to health care have a large impact on COVID-19 cases. Thus, this study estimates racial segregation and disparities in testing site access and employs economic, demographic, and transportation variables at the city/town level in Massachusetts. Spatial regression models are applied to evaluate the relationships between COVID-19 incidence rate and related variables. This is the first study to apply spatial analysis methods across neighborhoods in the U.S. to examine the COVID-19 incidence rate. The findings are: (1) Residential segregation of Hispanic and Non-Hispanic Black/African Americans has a significantly positive association with COVID-19 incidence rate, indicating the higher susceptibility to COVID-19 infection among minority groups. (2) Non-Hispanic Black/African Americans have the shortest drive time to testing sites, followed by Hispanic, Non-Hispanic Asians, and Non-Hispanic Whites. The drive time to testing sites is significantly negatively associated with the COVID-19 incidence rate, implying the importance of the accessibility of testing sites by all populations. (3) Poverty rate and road density are significant explanatory variables. Importantly, overcrowding represented by more than one person per room is a significant variable found to be positively associated with COVID-19 incidence rate, suggesting the effectiveness of social distancing for reducing infection. (4) Different from the findings of previous studies, the elderly population rate is not statistically significantly correlated with the incidence rate because the elderly population in Massachusetts is less distributed in the hotspot regions of COVID-19 infections. The findings in this study provide useful insights for policymakers to propose new strategies to contain COVID-19 transmission in Massachusetts.
  4. Reconstructing the incidence of SARS-CoV-2 infection is central to understanding the state of the pandemic. Seroprevalence studies are often used to assess cumulative infections as they can identify asymptomatic infection. Since July 2020, commercial laboratories have conducted nationwide serosurveys for the U.S. CDC. They employed three assays, with different sensitivities and specificities, potentially introducing biases in seroprevalence estimates. Using models, we show that accounting for assays explains some of the observed state-to-state variation in seroprevalence, and when integrating case and death surveillance data, we show that when using the Abbott assay, estimates of proportions infected can differ substantially from seroprevalence estimates. We also found that states with higher proportions infected (before or after vaccination) had lower vaccination coverages, a pattern corroborated using a separate dataset. Finally, to understand vaccination rates relative to the increase in cases, we estimated the proportions of the population that received a vaccine prior to infection.
  5. Serological assays used to estimate the prevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) often rely on manufacturers’ cutoffs established on the basis of severe cases. We conducted a household-based serosurvey of 4,677 individuals in Chennai, India, from January to May 2021. Samples were tested for SARS-CoV-2 immunoglobulin G (IgG) antibodies to the spike (S) and nucleocapsid (N) proteins. We calculated seroprevalence, defining seropositivity using manufacturer cutoffs and using a mixture model based on measured IgG level. Using manufacturer cutoffs, there was a 5-fold difference in seroprevalence estimated by each assay. This difference was largely reconciled using the mixture model, with estimated anti-S and anti-N IgG seroprevalence of 64.9% (95% credible interval (CrI): 63.8, 66.0) and 51.5% (95% CrI: 50.2, 52.9), respectively. Age and socioeconomic factors showed inconsistent relationships with anti-S and anti-N IgG seropositivity using manufacturer cutoffs. In the mixture model, age was not associated with seropositivity, and improved household ventilation was associated with lower seropositivity odds. With global vaccine scale-up, the utility of the more stable anti-S IgG assay may be limited due to the inclusion of the S protein in several vaccines. Estimates of SARS-CoV-2 seroprevalence using alternative targets must consider heterogeneity in seroresponse to ensure that seroprevalence is not underestimated and correlates are not misinterpreted.
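
As a quick arithmetic check on the Wuhan study listed as item 2, the reported total of infected cases appears to equal the confirmed count plus the estimated asymptomatic and undetected cases, with the interval bounds shifted by the same confirmed count. The sketch below only restates figures quoted in that abstract; the variable names are illustrative, and treating the interval bounds as simple sums is an assumption about how the totals were formed.

```python
# Figures quoted in the Wuhan abstract (item 2).
confirmed = 50_006

# Estimated asymptomatic and undetected cases: point estimate, lower, upper.
undetected = (14_448, 12_364, 23_254)

# Assumed relationship: total infected = confirmed + undetected.
totals = tuple(confirmed + u for u in undetected)
print(totals)  # (64454, 62370, 73260)
```

These sums match the total of 64,454 infected cases (95% CI [62,370, 73,260]) stated in the abstract.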