

Title: A statistical model of COVID-19 testing in populations: effects of sampling bias and testing errors
We develop a statistical model for the testing of disease prevalence in a population. The model assumes a binary test result, positive or negative, but allows for biases in sample selection and both type I (false positive) and type II (false negative) testing errors. Our model also incorporates multiple test types and is able to distinguish between retesting and exclusion after testing. Our quantitative framework allows us to directly interpret testing results as a function of errors and biases. By applying our testing model to COVID-19 testing data and actual case data from specific jurisdictions, we are able to estimate and provide uncertainty quantification of indices that are crucial in a pandemic, such as disease prevalence and fatality ratios. This article is part of the theme issue ‘Data science approach to infectious disease surveillance’.
Award ID(s):
1814090
PAR ID:
10342817
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume:
380
Issue:
2214
ISSN:
1364-503X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
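The abstract above describes interpreting binary test results as a function of false positive and false negative rates. As a rough illustration only, the sketch below uses the standard textbook correction (often called the Rogan-Gladen estimator) rather than the paper's own model: it ignores sampling bias, retesting, and multiple test types, and the function names and numeric rates are hypothetical.

```python
def observed_positive_fraction(prevalence, fpr, fnr):
    """Expected fraction of positive results in an unbiased sample.

    True positives: infected subjects who are not missed (1 - fnr).
    False positives: uninfected subjects flagged in error (fpr).
    """
    return prevalence * (1.0 - fnr) + (1.0 - prevalence) * fpr


def corrected_prevalence(observed_fraction, fpr, fnr):
    """Invert the relation above to estimate prevalence from test results.

    Assumes fpr + fnr < 1 (the test is better than chance) and an
    unbiased sample; the result is clipped to the interval [0, 1].
    """
    denom = 1.0 - fpr - fnr
    if denom <= 0.0:
        raise ValueError("fpr + fnr must be below 1 for the correction to apply")
    estimate = (observed_fraction - fpr) / denom
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical rates chosen only for illustration.
    f = observed_positive_fraction(prevalence=0.10, fpr=0.02, fnr=0.20)
    print(f, corrected_prevalence(f, fpr=0.02, fnr=0.20))
```

With these illustrative rates, the raw positive fraction differs slightly from the true prevalence, and the correction recovers the value that was put in. A biased sample would need an additional weighting step that this sketch does not attempt.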
More Like this
  1. Pantea, Casian (Ed.)
    Limited testing capacity for COVID-19 has hampered the pandemic response. Pooling is a testing method wherein samples from specimens (e.g., swabs) from multiple subjects are combined into a pool and screened with a single test. If the pool tests positive, then new samples from the collected specimens are individually tested, while if the pool tests negative, the subjects are classified as negative for the disease. Pooling can substantially expand COVID-19 testing capacity and throughput, without requiring additional resources. We develop a mathematical model to determine the best pool size for different risk groups, based on each group’s estimated COVID-19 prevalence. Our approach takes into consideration the sensitivity and specificity of the test, and a dynamic and uncertain prevalence, and provides a robust pool size for each group. For practical relevance, we also develop a companion COVID-19 pooling design tool (through a spreadsheet). To demonstrate the potential value of pooling, we study COVID-19 screening using testing data from Iceland for the period February 28, 2020 to June 14, 2020, for subjects stratified into high- and low-risk groups. We implement the robust pooling strategy within a sequential framework, which updates pool sizes each week, for each risk group, based on the prior week’s testing data. Robust pooling reduces the number of tests, over individual testing, by 88.5% to 90.2%, and 54.2% to 61.9%, respectively, for the low-risk and high-risk groups (based on test sensitivity values in the range [0.71, 0.98] as reported in the literature). This results in much shorter times, on average, to get the test results compared to individual testing (due to the higher testing throughput), and also allows for expanded screening to cover more individuals. Thus, robust pooling can potentially be a valuable strategy for COVID-19 screening.
  2. Abstract Background: Pathogenic infections pose a significant threat to global health, affecting millions of people every year and presenting substantial challenges to healthcare systems worldwide. Efficient and timely testing plays a critical role in disease control and transmission prevention. Group testing is a well-established method for reducing the number of tests needed to screen large populations when the disease prevalence is low. However, it does not fully utilize the quantitative information provided by qPCR methods, nor is it able to accommodate a wide range of pathogen loads. Results: To address these issues, we introduce a novel adaptive semi-quantitative group testing (SQGT) scheme to efficiently screen populations via two-stage qPCR testing. The SQGT method quantizes cycle threshold (Ct) values into multiple bins, leveraging the information from the first stage of screening to improve the detection sensitivity. Dynamic Ct threshold adjustments mitigate dilution effects and enhance test accuracy. Comparisons with traditional binary outcome GT methods show that SQGT reduces the number of tests by 24% on the only complete real-world qPCR group testing dataset from Israel, while maintaining a negligible false negative rate. Conclusion: In conclusion, our adaptive SQGT approach, utilizing qPCR data and dynamic threshold adjustments, offers a promising solution for efficient population screening. With a reduction in the number of tests and minimal false negatives, SQGT holds potential to enhance disease control and testing strategies on a global scale.
  3. The use of machine learning algorithms in healthcare can amplify social injustices and health inequities. While the exacerbation of biases can occur and be compounded during problem selection, data collection, and outcome definition, this research pertains to the generalizability impediments that occur during the development and post-deployment of machine learning classification algorithms. Using the Framingham coronary heart disease data as a case study, we show how to effectively select a probability cutoff to convert a regression model for a dichotomous variable into a classifier. We then compare the sampling distribution of the predictive performance of eight machine learning classification algorithms under four stratified training/testing scenarios to test their generalizability and their potential to perpetuate biases. We show that both extreme gradient boosting and support vector machine are flawed when trained on an unbalanced dataset. We then show that the double discriminant scoring of type 1 and 2 is the most generalizable with respect to the true positive and negative rates, respectively, as it consistently outperforms the other classification algorithms, regardless of the training/testing scenario. Finally, we introduce a methodology to extract an optimal variable hierarchy for a classification algorithm and illustrate it on the overall, male and female Framingham coronary heart disease data. 
  4. Abstract Background: Results: To address these issues, we introduce a novel adaptive semi-quantitative group testing (SQGT) scheme to efficiently screen populations via two-stage qPCR testing. The SQGT method quantizes cycle threshold (Ct) values into multiple bins, leveraging the information from the first stage of screening to improve the detection sensitivity. Dynamic Ct threshold adjustments mitigate dilution effects and enhance test accuracy. Comparisons with traditional binary outcome GT methods show that SQGT reduces the number of tests by 24% on the only complete real-world qPCR group testing dataset from Israel, while maintaining a negligible false negative rate. Conclusion: In conclusion, our adaptive SQGT approach, utilizing qPCR data and dynamic threshold adjustments, offers a promising solution for efficient population screening. With a reduction in the number of tests and minimal false negatives, SQGT holds potential to enhance disease control and testing strategies on a global scale. Keywords: Group testing, Pooled testing, Semiquantitative group testing, qPCR, Ct values, Viral load, COVID-19
  5. Feldman, Marcus (Ed.)
    Characterizing the relationship between disease testing behaviors and infectious disease dynamics is of great importance for public health. Tests for both current and past infection can influence disease-related behaviors at the individual level, while population-level knowledge of an epidemic’s course may feed back to affect one’s likelihood of taking a test. The COVID-19 pandemic has generated testing data on an unprecedented scale for tests detecting both current infection (PCR, antigen) and past infection (serology); this opens the way to characterizing the complex relationship between testing behavior and infection dynamics. Leveraging a rich database of individualized COVID-19 testing histories in New Jersey, we analyze the behavioral relationships between PCR and serology tests, infection, and vaccination. We quantify interactions between individuals’ test-taking tendencies and their past testing and infection histories, finding that PCR tests were disproportionately taken by people currently infected, and serology tests were disproportionately taken by people with past infection or vaccination. The effects of previous positive test results on testing behavior are less consistent, as individuals with past PCR positives were more likely to take subsequent PCR and serology tests at some periods of the epidemic time course and less likely at others. Lastly, we fit a model to the titer values collected from serology tests to infer vaccination trends, finding a marked decrease in vaccination rates among individuals who had previously received a positive PCR test. These results exemplify the utility of individualized testing histories in uncovering hidden behavioral variables affecting testing and vaccination. 
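The two-stage pooling procedure described in item 1 (screen the pool once, then test each subject individually only if the pool is positive) can be sketched numerically. The code below is a minimal illustration of classic two-stage pooling with an imperfect test, not the robust, sequential method of that paper. It assumes independent infections, no dilution effect (pool sensitivity equals individual sensitivity), and a fixed, known prevalence; all function names and parameter values are hypothetical.

```python
def pool_positive_probability(prevalence, pool_size, sensitivity, specificity):
    """Probability that a pooled test returns positive."""
    # Chance that no subject in the pool is infected.
    all_negative = (1.0 - prevalence) ** pool_size
    # Detected true-positive pools plus false alarms on clean pools.
    return sensitivity * (1.0 - all_negative) + (1.0 - specificity) * all_negative


def expected_tests_per_subject(prevalence, pool_size,
                               sensitivity=1.0, specificity=1.0):
    """Expected number of tests per subject under two-stage pooling."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # individual testing, no second stage
    # One pooled test shared by the group, plus one individual test per
    # subject whenever the pool comes back positive.
    return 1.0 / pool_size + pool_positive_probability(
        prevalence, pool_size, sensitivity, specificity)


def best_pool_size(prevalence, max_pool_size=50,
                   sensitivity=1.0, specificity=1.0):
    """Pool size in 1..max_pool_size that minimizes expected tests per subject."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda n: expected_tests_per_subject(
            prevalence, n, sensitivity, specificity),
    )


if __name__ == "__main__":
    # Hypothetical prevalence of 1% with a perfect test.
    n = best_pool_size(0.01)
    print(n, expected_tests_per_subject(0.01, n))
```

For a hypothetical prevalence of 1% and a perfect test, the search selects a pool size of 11 at roughly 0.196 tests per subject. This sketch only counts tests; it does not model the loss of sensitivity that pooling can cause, nor the uncertainty in prevalence that the robust approach in item 1 addresses.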