

Title: A Machine Learning Study of COVID-19 Serology and Molecular Tests and Predictions
Serology and molecular tests are the two most commonly used methods for rapid COVID-19 infection testing. The two types of tests detect infection through different mechanisms: molecular tests measure the presence of viral SARS-CoV-2 RNA, while serology tests detect antibodies triggered by the SARS-CoV-2 virus. A handful of studies have shown that symptoms, combined with demographic and/or diagnosis features, can help predict COVID-19 test outcomes. However, because of the nature of the tests, serology and molecular tests vary significantly. There is no existing study on the correlation between serology and molecular tests, or on which symptoms are the key factors indicating a positive COVID-19 test. In this study, we propose a machine learning based approach to study serology and molecular tests and use features to predict test outcomes. A total of 2,467 donors, each tested using one or more types of COVID-19 test, were collected as our testbed. By cross-checking test types and results, we study the correlation between serology and molecular tests. For test outcome prediction, we label the 2,467 donors as positive or negative using their serology or molecular test results, and create symptom features to represent each donor for learning. Because COVID-19 produces a wide range of symptoms and the data collection process is inherently error-prone, we group similar symptoms into bins. This reduces the size and sparsity of the feature space. Using binned symptoms combined with demographic features, we train five classification algorithms to predict COVID-19 test results. Experiments show that XGBoost achieves the best performance, with 76.85% accuracy and an 81.4% AUC, demonstrating that symptoms are indeed helpful for predicting COVID-19 test outcomes.
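A minimal sketch of the binning step, assuming each donor has a list of free-text symptom reports: every report is normalized and mapped to a coarse bin, and the per-bin counts become feature columns (demographic features would be appended before training). The bin names and the mapping in `SYMPTOM_BINS` are hypothetical; the abstract does not list the bins the authors actually used.

```python
from collections import Counter

# HYPOTHETICAL symptom-to-bin map for illustration only; the study's real bins
# are not given in the abstract.
SYMPTOM_BINS = {
    "fever": "systemic",
    "chills": "systemic",
    "fatigue": "systemic",
    "cough": "respiratory",
    "shortness of breath": "respiratory",
    "sore throat": "upper_airway",
    "runny nose": "upper_airway",
    "loss of taste": "sensory",
    "loss of smell": "sensory",
    "diarrhea": "gastrointestinal",
    "nausea": "gastrointestinal",
}
BIN_NAMES = sorted(set(SYMPTOM_BINS.values()))


def normalize(symptom: str) -> str:
    """Lower-case and collapse internal whitespace so free-text variants match."""
    return " ".join(symptom.lower().split())


def bin_features(reported: list[str]) -> dict[str, int]:
    """Map one donor's reported symptoms to per-bin counts.

    Symptoms missing from SYMPTOM_BINS are counted under 'other' instead of
    being dropped, so noisy entries still contribute a signal.
    """
    counts = Counter()
    for raw in reported:
        counts[SYMPTOM_BINS.get(normalize(raw), "other")] += 1
    # Fixed column order: every donor gets the same feature vector layout.
    return {name: counts.get(name, 0) for name in BIN_NAMES + ["other"]}
```

For example, `bin_features(["Fever", "Loss of  Smell", "loss of taste", "headache"])` yields 2 in `sensory`, 1 in `systemic`, and 1 in `other`. Collapsing dozens of raw symptom strings into a handful of columns is what shrinks the feature space and reduces sparsity.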
Our study investigates the relationship between serology and molecular tests, identifies meaningful symptom features associated with COVID-19 infection, and also provides a means of rapid screening and cost-effective detection of COVID-19 infection.
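The abstract does not say which statistic the authors used to relate the two test types. One common choice for paired binary results, shown here as an assumption rather than the paper's method, is to cross-tabulate the donors who received both tests and report the observed agreement alongside Cohen's kappa, which discounts the agreement expected by chance from each test's positive rate.

```python
def concordance(pairs: list[tuple[bool, bool]]):
    """Cross-tabulate paired (serology_positive, molecular_positive) results.

    Returns the 2x2 table keyed by (serology, molecular), the observed
    proportion of agreement, and Cohen's kappa.
    """
    if not pairs:
        raise ValueError("no donors with both test types")
    table = {(s, m): 0 for s in (True, False) for m in (True, False)}
    for sero, mol in pairs:
        table[(bool(sero), bool(mol))] += 1
    n = len(pairs)
    observed = (table[(True, True)] + table[(False, False)]) / n
    # Marginal positive rates of each test.
    p_sero = (table[(True, True)] + table[(True, False)]) / n
    p_mol = (table[(True, True)] + table[(False, True)]) / n
    # Agreement expected if the two tests were statistically independent.
    expected = p_sero * p_mol + (1 - p_sero) * (1 - p_mol)
    # If both tests give one identical value for everyone, expected == 1 and
    # kappa is undefined; report perfect agreement in that degenerate case.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return table, observed, kappa
```

The discordant cells, `table[(True, False)]` and `table[(False, True)]`, are the cases of most interest when comparing antibody and RNA tests, since the two measure different stages of infection.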
Award ID(s): 2027339, 1763452
NSF-PAR ID: 10357379
Journal Name: Smart Health
ISSN: 2352-6483
Sponsoring Org: National Science Foundation
More Like this
  1. Importance

    The frequent occurrence of cognitive symptoms in post–COVID-19 condition has been described, but the nature of these symptoms and their demographic and functional correlates are not well characterized in generalizable populations.

    Objective

    To investigate the prevalence of self-reported cognitive symptoms in post–COVID-19 condition, in comparison with individuals with prior acute SARS-CoV-2 infection who did not develop post–COVID-19 condition, and their association with other individual features, including depressive symptoms and functional status.

    Design, Setting, and Participants

    Two waves of a 50-state nonprobability population-based internet survey conducted between December 22, 2022, and May 5, 2023. Participants included survey respondents aged 18 years and older.

    Exposure

    Post–COVID-19 condition, defined as self-report of symptoms attributed to COVID-19 beyond 2 months after the initial month of illness.

    Main Outcomes and Measures

    Seven items from the Neuro-QoL cognition battery assessing the frequency of cognitive symptoms in the past week, and the Patient Health Questionnaire-9 (PHQ-9).
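    The PHQ-9 total is the sum of nine items, each rated 0 to 3, giving a score from 0 to 27; conventional cut points at 5, 10, 15, and 20 separate minimal, mild, moderate, moderately severe, and severe depressive symptoms. The sketch below scores one respondent under those standard bands. Whether the study analyzed the continuous total or the bands is not stated here, and the function name is illustrative.

```python
# Conventional PHQ-9 severity bands for the total score (0-27).
PHQ9_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def score_phq9(items: list[int]) -> tuple[int, str]:
    """Sum nine item responses (each 0-3) and return (total, severity band)."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("PHQ-9 expects nine responses scored 0-3")
    total = sum(items)
    for low, high, label in PHQ9_BANDS:
        if low <= total <= high:
            return total, label
    # Unreachable: nine items of 0-3 always sum to 0-27.
    raise AssertionError("total outside 0-27")
```

    A regression on "severity of depressive symptoms" would typically use the integer total as the outcome; the band label is mainly useful for screening and reporting.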

    Results

    The 14 767 individuals reporting test-confirmed COVID-19 illness at least 2 months before the survey had a mean (SD) age of 44.6 (16.3) years; 568 (3.8%) were Asian, 1484 (10.0%) were Black, 1408 (9.5%) were Hispanic, and 10 811 (73.2%) were White. A total of 10 037 respondents (68.0%) were women and 4730 (32.0%) were men. Of the 1683 individuals reporting post–COVID-19 condition, 955 (56.7%) reported at least 1 cognitive symptom experienced daily, compared with 3552 of 13 084 (27.1%) of those who did not report post–COVID-19 condition. More daily cognitive symptoms were associated with a greater likelihood of reporting at least moderate interference with functioning (unadjusted odds ratio [OR], 1.31 [95% CI, 1.25-1.36]; adjusted OR [AOR], 1.30 [95% CI, 1.25-1.36]), a lower likelihood of full-time employment (unadjusted OR, 0.95 [95% CI, 0.91-0.99]; AOR, 0.92 [95% CI, 0.88-0.96]), and greater severity of depressive symptoms (unadjusted coefficient, 1.40 [95% CI, 1.29-1.51]; adjusted coefficient, 1.27 [95% CI, 1.17-1.38]). After including depressive symptoms in the regression models, cognitive symptoms remained associated with at least moderate interference with everyday functioning (AOR, 1.27 [95% CI, 1.21-1.33]) and with lower odds of full-time employment (AOR, 0.92 [95% CI, 0.88-0.97]).
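    As a worked example of the arithmetic behind an unadjusted odds ratio, the sketch below builds a 2×2 table from the counts reported above (at least 1 daily cognitive symptom in 955 of 1683 respondents with post–COVID-19 condition vs 3552 of 13 084 without) and computes a Wald 95% CI on the log scale. This crude OR is not a result reported by the study: the published ORs appear to come from regression models relating the number of daily cognitive symptoms to functional outcomes, so the two quantities are not comparable.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio for a 2x2 table with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Exposure: post-COVID-19 condition; outcome: >=1 daily cognitive symptom.
or_, lo, hi = odds_ratio_wald(955, 1683 - 955, 3552, 13084 - 3552)
print(f"crude OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    The interval is symmetric on the log scale, which is why it is computed from `log(or_)` and exponentiated back.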

    Conclusions and Relevance

    The findings of this survey study of US adults suggest that cognitive symptoms are common among individuals with post–COVID-19 condition and associated with greater self-reported functional impairment, lesser likelihood of full-time employment, and greater depressive symptom severity. Screening for and addressing cognitive symptoms is an important component of the public health response to post–COVID-19 condition.

     
  2. Rapid testing is essential to fighting pandemics such as coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Exhaled human breath contains multiple volatile molecules, offering powerful potential for non-invasive diagnosis of diverse medical conditions. We investigated breath detection of SARS-CoV-2 infection using cavity-enhanced direct frequency comb spectroscopy (CE-DFCS), a state-of-the-art laser spectroscopic technique capable of real-time, massive collection of broadband molecular absorption features at ro-vibrational quantum state resolution and at parts-per-trillion volume detection sensitivity. Using a total of 170 individual breath samples (83 positive and 87 negative for SARS-CoV-2 based on reverse transcription polymerase chain reaction tests), we report excellent discrimination capability for SARS-CoV-2 infection, with an area under the receiver operating characteristic curve of 0.849(4). Our results support the development of CE-DFCS as an alternative, rapid, non-invasive test for COVID-19 and highlight its remarkable potential for optical diagnoses of diverse biological conditions and disease states.
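The reported 0.849(4) is an area under the ROC curve, with the parenthesized digit giving the uncertainty in the last decimal place. The abstract does not describe how per-sample scores were derived from the spectra, so the sketch below assumes only that each breath sample has a numeric score and an RT-PCR label (1 = positive, 0 = negative), and computes the AUC as the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all positive/negative pairs, how often the positive sample
    scores higher; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With 83 positives and 87 negatives this pairwise loop compares 7221 pairs, which is cheap; for much larger datasets a sort-based rank computation is the usual replacement.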
  3. The COVID-19 pandemic demonstrated the public health benefits of reliable and accessible point-of-care (POC) diagnostic tests for viral infections. Despite the rapid development of gold-standard reverse transcription polymerase chain reaction (RT-PCR) assays for SARS-CoV-2 only weeks into the pandemic, global demand created logistical challenges that delayed access to testing for months and helped fuel the spread of COVID-19. Additionally, the extreme sensitivity of RT-PCR had a costly downside: the tests could not differentiate between patients with active infection and those who were no longer infectious but still shedding viral genomes. To address these issues in the future, we propose a novel membrane-based sensor that detects only intact virions. The sensor combines affinity- and size-based detection and does not require external power to operate or read. Specifically, the presence of intact virions, but not viral debris, fouls the membrane and triggers a macroscopically visible hydraulic switch after a 40 μL sample is injected with a pipette. The device, which we call the μSiM-DX (microfluidic device featuring a silicon membrane for diagnostics), features a biotin-coated microslit membrane with pores ∼2–3× larger than the intact virus. Streptavidin-conjugated antibodies recognizing viral surface proteins are incubated with the sample for ∼1 hour prior to injection into the device, and positive/negative results are obtained within ten seconds of sample injection. Proof-of-principle tests have been performed using preparations of vaccinia virus. After optimizing slit pore sizes and porous membrane area, the fouling-based sensor exhibits 100% specificity and 97% sensitivity for vaccinia virus (n = 62). Moreover, the dynamic range of the sensor extends at least from 10^5.9 to 10^10.4 virions per mL, covering the range of mean viral loads in symptomatic COVID-19 patients (10^5.6–10^7 RNA copies per mL).
Forthcoming work will test the ability of our sensor to perform similarly in biological fluids and with SARS-CoV-2, to fully test the potential of a membrane fouling-based sensor to serve as a PCR-free alternative for POC containment efforts in the spread of infectious disease. 
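Sensitivity and specificity follow directly from a confusion matrix. The sketch below computes both and attaches Wilson score intervals; the counts are hypothetical and are not the study's breakdown of its n = 62 samples, which the abstract does not give.

```python
import math


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; remains informative at 0% or 100%."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# HYPOTHETICAL counts for illustration only (not the study's data).
tp, fn, tn, fp = 18, 2, 29, 1
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
for name, k, n in (("sensitivity", tp, tp + fn), ("specificity", tn, tn + fp)):
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval is used instead of the simple normal (Wald) approximation because the latter has zero width when an observed proportion is exactly 100%, as with the specificity reported above; with small validation sets the Wilson bounds give a more honest picture of the uncertainty.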
  4. COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions in the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors of both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and the proportion of Black and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and the proportion of Black and/or African-American individuals in a county are the strongest predictors.
The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]), and likewise that a 10% increase in the proportion of Black and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests that ethnicity has double the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including how vaccines are distributed, could have the greatest impact if they are given with priority to the highest-risk communities.
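A minimal sketch of the kind of marginal prediction described above, under stated assumptions: take any fitted model exposed as a `predict` callable, raise one feature by 10% in every county while holding the others at their observed values, and difference the summed predictions. The function name and row format are hypothetical; the abstract does not say whether "10%" is a relative or a percentage-point change (this sketch assumes relative), and confidence intervals like those reported would need an additional resampling step not shown here.

```python
from typing import Callable, Sequence

Row = dict[str, float]  # one county: feature name -> observed value


def marginal_change(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    relative_increase: float = 0.10,
) -> float:
    """Change in summed predictions when one feature is raised by a relative amount.

    Every other feature stays at its observed value in each row, mirroring the
    'all other factors fixed' interpretation. `predict` can wrap any fitted
    model, e.g. an ensemble, as long as it maps one row to one prediction.
    """
    baseline = sum(predict(r) for r in rows)
    # Build perturbed copies so the caller's rows are never mutated.
    shifted = sum(
        predict({**r, feature: r[feature] * (1 + relative_increase)}) for r in rows
    )
    return shifted - baseline
```

Because the model is treated as a black box, the same function works for nonlinear ensembles with interactions, where no single coefficient summarizes a feature's effect.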
  5. Population-scale and rapid testing for SARS-CoV-2 continues to be a priority for several parts of the world. We revisit the in vitro technology platforms for COVID-19 testing and diagnostics: molecular tests and rapid antigen tests, serology or antibody tests, and tests for the management of COVID-19 patients. Within each category of tests, we review the commercialized testing platforms, their analyzing systems, specimen collection protocols, testing methodologies, supply chain logistics, and related attributes. Our discussion is primarily focused on test products that have been granted emergency use authorization by the FDA to detect and diagnose COVID-19 infections. Different strategies for scaled-up and faster screening are covered here, such as pooled testing, screening programs, and surveillance testing. The near-term challenges lie in detecting subtle infectivity profiles, mapping the transmission dynamics of new variants, lowering the cost of testing, training a large healthcare workforce, and providing test kits for the masses. Through this review, we try to understand the feasibility of universal access to COVID-19 testing and diagnostics in the near future while remaining cognizant of the implicit tradeoffs during the development and distribution cycles of new testing platforms.
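Pooled testing trades a few extra retests for far fewer initial tests. As one concrete textbook scheme (Dorfman two-stage pooling, not a method taken from the review), each pool of k specimens is tested once and every member of a positive pool is retested individually, so the expected number of tests per specimen is 1/k + 1 - (1 - p)^k at prevalence p. The sketch assumes a perfectly accurate assay and independent infections; real assays can lose sensitivity as specimens are diluted, which this ignores.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per specimen under two-stage (Dorfman) pooling.

    One pooled test is shared by `pool_size` specimens; if the pool is
    positive, every member is retested. Assumes a perfect assay and
    independent infections at the given prevalence.
    """
    if pool_size == 1:
        return 1.0  # individual testing: exactly one test per specimen
    p_pool_negative = (1 - prevalence) ** pool_size
    return 1 / pool_size + (1 - p_pool_negative)


def best_pool_size(prevalence: float, max_size: int = 50) -> int:
    """Pool size in 1..max_size that minimizes expected tests per specimen."""
    return min(
        range(1, max_size + 1),
        key=lambda k: dorfman_tests_per_person(prevalence, k),
    )


for p in (0.001, 0.01, 0.05):
    k = best_pool_size(p)
    print(f"prevalence {p:.1%}: pool size {k}, "
          f"{dorfman_tests_per_person(p, k):.3f} tests per specimen")
```

Under these assumptions, at 1% prevalence the optimal pool size is 11, needing about 0.196 tests per specimen; at 50% prevalence pooling never helps and the search falls back to individual testing (k = 1).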