
Title: A Machine Learning Study of COVID-19 Serology and Molecular Tests and Predictions
Serology and molecular tests are the two most commonly used methods for rapid COVID-19 infection testing. The two types of tests detect infection through different mechanisms: molecular tests measure the presence of viral SARS-CoV-2 RNA, whereas serology tests detect the presence of antibodies triggered by the SARS-CoV-2 virus. A handful of studies have shown that symptoms, combined with demographic and/or diagnosis features, can be helpful for the prediction of COVID-19 test outcomes. However, due to the nature of the tests, serology and molecular tests vary significantly. There is no existing study on the correlation between serology and molecular tests, or on which types of symptoms are the key factors indicating positive COVID-19 tests. In this study, we propose a machine-learning-based approach to study serology and molecular tests, and use features to predict test outcomes. A total of 2,467 donors, each tested using one or multiple types of COVID-19 tests, are collected as our testbed. By cross-checking test types and results, we study the correlation between serology and molecular tests. For test outcome prediction, we label the 2,467 donors as positive or negative, using their serology or molecular test results, and create symptom features to represent each donor for learning. Because COVID-19 produces a wide range of symptoms and the data collection process is essentially error-prone, we group similar symptoms into bins. This decreases the feature space and sparsity. Using binned symptoms, combined with demographic features, we train five classification algorithms to predict COVID-19 test results. Experiments show that XGBoost achieves the best performance, with 76.85% accuracy and an 81.4% AUC score, demonstrating that symptoms are indeed helpful for predicting COVID-19 test outcomes. Our study investigates the relationship between serology and molecular tests, identifies meaningful symptom features associated with COVID-19 infection, and also provides a way for rapid screening and cost-effective detection of COVID-19 infection.
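The symptom-binning step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual bins, so the mapping `SYMPTOM_BINS`, the bin names, and the helper functions `bin_symptoms` and `donor_features` below are all hypothetical; the sketch only shows how grouping free-text symptoms into a few bins yields a short, dense feature vector that can be combined with demographic features.

```python
# Hypothetical symptom-to-bin mapping (an assumption for illustration;
# the bins used in the study are not given in the abstract).
SYMPTOM_BINS = {
    "cough": "respiratory",
    "dry cough": "respiratory",
    "shortness of breath": "respiratory",
    "fever": "systemic",
    "chills": "systemic",
    "fatigue": "systemic",
    "loss of smell": "sensory",
    "loss of taste": "sensory",
}

# Fixed, sorted bin order so every donor gets a vector of the same layout.
BIN_ORDER = sorted(set(SYMPTOM_BINS.values()))


def bin_symptoms(raw_symptoms):
    """Map raw symptom strings to a 0/1 vector with one entry per bin."""
    present = set()
    for symptom in raw_symptoms:
        key = symptom.strip().lower()  # tolerate casing and stray whitespace
        if key in SYMPTOM_BINS:        # unknown symptoms are ignored
            present.add(SYMPTOM_BINS[key])
    return [1 if b in present else 0 for b in BIN_ORDER]


def donor_features(raw_symptoms, age, is_female):
    """Concatenate binned symptoms with two example demographic features."""
    return bin_symptoms(raw_symptoms) + [age, int(is_female)]


if __name__ == "__main__":
    print(BIN_ORDER)
    print(donor_features(["Dry Cough", " fever "], age=42, is_female=True))
```

Vectors of this form could then be passed to any classifier. Several differently worded reports ("cough", "dry cough") collapse into one feature, which is how binning reduces both dimensionality and sparsity.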
Award ID(s):
2027339 1763452
NSF-PAR ID:
10357379
Journal Name:
Smart health
ISSN:
2352-6483
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and proportion of Black- and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and proportion of Black- and/or African-American individuals in a county are the strongest predictors. The methods predict that a 10% increase in public transportation use, all other factors remaining fixed at the observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]), and likewise a 10% increase in the proportion of Black- and/or African-American individuals in a county is associated with an increase in total deaths at end of study of 2067 (95% CI [1189, 2654]). Using data until the end of study, the same metric suggests ethnicity has double the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including how vaccines are distributed, could have the greatest impact if they are given with priority to the highest-risk communities.
  2. Population-scale and rapid testing for SARS-CoV-2 continues to be a priority for several parts of the world. We revisit the in vitro technology platforms for COVID-19 testing and diagnostics—molecular tests and rapid antigen tests, serology or antibody tests, and tests for the management of COVID-19 patients. Within each category of tests, we review the commercialized testing platforms, their analyzing systems, specimen collection protocols, testing methodologies, supply chain logistics, and related attributes. Our discussion is essentially focused on test products that have been granted emergency use authorization by the FDA to detect and diagnose COVID-19 infections. Different strategies for scaled-up and faster screening are covered here, such as pooled testing, screening programs, and surveillance testing. The near-term challenges lie in detecting subtle infectivity profiles, mapping the transmission dynamics of new variants, lowering the cost for testing, training a large healthcare workforce, and providing test kits for the masses. Through this review, we try to understand the feasibility of universal access to COVID-19 testing and diagnostics in the near future while being cognizant of the implicit tradeoffs during the development and distribution cycles of new testing platforms.
  3. Abstract Background Accurate diagnostic strategies to identify SARS-CoV-2 positive individuals rapidly for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turn-around time (TAT) usually within 1-2 hours. Method We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory testing results obtained within 2 days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosting decision tree (GBDT) model from 3,356 SARS-CoV-2 RT-PCR tested patients (1,402 positive and 1,954 negative) evaluated at a metropolitan hospital. Results The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating the generalization of its use. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within 2 days. Conclusion This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
  4. The COVID-19 pandemic demonstrated the public health benefits of reliable and accessible point-of-care (POC) diagnostic tests for viral infections. Despite the rapid development of gold-standard reverse transcription polymerase chain reaction (RT-PCR) assays for SARS-CoV-2 only weeks into the pandemic, global demand created logistical challenges that delayed access to testing for months and helped fuel the spread of COVID-19. Additionally, the extreme sensitivity of RT-PCR had a costly downside, as the tests could not differentiate between patients with active infection and those who were no longer infectious but still shedding viral genomes. To address these issues for the future, we propose a novel membrane-based sensor that only detects intact virions. The sensor combines affinity- and size-based detection on a membrane-based sensor and does not require external power to operate or read. Specifically, the presence of intact virions, but not viral debris, fouls the membrane and triggers a macroscopically visible hydraulic switch after injection of a 40 μL sample with a pipette. The device, which we call the μSiM-DX (microfluidic device featuring a silicon membrane for diagnostics), features a biotin-coated microslit membrane with pores ∼2–3× larger than the intact virus. Streptavidin-conjugated antibodies recognizing viral surface proteins are incubated with the sample for ∼1 hour prior to injection into the device, and positive/negative results are obtained within ten seconds of sample injection. Proof-of-principle tests have been performed using preparations of vaccinia virus. After optimizing slit pore sizes and porous membrane area, the fouling-based sensor exhibits 100% specificity and 97% sensitivity for vaccinia virus (n = 62). Moreover, the dynamic range of the sensor extends at least from 10^5.9 virions per mL to 10^10.4 virions per mL, covering the range of mean viral loads in symptomatic COVID-19 patients (10^5.6–10^7 RNA copies per mL). Forthcoming work will test the ability of our sensor to perform similarly in biological fluids and with SARS-CoV-2, to fully test the potential of a membrane fouling-based sensor to serve as a PCR-free alternative for POC containment efforts in the spread of infectious disease.
  5. Background Limited data are available regarding the balance of risks and benefits from human milk and/or breastfeeding during and following maternal infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Objective To investigate whether SARS-CoV-2 can be detected in milk and on the breast after maternal coronavirus disease 2019 (COVID-19) diagnosis, and to characterize concentrations of milk immunoglobulin (Ig) A specific to the SARS-CoV-2 spike glycoprotein receptor binding domain (RBD) during the 2 months after onset of symptoms or positive diagnostic test. Methods Using a longitudinal study design, we collected milk and breast skin swabs one to seven times from 64 lactating women with COVID-19 over a 2-month period, beginning as early as the week of diagnosis. Milk and breast swabs were analyzed for SARS-CoV-2 RNA, and milk was tested for anti-RBD IgA. Results SARS-CoV-2 was not detected in any milk sample or on 71% of breast swabs. Twenty-seven out of 29 (93%) breast swabs collected after breast washing tested negative for SARS-CoV-2. Detection of SARS-CoV-2 on the breast was associated with maternal coughing and other household COVID-19. Most (75%; 95% CI, 70-79%; n=316) milk samples contained anti-RBD IgA, and concentrations increased (P = .02) during the first two weeks following onset of COVID-19 symptoms or positive test. Milk-borne anti-RBD IgA persisted for at least two months in 77% of women. Conclusion Milk produced by women with COVID-19 does not contain SARS-CoV-2 and is likely a lasting source of passive immunity via anti-RBD IgA. These results support recommendations encouraging lactating women to continue breastfeeding during and after COVID-19 illness.
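Several of the abstracts on this page summarize classifier quality with the area under the ROC curve (AUC). As a reading aid, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from any of the listed studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth outcomes
    scores: iterable of predicted scores (higher means more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

This pairwise form is quadratic in the number of samples, which is fine for illustration. Library implementations typically use a sort-based method that gives the same value.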