Title: Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms
Abstract

In the setting of COVID-19, individual smartwatch or fitness band sensor data have shown promise for identifying symptomatic and pre-symptomatic infection or the need for hospitalization, and have revealed correlations between peripheral temperature and self-reported fever and an association between changes in heart rate variability and infection. In our study, a total of 38,911 individuals (61% female, 15% over 65) were enrolled between March 25, 2020 and April 3, 2021, of whom 1118 reported testing positive and 7032 negative for COVID-19 by nasopharyngeal PCR swab test. We propose an explainable gradient boosting prediction model based on decision trees for the detection of COVID-19 infection that can adapt to the absence of self-reported symptoms and to the available sensor data, and that can explain the importance of each feature and the post-test behavior of the individuals. In a cohort of symptomatic individuals, the model achieved an AUC of 0.83 [0.81–0.85], or 0.78 [0.75–0.80] when considering only data before the test date, outperforming the state-of-the-art algorithm under these conditions. The analysis of all individuals (including asymptomatic and pre-symptomatic) with self-reported symptoms excluded yielded an AUC of 0.78 [0.76–0.79], or 0.70 [0.69–0.72] when considering only data before the test date. By extending predictive algorithms for the detection of COVID-19 infection to rely only on passively monitored data from any device, we showed that it is possible to scale up this platform and apply the algorithm in other settings where self-reported symptoms cannot be collected.
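
The AUC figures with bracketed intervals quoted above can be made concrete with a small sketch. The abstract does not say how the intervals were obtained; the percentile bootstrap used here is one common choice and is an assumption, as are the function names `roc_auc` and `bootstrap_auc_ci` and the toy labels and scores. It is not the authors' code.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = tested positive, 0 = tested negative; scores are model outputs.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.65, 0.9]

auc = roc_auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores, n_boot=200)
print(f"AUC = {auc:.2f} [{lower:.2f}-{upper:.2f}]")
```

With only eight toy samples the interval is very wide; the narrow intervals reported in the abstract reflect a cohort of thousands of tested individuals.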

 
Award ID(s):
2040727
NSF-PAR ID:
10383796
Author(s) / Creator(s):
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
npj Digital Medicine
Volume:
4
Issue:
1
ISSN:
2398-6352
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abd El-Aty, A. M. (Ed.)
    Background Higher viral loads in SARS-CoV-2 infections may be linked to more rapid spread of emerging variants of concern (VOC). Rapid detection and isolation of cases with highest viral loads, even in pre- or asymptomatic individuals, is essential for the mitigation of community outbreaks. Methods and findings In this study, we analyze Ct values from 1297 SARS-CoV-2 positive patient saliva samples collected at the Clemson University testing lab in upstate South Carolina. Samples were identified as positive using RT-qPCR, and clade information was determined via whole genome sequencing at nearby commercial labs. We also obtained patient-reported information on symptoms and exposures at the time of testing. The lowest Ct values were observed among those infected with Delta (median: 22.61, IQR: 16.72–28.51), followed by Alpha (23.93, 18.36–28.49), Gamma (24.74, 18.84–30.64), and the more historic clade 20G (25.21, 20.50–29.916). There was a statistically significant difference in Ct value between Delta and all other clades (all p.adj<0.01), as well as between Alpha and 20G (p.adj<0.05). Additionally, pre- or asymptomatic patients (n = 1093) showed the same statistical differences between Delta and all other clades (all p.adj<0.01); however, symptomatic patients (n = 167) did not show any significant differences between clades. Our weekly testing strategy ensures that cases are caught earlier in the infection cycle, often before symptoms are present, reducing this sample size in our population. Conclusions COVID-19 variants Alpha and Delta have substantially higher viral loads in saliva compared to more historic clades. This trend is especially observed in individuals who are pre- or asymptomatic, which provides evidence supporting higher transmissibility and more rapid spread of emerging variants. Understanding the viral load of variants spreading within a community can inform public policy and clinical decision making. 
  2. Abstract Early detection of diseases such as COVID-19 could be a critical tool in reducing disease transmission by helping individuals recognize when they should self-isolate, seek testing, and obtain early medical intervention. Consumer wearable devices that continuously measure physiological metrics hold promise as tools for early illness detection. We gathered daily questionnaire data and physiological data using a consumer wearable (Oura Ring) from 63,153 participants, of whom 704 self-reported possible COVID-19 disease. We selected 73 of these 704 participants with reliable confirmation of COVID-19 by PCR testing and high-quality physiological data for algorithm training to identify onset of COVID-19 using machine learning classification. The algorithm identified COVID-19 an average of 2.75 days before participants sought diagnostic testing with a sensitivity of 82% and specificity of 63%. The receiver operating characteristic (ROC) area under the curve (AUC) was 0.819 (95% CI [0.809, 0.830]). Including continuous temperature yielded an AUC 4.9% higher than without this feature. For further validation, we obtained SARS-CoV-2 antibody results in a subset of participants and identified 10 additional participants who self-reported COVID-19 disease with antibody confirmation. The algorithm had an overall ROC AUC of 0.819 (95% CI [0.809, 0.830]), with a sensitivity of 90% and specificity of 80% in these additional participants. Finally, we observed substantial variation in accuracy based on age and biological sex. Findings highlight the importance of including temperature assessment, using continuous physiological features for alignment, and including diverse populations in algorithm development to optimize accuracy in COVID-19 detection from wearables. 
  3. Background Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19). 
    Conclusions To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data. 
  4. Abstract STUDY QUESTION

    To what extent is preconception maternal or paternal coronavirus disease 2019 (COVID-19) vaccination associated with miscarriage incidence?

    SUMMARY ANSWER

    COVID-19 vaccination in either partner at any time before conception is not associated with an increased rate of miscarriage.

    WHAT IS KNOWN ALREADY

    Several observational studies have evaluated the safety of COVID-19 vaccination during pregnancy and found no association with miscarriage, though no study prospectively evaluated the risk of early miscarriage (gestational weeks [GW] <8) in relation to COVID-19 vaccination. Moreover, no study has evaluated the role of preconception vaccination in both male and female partners.

    STUDY DESIGN, SIZE, DURATION

    An Internet-based, prospective preconception cohort study of couples residing in the USA and Canada. We analyzed data from 1815 female participants who conceived during December 2020–November 2022, including 1570 couples with data on male partner vaccination.

    PARTICIPANTS/MATERIALS, SETTING, METHODS

    Eligible female participants were aged 21–45 years and were trying to conceive without use of fertility treatment at enrollment. Female participants completed questionnaires at baseline, every 8 weeks until pregnancy, and during early and late pregnancy; they could also invite their male partners to complete a baseline questionnaire. We collected data on COVID-19 vaccination (brand and date of doses), history of SARS-CoV-2 infection (yes/no and date of positive test), potential confounders (demographic, reproductive, and lifestyle characteristics), and pregnancy status on all questionnaires. Vaccination status was categorized as never (0 doses before conception), ever (≥1 dose before conception), having a full primary sequence before conception, and completing the full primary sequence ≤3 months before conception. These categories were not mutually exclusive. Participants were followed up from their first positive pregnancy test until miscarriage or a censoring event (induced abortion, ectopic pregnancy, loss to follow-up, 20 weeks’ gestation), whichever occurred first. We estimated incidence rate ratios (IRRs) for miscarriage and corresponding 95% CIs using Cox proportional hazards models with GW as the time scale. We used propensity score fine stratification weights to adjust for confounding.

    MAIN RESULTS AND THE ROLE OF CHANCE

    Among 1815 eligible female participants, 75% had received at least one dose of a COVID-19 vaccine by the time of conception. Almost one-quarter of pregnancies resulted in miscarriage, and 75% of miscarriages occurred <8 weeks’ gestation. The propensity score-weighted IRR comparing female participants who received at least one dose any time before conception versus those who had not been vaccinated was 0.85 (95% CI: 0.63, 1.14). COVID-19 vaccination was not associated with increased risk of either early miscarriage (GW: <8) or late miscarriage (GW: 8–19). There was no indication of an increased risk of miscarriage associated with male partner vaccination (IRR = 0.90; 95% CI: 0.56, 1.44).

    LIMITATIONS, REASONS FOR CAUTION

    The present study relied on self-reported vaccination status and infection history. Thus, there may be some non-differential misclassification of exposure status. While misclassification of miscarriage is also possible, the preconception cohort design and high prevalence of home pregnancy testing in this cohort reduced the potential for under-ascertainment of miscarriage. As in all observational studies, residual or unmeasured confounding is possible.

    WIDER IMPLICATIONS OF THE FINDINGS

    This is the first study to evaluate prospectively the relation between preconception COVID-19 vaccination in both partners and miscarriage, with more complete ascertainment of early miscarriages than earlier studies of vaccination. The findings are informative for individuals planning a pregnancy and their healthcare providers.

    STUDY FUNDING/COMPETING INTEREST(S)

    This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Institutes of Health [R01-HD086742 (PI: L.A.W.); R01-HD105863S1 (PI: L.A.W. and M.L.E.)], the National Institute of Allergy and Infectious Diseases (R03-AI154544; PI: A.K.R.), and the National Science Foundation (NSF-1914792; PI: L.A.W.). The funders had no role in the study design, data collection, analysis and interpretation of data, writing of the report, or the decision to submit the paper for publication. L.A.W. is a fibroid consultant for AbbVie, Inc. She also receives in-kind donations from Swiss Precision Diagnostics (Clearblue home pregnancy tests) and Kindara.com (fertility apps). M.L.E. received consulting fees from Ro, Hannah, Dadi, VSeat, and Underdog, holds stock in Ro, Hannah, Dadi, and Underdog, is a past president of SSMR, and is a board member of SMRU. K.F.H. reports being an investigator on grants to her institution from UCB and Takeda, unrelated to this study. S.H.-D. reports being an investigator on grants to her institution from Takeda, unrelated to this study, and a methods consultant for UCB and Roche for unrelated drugs. The authors report no other relationships or activities that could appear to have influenced the submitted work.

    TRIAL REGISTRATION NUMBER

    N/A.

     
  5. Serology and molecular tests are the two most commonly used methods for rapid COVID-19 infection testing. The two types of tests detect infection through different mechanisms: molecular tests measure the presence of viral SARS-CoV-2 RNA, whereas serology tests detect antibodies triggered by the SARS-CoV-2 virus. A handful of studies have shown that symptoms, combined with demographic and/or diagnosis features, can be helpful for the prediction of COVID-19 test outcomes. However, due to the nature of the tests, serology and molecular tests vary significantly. No existing study examines the correlation between serology and molecular tests, or which types of symptoms are the key factors indicating a positive COVID-19 test. In this study, we propose a machine learning based approach to study serology and molecular tests, and use features to predict test outcomes. Data from a total of 2,467 donors, each tested using one or multiple types of COVID-19 tests, were collected as our testbed. By cross-checking test types and results, we study the correlation between serology and molecular tests. For test outcome prediction, we label the 2,467 donors as positive or negative using their serology or molecular test results, and create symptom features to represent each donor for learning. Because COVID-19 produces a wide range of symptoms and the data collection process is essentially error-prone, we group similar symptoms into bins. This decreases the feature space and sparsity. Using binned symptoms, combined with demographic features, we train five classification algorithms to predict COVID-19 test results. Experiments show that XGBoost achieves the best performance, with 76.85% accuracy and an AUC of 81.4%, demonstrating that symptoms are indeed helpful for predicting COVID-19 test outcomes. 
    Our study investigates the relationship between serology and molecular tests, identifies meaningful symptom features associated with COVID-19 infection, and also provides a way for rapid screening and cost-effective detection of COVID-19 infection. 
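
The symptom binning described in item 5 can be illustrated with a minimal sketch. The abstract does not list the actual bins, so the grouping in `SYMPTOM_BINS`, the helper name `bin_symptoms`, and the example inputs are all hypothetical; the point is only to show how many free-text symptoms collapse into a few binary features, which reduces the feature space and sparsity.

```python
# Hypothetical grouping; the abstract does not give the real bins.
SYMPTOM_BINS = {
    "respiratory": {"cough", "dry cough", "shortness of breath", "sore throat"},
    "fever": {"fever", "chills", "sweats"},
    "sensory": {"loss of smell", "loss of taste"},
    "gastrointestinal": {"nausea", "diarrhea", "vomiting"},
}

# Fixed feature order so every donor maps to a vector of the same layout.
BIN_ORDER = sorted(SYMPTOM_BINS)


def bin_symptoms(reported):
    """Map a donor's reported symptom strings to one 0/1 feature per bin.

    Strings are normalized (trimmed, lower-cased); symptoms that match no
    bin are ignored rather than creating new sparse columns.
    """
    cleaned = {s.strip().lower() for s in reported}
    return [int(bool(cleaned & SYMPTOM_BINS[name])) for name in BIN_ORDER]


print(BIN_ORDER)
print(bin_symptoms(["Dry cough", " chills", "headache"]))
```

The resulting vectors could then be concatenated with demographic features and passed to any classifier; that step is omitted here because the abstract gives no details of the feature layout or model settings.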