skip to main content


Title: Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study
Background Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID 19 using only routine blood tests among adults in emergency departments. Methods Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID 19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. Conclusions A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing.  more » « less
Award ID(s):
2014934
NSF-PAR ID:
10275416
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
Journal of Medical Internet Research
Volume:
22
Issue:
12
ISSN:
1438-8871
Page Range / eLocation ID:
e24048
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Background . New York City (NYC) experienced an initial surge and gradual decline in the number of SARS-CoV-2-confirmed cases in 2020. A change in the pattern of laboratory test results in COVID-19 patients over this time has not been reported or correlated with patient outcome. Methods . We performed a retrospective study of routine laboratory and SARS-CoV-2 RT-PCR test results from 5,785 patients evaluated in a NYC hospital emergency department from March to June employing machine learning analysis. Results . A COVID-19 high-risk laboratory test result profile (COVID19-HRP), consisting of 21 routine blood tests, was identified to characterize the SARS-CoV-2 patients. Approximately half of the SARS-CoV-2 positive patients had the distinct COVID19-HRP that separated them from SARS-CoV-2 negative patients. SARS-CoV-2 patients with the COVID19-HRP had higher SARS-CoV-2 viral loads, determined by cycle threshold values from the RT-PCR, and poorer clinical outcome compared to other positive patients without the COVID12-HRP. Furthermore, the percentage of SARS-CoV-2 patients with the COVID19-HRP has significantly decreased from March/April to May/June. Notably, viral load in the SARS-CoV-2 patients declined, and their laboratory profile became less distinguishable from SARS-CoV-2 negative patients in the later phase. Conclusions . Our longitudinal analysis illustrates the temporal change of laboratory test result profile in SARS-CoV-2 patients and the COVID-19 evolvement in a US epicenter. This analysis could become an important tool in COVID-19 population disease severity tracking and prediction. In addition, this analysis may play an important role in prioritizing high-risk patients, assisting in patient triaging and optimizing the usage of resources. 
    more » « less
  2. null (Ed.)
    Background The novel coronavirus SARS-CoV-2 and its associated disease, COVID-19, have caused worldwide disruption, leading countries to take drastic measures to address the progression of the disease. As SARS-CoV-2 continues to spread, hospitals are struggling to allocate resources to patients who are most at risk. In this context, it has become important to develop models that can accurately predict the severity of infection of hospitalized patients to help guide triage, planning, and resource allocation. Objective The aim of this study was to develop accurate models to predict the mortality of hospitalized patients with COVID-19 using basic demographics and easily obtainable laboratory data. Methods We performed a retrospective study of 375 hospitalized patients with COVID-19 in Wuhan, China. The patients were randomly split into derivation and validation cohorts. Regularized logistic regression and support vector machine classifiers were trained on the derivation cohort, and accuracy metrics (F1 scores) were computed on the validation cohort. Two types of models were developed: the first type used laboratory findings from the entire length of the patient’s hospital stay, and the second type used laboratory findings that were obtained no later than 12 hours after admission. The models were further validated on a multicenter external cohort of 542 patients. Results Of the 375 patients with COVID-19, 174 (46.4%) died of the infection. The study cohort was composed of 224/375 men (59.7%) and 151/375 women (40.3%), with a mean age of 58.83 years (SD 16.46). The models developed using data from throughout the patients’ length of stay demonstrated accuracies as high as 97%, whereas the models with admission laboratory variables possessed accuracies of up to 93%. The latter models predicted patient outcomes an average of 11.5 days in advance. Key variables such as lactate dehydrogenase, high-sensitivity C-reactive protein, and percentage of lymphocytes in the blood were indicated by the models. In line with previous studies, age was also found to be an important variable in predicting mortality. In particular, the mean age of patients who survived COVID-19 infection (50.23 years, SD 15.02) was significantly lower than the mean age of patients who died of the infection (68.75 years, SD 11.83; P<.001). Conclusions Machine learning models can be successfully employed to accurately predict outcomes of patients with COVID-19. Our models achieved high accuracies and could predict outcomes more than one week in advance; this promising result suggests that these models can be highly useful for resource allocation in hospitals. 
    more » « less
  3. Abstract Objective: The aim of this study was to investigate the performance of key hospital units associated with emergency care of both routine emergency and pandemic (COVID-19) patients under capacity enhancing strategies. Methods: This investigation was conducted using whole-hospital, resource-constrained, patient-based, stochastic, discrete-event, simulation models of a generic 200-bed urban U.S. tertiary hospital serving routine emergency and COVID-19 patients. Systematically designed numerical experiments were conducted to provide generalizable insights into how hospital functionality may be affected by the care of COVID-19 pandemic patients along specially designated care paths, under changing pandemic situations, from getting ready to turning all of its resources to pandemic care. Results: Several insights are presented. For example, each day of reduction in average ICU length of stay increases intensive care unit patient throughput by up to 24% for high COVID-19 daily patient arrival levels. The potential of 5 specific interventions and 2 critical shifts in care strategies to significantly increase hospital capacity is also described. Conclusions: These estimates enable hospitals to repurpose space, modify operations, implement crisis standards of care, collaborate with other health care facilities, or request external support, thereby increasing the likelihood that arriving patients will find an open staffed bed when 1 is needed. 
    more » « less
  4. Abstract

    The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization such as the need for hospitalization based on past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male with mean age 56.7 (±16.8), 56% African American, 7% Hispanic) flagged as COVID-19 positive cases in 12 centers under Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19 positive patient’s need for hospitalization can be predicted at the time of RT-PCR test using the EMR data prior to the test. Five main modalities of EMR, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, F1-score (within 95% confidence interval). The early fusion model is the most effective predictor with 84% overall F1-score [CI 82.1–86.1]. The predictive performance of the model drops by 6 % when using recent clinical data while omitting the long-term medical history. Feature importance analysis indicates that history of cardiovascular disease, emergency room visits in the past year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.

     
    more » « less
  5. The COVID-19 pandemic demonstrated the public health benefits of reliable and accessible point-of-care (POC) diagnostic tests for viral infections. Despite the rapid development of gold-standard reverse transcription polymerase chain reaction (RT-PCR) assays for SARS-CoV-2 only weeks into the pandemic, global demand created logistical challenges that delayed access to testing for months and helped fuel the spread of COVID-19. Additionally, the extreme sensitivity of RT-PCR had a costly downside as the tests could not differentiate between patients with active infection and those who were no longer infectious but still shedding viral genomes. To address these issues for the future, we propose a novel membrane-based sensor that only detects intact virions. The sensor combines affinity and size based detection on a membrane-based sensor and does not require external power to operate or read. Specifically, the presence of intact virions, but not viral debris, fouls the membrane and triggers a macroscopically visible hydraulic switch after injection of a 40 μL sample with a pipette. The device, which we call the μSiM-DX (microfluidic device featuring a silicon membrane for diagnostics), features a biotin-coated microslit membrane with pores ∼2–3× larger than the intact virus. Streptavidin-conjugated antibody recognizing viral surface proteins are incubated with the sample for ∼1 hour prior to injection into the device, and positive/negative results are obtained within ten seconds of sample injection. Proof-of-principle tests have been performed using preparations of vaccinia virus. After optimizing slit pore sizes and porous membrane area, the fouling-based sensor exhibits 100% specificity and 97% sensitivity for vaccinia virus ( n = 62). Moreover, the dynamic range of the sensor extends at least from 10 5.9 virions per mL to 10 10.4 virions per mL covering the range of mean viral loads in symptomatic COVID-19 patients (10 5.6 –10 7 RNA copies per mL). Forthcoming work will test the ability of our sensor to perform similarly in biological fluids and with SARS-CoV-2, to fully test the potential of a membrane fouling-based sensor to serve as a PCR-free alternative for POC containment efforts in the spread of infectious disease. 
    more » « less