Title: A multi-instance support vector machine with incomplete data for clinical outcome prediction of COVID-19
To manage the public health crisis associated with COVID-19, it is critically important that healthcare workers can quickly identify high-risk patients in order to provide effective treatment with limited resources. Statistical learning tools have the potential to help predict serious infection early in the progression of the disease. However, many of these techniques are unable to take full advantage of temporal data on a per-patient basis because they treat the problem as a single-instance classification. Furthermore, these algorithms rely on complete data to make their predictions. In this work, we present a novel approach that handles the temporal and missing data problems simultaneously: our proposed Simultaneous Imputation-Multi Instance Support Vector Machine method illustrates how multiple instance learning techniques and low-rank data imputation can be used to accurately predict clinical outcomes of COVID-19 patients. We compare our approach against recent methods used to predict outcomes on a public dataset with a cohort of 361 COVID-19 positive patients. In addition to improved prediction performance early in the progression of the disease, our method identifies a collection of biomarkers associated with the liver, immune system, and blood that deserve additional study and may provide additional insight into causes of patient mortality due to COVID-19. We publish the source code for our method online.
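The multi-instance framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' published code or their optimization method. It assumes that each patient is a "bag" of time-stamped lab measurements, that the bag score is the maximum linear score over its instances (a common multiple instance learning assumption), and it swaps in simple per-feature mean imputation as a stand-in for the low-rank imputation named in the abstract. All function names, weights, and data values are hypothetical.

```python
from typing import List, Optional, Sequence

# One instance = lab measurements at one time point; None marks a missing value.
Instance = List[Optional[float]]
# One bag = all recorded time points for a single patient.
Bag = List[Instance]


def feature_means(bags: Sequence[Bag], n_features: int) -> List[float]:
    """Mean of each feature over all observed (non-missing) values."""
    sums = [0.0] * n_features
    counts = [0] * n_features
    for bag in bags:
        for inst in bag:
            for j, value in enumerate(inst):
                if value is not None:
                    sums[j] += value
                    counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def impute(bag: Bag, means: Sequence[float]) -> List[List[float]]:
    """Fill missing entries with the feature mean.

    Assumption: this is a simple stand-in, not low-rank imputation.
    """
    return [[means[j] if v is None else v for j, v in enumerate(inst)]
            for inst in bag]


def bag_score(bag: Sequence[Sequence[float]], w: Sequence[float], b: float) -> float:
    """Bag-level score: the maximum linear score over the bag's instances."""
    return max(sum(wj * xj for wj, xj in zip(w, inst)) + b for inst in bag)


def predict(bag: Bag, w: Sequence[float], b: float, means: Sequence[float]) -> int:
    """Return +1 (high risk) if any instance scores above zero, else -1."""
    return 1 if bag_score(impute(bag, means), w, b) > 0 else -1


# Hypothetical toy data: two patients, two lab features, some values missing.
bags: List[Bag] = [
    [[1.0, None], [3.0, 4.0]],  # patient A, two time points
    [[None, 2.0]],              # patient B, one time point
]
means = feature_means(bags, n_features=2)
w, b = [1.0, -0.5], 0.0         # hand-set weights, not learned from data
labels = [predict(bag, w, b, means) for bag in bags]
```

In a full method the weights would be learned from labeled bags and the imputation would be solved jointly with the classifier; here both are fixed by hand so that only the bag-level decision rule is shown.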
Award ID(s):
2029543
PAR ID:
10294507
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: The novel coronavirus SARS-CoV-2 and its associated disease, COVID-19, have caused worldwide disruption, leading countries to take drastic measures to address the progression of the disease. As SARS-CoV-2 continues to spread, hospitals are struggling to allocate resources to patients who are most at risk. In this context, it has become important to develop models that can accurately predict the severity of infection of hospitalized patients to help guide triage, planning, and resource allocation. Objective: The aim of this study was to develop accurate models to predict the mortality of hospitalized patients with COVID-19 using basic demographics and easily obtainable laboratory data. Methods: We performed a retrospective study of 375 hospitalized patients with COVID-19 in Wuhan, China. The patients were randomly split into derivation and validation cohorts. Regularized logistic regression and support vector machine classifiers were trained on the derivation cohort, and accuracy metrics (F1 scores) were computed on the validation cohort. Two types of models were developed: the first type used laboratory findings from the entire length of the patient’s hospital stay, and the second type used laboratory findings that were obtained no later than 12 hours after admission. The models were further validated on a multicenter external cohort of 542 patients. Results: Of the 375 patients with COVID-19, 174 (46.4%) died of the infection. The study cohort was composed of 224/375 men (59.7%) and 151/375 women (40.3%), with a mean age of 58.83 years (SD 16.46). The models developed using data from throughout the patients’ length of stay demonstrated accuracies as high as 97%, whereas the models with admission laboratory variables possessed accuracies of up to 93%. The latter models predicted patient outcomes an average of 11.5 days in advance. Key variables such as lactate dehydrogenase, high-sensitivity C-reactive protein, and percentage of lymphocytes in the blood were indicated by the models. In line with previous studies, age was also found to be an important variable in predicting mortality. In particular, the mean age of patients who survived COVID-19 infection (50.23 years, SD 15.02) was significantly lower than the mean age of patients who died of the infection (68.75 years, SD 11.83; P<.001). Conclusions: Machine learning models can be successfully employed to accurately predict outcomes of patients with COVID-19. Our models achieved high accuracies and could predict outcomes more than one week in advance; this promising result suggests that these models can be highly useful for resource allocation in hospitals.
  2. Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. To incorporate additional side information, our approach uses structured regularization techniques popularized in recent AD literature. Armed with the fixed-length vector representation learned from the multimodal dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves the prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. The results of our approach are interpreted to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.
  3. Abstract: In 2019, a series of novel pneumonia cases later known as Coronavirus Disease 2019 (COVID-19) were reported in Wuhan, China. Chest computed tomography (CT) has played a key role in the management and prognostication of COVID-19 patients. CT has demonstrated 98% sensitivity in detecting COVID-19, including identifying lung abnormalities that are suggestive of COVID-19, even among asymptomatic individuals. Methods: We conducted a comprehensive literature review of 17 published studies, with a focus on three subgroups (pediatric patients, pregnant women, and patients over 60 years old), to identify key characteristics of chest CT in COVID-19 patients. Results: Our comprehensive review of the 17 studies concluded that the main CT imaging finding is ground glass opacities (GGOs) regardless of patient age. We also identified that crazy paving pattern, reverse halo sign, smooth or irregular septal thickening, and pleural thickening may serve as indicators of disease progression. Lesions on CT scans were dominantly distributed in the peripheral zone with multilobar involvement, specifically concentrated in the lower lobes. In the patients over 60 years old, the proportion of substantial lobe involvement was higher than in the control group, and crazy paving signs, bronchodilation, and pleural thickening were more commonly present. Conclusion: Based on all 17 studies, CT findings in COVID-19 have shown a predictable pattern of evolution over the course of the disease. These studies have proven that CT may be an effective approach for early screening and detection of COVID-19.
  4. Abstract: Proteins are direct products of the genome and metabolites are functional products of interactions between the host and other factors such as environment, disease state, clinical information, etc. Omics data, including proteins and metabolites, are useful in characterizing biological processes underlying COVID-19 along with patient data and clinical information, yet few methods are available to effectively analyze such diverse and unstructured data. Using an integrated approach that combines proteomics and metabolomics data, we investigated the changes in metabolites and proteins in relation to patient characteristics (e.g., age, gender, and health outcome) and clinical information (e.g., metabolic panel and complete blood count test results). We found significant enrichment of biological indicators of lung, liver, and gastrointestinal dysfunction associated with disease severity using publicly available metabolite and protein profiles. Our analyses specifically identified enriched proteins that play a critical role in responses to injury or infection within these anatomical sites, but may contribute to excessive systemic inflammation within the context of COVID-19. Furthermore, we have used this information in conjunction with machine learning algorithms to predict the health status of patients presenting symptoms of COVID-19. This work provides a roadmap for understanding the biochemical pathways and molecular mechanisms that drive disease severity, progression, and treatment of COVID-19.
  5. The COVID-19 pandemic has changed the lives of many people around the world. Based on the available data and published reports, most people diagnosed with COVID-19 exhibit no or mild symptoms and could be discharged home for self-isolation. Considering that a substantial portion of them will progress to severe disease requiring hospitalization and medical management, including respiratory and circulatory support in the form of supplemental oxygen therapy, mechanical ventilation, vasopressors, etc., continuous monitoring of patient conditions at home will allow early determination of disease severity and medical intervention to reduce morbidity and mortality. In addition, this will allow early and safe hospital discharge and free hospital beds for patients who are in need of admission. In this review, we focus on the recent developments in next-generation wearable sensors capable of continuous monitoring of disease symptoms, particularly those associated with COVID-19. These include wearable non/minimally invasive biophysical (temperature, respiratory rate, oxygen saturation, heart rate, and heart rate variability) and biochemical (cytokines, cortisol, and electrolytes) sensors, sensor data analytics, and machine learning-enabled early detection and medical intervention techniques. Together, we aim to inspire the future development of wearable sensors integrated with data analytics, which serve as a foundation for disease diagnostics, health monitoring and predictions, and medical interventions.