skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: CoSev: Data-Driven Optimizations for COVID-19 Severity Assessment in Low-Sample Regimes
Given the pronounced impact COVID-19 continues to have on society—infecting 700 million reported individuals and causing 6.96 million deaths—many deep learning works have recently focused on the virus’s diagnosis. However, assessing severity has remained an open and challenging problem due to a lack of large datasets, the large dimensionality of images for which to find weights, and the compute limitations of modern graphics processing units (GPUs). In this paper, a new, iterative application of transfer learning is demonstrated on the understudied field of 3D CT scans for COVID-19 severity analysis. This methodology allows for enhanced performance on the MosMed Dataset, which is a small and challenging dataset containing 1130 images of patients for five levels of COVID-19 severity (Zero, Mild, Moderate, Severe, and Critical). Specifically, given the large dimensionality of the input images, we create several custom shallow convolutional neural network (CNN) architectures and iteratively refine and optimize them, paying attention to learning rates, layer types, normalization types, filter sizes, dropout values, and more. After a preliminary architecture design, the models are systematically trained on a simplified version of the dataset-building models for two-class, then three-class, then four-class, and finally five-class classification. The simplified problem structure allows the model to start learning preliminary features, which can then be further modified for more difficult classification tasks. Our final model CoSev boosts classification accuracies from below 60% at first to 81.57% with the optimizations, reaching similar performance to the state-of-the-art on the dataset, with much simpler setup procedures. In addition to COVID-19 severity diagnosis, the explored methodology can be applied to general image-based disease detection. Overall, this work highlights innovative methodologies that advance current computer vision practices for high-dimension, low-sample data as well as the practicality of data-driven machine learning and the importance of feature design for training, which can then be implemented for improvements in clinical practices.  more » « less
Award ID(s):
2027456
PAR ID:
10510575
Author(s) / Creator(s):
; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Diagnostics
Volume:
14
Issue:
3
ISSN:
2075-4418
Page Range / eLocation ID:
337
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The class activation map (CAM) represents the neural-network-derived region of interest, which can help clarify the mechanism of the convolutional neural network’s determination of any class of interest. In medical imaging, it can help medical practitioners diagnose diseases like COVID-19 or pneumonia by highlighting the suspicious regions in Computational Tomography (CT) or chest X-ray (CXR) film. Many contemporary deep learning techniques only focus on COVID-19 classification tasks using CXRs, while few attempt to make it explainable with a saliency map. To fill this research gap, we first propose a VGG-16-architecture-based deep learning approach in combination with image enhancement, segmentation-based region of interest (ROI) cropping, and data augmentation steps to enhance classification accuracy. Later, a multi-layer Gradient CAM (ML-Grad-CAM) algorithm is integrated to generate a class-specific saliency map for improved visualization in CXR images. We also define and calculate a Severity Assessment Index (SAI) from the saliency map to quantitatively measure infection severity. The trained model achieved an accuracy score of 96.44% for the three-class CXR classification task, i.e., COVID-19, pneumonia, and normal (healthy patients), outperforming many existing techniques in the literature. The saliency maps generated from the proposed ML-GRAD-CAM algorithm are compared with the original Gran-CAM algorithm. 
    more » « less
  2. The overall purpose of this paper is to demonstrate how data preprocessing, training size variation, and subsampling can dynamically change the performance metrics of imbalanced text classification. The methodology encompasses using two different supervised learning classification approaches of feature engineering and data preprocessing with the use of five machine learning classifiers, five imbalanced sampling techniques, specified intervals of training and subsampling sizes, statistical analysis using R and tidyverse on a dataset of 1000 portable document format files divided into five labels from the World Health Organization Coronavirus Research Downloadable Articles of COVID-19 papers and PubMed Central databases of non-COVID-19 papers for binary classification that affects the performance metrics of precision, recall, receiver operating characteristic area under the curve, and accuracy. One approach that involves labeling rows of sentences based on regular expressions significantly improved the performance of imbalanced sampling techniques verified by performing statistical analysis using a t-test documenting performance metrics of iterations versus another approach that automatically labels the sentences based on how the documents are organized into positive and negative classes. The study demonstrates the effectiveness of ML classifiers and sampling techniques in text classification datasets, with different performance levels and class imbalance issues observed in manual and automatic methods of data processing. 
    more » « less
  3. Sepsis is a severe medical illness with over 1.7 million cases reported each year in the United States. Early diagnosis of sepsis is cr- tical to adress adecuate tre remains a major challenge in healthcare due to the nonspecificity of the initial symptoms and the lack of currently available biomarkers that demonstrate sufficient specificity or sensitiv- ity suitable for clinical practice. Wearable optical technologies, such as photoplethysmography (PPG), whic uses optical technology to measure changes in blood volume in peripheral tissues, enabling continuous mon- itoring. Identifying modest physiological changes that indicate sepsis can be challenging since they occur without a body reaction. Deep Learning (DL) models can help overcome the diagnostic gap in sepsis diagnosis and intervention. This study analyzes sepsis-related characteristics in PPG signals utilizing a collection of waveform records from both sepsis and control cases. The proposed model consists of five layers: input sequence, long short-term memory (LSTM), fully-connected, softmax, and classi- fication. The LSTM layer is chosen to extract and filter features from cycles of PPG signals; then, the features pass through a fully-connected layer to be classified. We tested our LSTM-based model on 915 one- second intervals to identify and classify sepsis severity. Our LSTM-based model accurately detected sepsis (91.30% for training and 89.74% for testing). The sepsis severity categorization achieved an accuracy of 85.9% in training and 81.4% in testing. Multiple training attempts were con- ducted to validate the model’s detecting capabilities. Preliminary results show that a deep learning model utilizing an LSTM layer can detect and categorize sepsis using PPG data, potentially allowing for real-time diagnosis and monitoring within a single cycle. 
    more » « less
  4. While COVID-19 text misinformation has already been investigated by various scholars, fewer research efforts have been devoted to characterizing and understanding COVID-19 misinformation that is carried out through visuals like photographs and memes. In this paper, we present a mixed-method analysis of image-based COVID-19 misinformation in 2020 on Twitter. We deploy a computational pipeline to identify COVID-19 related tweets, download the images contained in them, and group together visually similar images. We then develop a codebook to characterize COVID-19 misinformation and manually label images as misinformation or not. Finally, we perform a quantitative analysis of tweets containing COVID-19 misinformation images. We identify five types of COVID-19 misinformation, from a wrong understanding of the threat severity of COVID-19 to the promotion of fake cures and conspiracy theories. We also find that tweets containing COVID-19 misinformation images do not receive more interactions than baseline tweets with random images posted by the same set of users. As for temporal properties, COVID-19 misinformation images are shared for longer periods of time than non-misinformation ones, as well as have longer burst times. %\ywi added "have'' %\ywFor RQ2, we compare non-misinformation images instead of random images, and so it is not a direct comparison. When looking at the users sharing COVID-19 misinformation images on Twitter from the perspective of their political leanings, we find that pro-Democrat and pro-Republican users share a similar amount of tweets containing misleading or false COVID-19 images. However, the types of images that they share are different: while pro-Democrat users focus on misleading claims about the Trump administration's response to the pandemic, as well as often sharing manipulated images intended as satire, pro-Republican users often promote hydroxychloroquine, an ineffective medicine against COVID-19, as well as conspiracy theories about the origin of the virus. Our analysis sets a basis for better understanding COVID-19 misinformation images on social media and the nuances in effectively moderate them. 
    more » « less
  5. Hemanth, Jude (Ed.)
    Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19). Imaging tests such as chest X-ray (CXR) and computed tomography (CT) can provide useful information to clinical staff for facilitating a diagnosis of COVID-19 in a more efficient and comprehensive manner. As a breakthrough of artificial intelligence (AI), deep learning has been applied to perform COVID-19 infection region segmentation and disease classification by analyzing CXR and CT data. However, prediction uncertainty of deep learning models for these tasks, which is very important to safety-critical applications like medical image processing, has not been comprehensively investigated. In this work, we propose a novel ensemble deep learning model through integrating bagging deep learning and model calibration to not only enhance segmentation performance, but also reduce prediction uncertainty. The proposed method has been validated on a large dataset that is associated with CXR image segmentation. Experimental results demonstrate that the proposed method can improve the segmentation performance, as well as decrease prediction uncertainty. 
    more » « less