Title: Rapid Detection of Bacteria Using Raman Spectroscopy and Deep Learning
Bacteria identification can be a time-consuming process. Machine learning algorithms that use deep convolutional neural networks (CNNs) provide a promising alternative. Here, we present a deep-learning-based approach paired with Raman spectroscopy to rapidly and accurately identify the class of a bacterial sample. We propose a simple 4-layer CNN architecture and use a 30-class bacteria isolate dataset for training and testing. We achieve an identification accuracy of around 86% with identification speeds close to real time. This optical/biological detection method is promising for applications in the detection of microbes in liquid biopsies and concentrated environmental liquid samples, where fast and accurate detection is crucial. This study uses a recently published dataset of Raman spectra from bacteria samples and an improved CNN model built with TensorFlow. Results show improved identification accuracy and reduced network complexity.
Award ID(s):
1757953 1827847
PAR ID:
10281235
Author(s) / Creator(s):
Date Published:
Journal Name:
2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC)
Page Range / eLocation ID:
0796 to 0799
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pollen identification is necessary for several subfields of geology, ecology, and evolutionary biology. However, the existing methods for pollen identification are laborious, time-consuming, and require highly skilled scientists. Therefore, there is a pressing need for an automated and accurate system for pollen identification, which can be beneficial for both basic research and applied issues such as identifying airborne allergens. In this study, we propose a deep learning (DL) approach to classify pollen grains in the Great Basin Desert, Nevada, USA. Our dataset consisted of 10,000 images of 40 pollen species. To mitigate the limitations imposed by the small volume of our training dataset, we conducted an in-depth comparative analysis of numerous pre-trained Convolutional Neural Network (CNN) architectures utilizing transfer learning methodologies. Simultaneously, we developed and incorporated an innovative CNN model, serving to augment our exploration and optimization of data modeling strategies. We applied different architectures of well-known pre-trained deep CNN models, including AlexNet, VGG-16, MobileNet-V2, ResNet (18, 34, 50, and 101), ResNeSt (50, 101), SE-ResNeXt, and Vision Transformer (ViT), to uncover the most promising modeling approach for the classification of pollen grains in the Great Basin. To evaluate the performance of the pre-trained deep CNN models, we measured accuracy, precision, F1-Score, and recall. Our results showed that the ResNeSt-110 model achieved the best performance, with an accuracy of 97.24%, precision of 97.89%, F1-Score of 96.86%, and recall of 97.13%. Our results also revealed that transfer learning models can deliver better and faster image classification results compared to traditional CNN models built from scratch. The proposed method can potentially benefit various fields that rely on efficient pollen identification. This study demonstrates that DL approaches can improve the accuracy and efficiency of pollen identification, and it provides a foundation for further research in the field.
  2. In the past few years, there have been many research studies conducted in the field of Satellite Image Classification. The purposes of these studies included flood identification, forest fire monitoring, greenery land identification, and land-usage identification. In this field, finding suitable data is often considered problematic, and some research has also been done to identify and extract suitable datasets for classification. Although satellite data can be challenging to deal with, Convolutional Neural Networks (CNNs), which consist of multiple interconnected neurons, have shown promising results when applied to satellite imagery data. In the present work, we first manually download satellite images of four different classes in Florida locations using the TerraFly Mapping System, developed and managed by the High Performance Database Research Center at Florida International University. We then develop a CNN architecture suitable for extracting features and capable of multi-class classification in our dataset. We discuss the shortcomings in the classification due to the limited size of the dataset. To address this issue, we first employ data augmentation and then utilize transfer learning methodology for feature extraction with VGG16 and ResNet50 pretrained models. We use these features to classify satellite imagery of Florida. We analyze the misclassification in our model and, to address this issue, we introduce a location-based CNN model. We convert coordinates to geohash codes, use these codes as an additional feature vector, and feed them into the CNN model. We believe that the new CNN model combined with geohash codes as location features provides better accuracy for our dataset.
  3. Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device, coupled with label-free Raman spectroscopy, holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning (ML) approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such an ML approach for analyzing Raman spectra of human and avian viruses. A convolutional neural network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A versus type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and nonenveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus [IBV]) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups—for example, amide, amino acid, and carboxylic acid—we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids, and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses. 
  4. A major challenge in Infrastructure as a Service (IaaS) clouds is their exposure to malware. Malware can spread rapidly within a datacenter and can cause major disruption to a cloud service provider and its clients. This paper introduces and discusses an effective malware detection approach in cloud infrastructure using a Convolutional Neural Network (CNN), a deep learning approach. We initially employ a standard 2d CNN by training on metadata available for each of the processes in a virtual machine (VM) obtained by means of the hypervisor. We enhance the CNN classifier accuracy by using a novel 3d CNN (where an input is a collection of samples over a time interval), which greatly helps reduce mislabelled samples during data collection and training. Our experiments are performed on data collected by running various malware (mostly Trojans and Rootkits) on VMs. The malware samples used in our experiments are randomly selected. This reduces selection bias toward malware that is known to be highly active and therefore easy to detect. We demonstrate that our 2d CNN model reaches an accuracy of ≈ 79%, and our 3d CNN model significantly improves the accuracy to ≈ 90%.
  5. Low surface brightness galaxies (LSBGs), galaxies that are fainter than the dark night sky, are famously difficult to detect. Nonetheless, studies of these galaxies are essential to improve our understanding of the formation and evolution of low-mass galaxies. In this work, we train a deep learning model using the Mask R-CNN framework on a set of simulated LSBGs inserted into images from the Dark Energy Survey (DES) Data Release 2 (DR2). This deep learning model is combined with several conventional image pre-processing steps to develop a pipeline for the detection of LSBGs. We apply this pipeline to the full DES DR2 coadd image dataset, and preliminary results show the detection of 22 large, high-quality LSBG candidates that went undetected by conventional algorithms. Furthermore, we find that the performance of our algorithm is greatly improved by including examples of false positives as an additional class during training. 