Title: Deep learning data augmentation for Raman spectroscopy cancer tissue classification
Abstract: Raman spectroscopy (RS) has recently been demonstrated as a non-destructive means of cancer diagnosis, because RS measurements can reveal molecular and biochemical differences between cancerous and normal tissues and cells. When designing computational approaches for cancer detection, both the quality and the quantity of the tissue samples measured by RS matter for accurate prediction. In practice, however, obtaining skin cancer samples is difficult and expensive due to privacy and other constraints. With only a small number of samples, classifier training is difficult and often results in overfitting, so more samples are needed to train classifiers for accurate cancer tissue classification. To overcome these limitations, this paper presents a novel generative adversarial network (GAN)-based skin cancer tissue classification framework. Specifically, we design a data augmentation module that employs a GAN to generate synthetic RS data resembling the training data classes. The original tissue samples and the generated data are concatenated to train the classification modules. Experiments on real-world RS data demonstrate that (1) data augmentation can improve skin cancer tissue classification accuracy, and (2) a GAN can generate reliable synthetic Raman spectroscopic data.
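The abstract describes the augmentation pipeline only at a high level. Below is a minimal sketch of what GAN-based augmentation of 1-D spectra can look like in PyTorch; the spectrum length, latent dimension, layer sizes, and [0, 1] normalization are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of GAN-based augmentation for 1-D Raman spectra (PyTorch).
# SPEC_LEN, LATENT, and all layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

SPEC_LEN, LATENT = 1000, 64  # assumed spectrum length and noise dimension

G = nn.Sequential(  # generator: noise -> synthetic spectrum in [0, 1]
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, SPEC_LEN), nn.Sigmoid())
D = nn.Sequential(  # discriminator: spectrum -> real/fake logit
    nn.Linear(SPEC_LEN, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real):
    """One adversarial update; real is a (batch, SPEC_LEN) tensor of spectra."""
    b = real.size(0)
    fake = G(torch.randn(b, LATENT))
    # Discriminator update: push real toward 1, fake toward 0.
    d_loss = bce(D(real), torch.ones(b, 1)) + \
             bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: try to make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, sample synthetic spectra and concatenate them with the
# real training spectra before fitting the downstream tissue classifier.
with torch.no_grad():
    synthetic = G(torch.randn(200, LATENT))
```

The key step is the last one: synthetic spectra drawn from the trained generator are concatenated with the real samples, and the combined set is used to train the classification modules.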
Award ID(s): 2027339, 1763452, 1828181
PAR ID: 10314146
Author(s) / Creator(s):
Date Published:
Journal Name: Scientific Reports
Volume: 11
Issue: 1
ISSN: 2045-2322
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We present a novel algorithm that generates deep synthetic COVID-19 pneumonia CT scan slices from a very small sample of positive training images in tandem with a larger number of normal images. The generated images are accurate enough for a DNN classifier to achieve high classification accuracy using as few as 10 positive training slices (from 10 positive cases), which to the best of our knowledge is one order of magnitude fewer than the next closest published work at the time of writing. Deep learning with extremely small positive training volumes is a difficult problem and became an important topic during the COVID-19 pandemic, when large volumes of COVID-19-positive images were hard to obtain for training. Algorithms that can learn to screen for diseases from few examples are an important area of research. Furthermore, algorithms that produce deep synthetic images from smaller data volumes have the added benefit of lowering the barriers to data sharing between healthcare institutions. We present the cycle-consistent segmentation-generative adversarial network (CCS-GAN), which combines style transfer with pulmonary segmentation and relevant transfer learning from negative images to create a larger volume of synthetic positive images and thereby improve diagnostic classification performance. A VGG-19 classifier combined with CCS-GAN was trained on small samples of positive image slices, from at most 50 down to as few as 10 COVID-19-positive CT scan images. CCS-GAN achieves high accuracy with few positive images and thereby greatly reduces the barrier of acquiring large training volumes to train a diagnostic classifier for COVID-19.
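The cycle-consistency constraint at the core of CCS-GAN can be sketched as below; `G` and `F` stand for the two translation networks (normal-to-positive style and back), and the weight `lam` is an assumed hyperparameter, since the paper's exact losses are not reproduced here.

```python
# Hedged sketch of the cycle-consistency term used by CycleGAN-style
# models such as CCS-GAN; G, F, and lam are assumptions for illustration.
import torch.nn as nn

def cycle_consistency_loss(G, F, normal, positive, lam=10.0):
    """L1 reconstruction penalty forcing F(G(x)) ~ x and G(F(y)) ~ y."""
    l1 = nn.L1Loss()
    return lam * (l1(F(G(normal)), normal) + l1(G(F(positive)), positive))
```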
  2. Human skeleton data provides a compact, low-noise representation of relative joint locations that may be used in human identity and activity recognition. The Hierarchical Co-occurrence Network (HCN) has been used for human activity recognition because its convolutional operations can model correlations between joints. HCN shows good identification accuracy but requires a large number of training samples; acquiring such large-scale data can be time-consuming and expensive, motivating synthetic skeleton data generation for data augmentation in HCN. We propose a novel method that integrates an Auxiliary Classifier Generative Adversarial Network (AC-GAN) and HCN in a hybrid framework for Assessment and Augmented Identity Recognition for Skeletons (AAIRS). The proposed AAIRS method generates and evaluates synthetic 3-dimensional motion-capture skeleton videos and then performs human identity recognition. Synthetic skeleton data produced by the generator component of the AC-GAN is evaluated using an Inception Score-inspired realism metric computed from the HCN classifier outputs. We study the effect of increasing the percentage of synthetic samples in the training set on HCN performance. Before synthetic data augmentation, we achieve 74.49% HCN accuracy in 10-fold cross-validation for 9-class human identification. With a 50%-50% synthetic-real mixture, we achieve 78.22% mean accuracy, a significant improvement over the real-only baseline.
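The realism metric is described only as Inception Score-inspired and computed from HCN outputs; a plausible reading is the standard Inception Score applied to the HCN's class posteriors, sketched below. The exact formula the authors use may differ.

```python
# Standard Inception Score computed from classifier logits (PyTorch);
# assumed here as a stand-in for the AAIRS realism metric.
import torch
import torch.nn.functional as F

def inception_style_score(logits):
    """logits: (N, C) HCN outputs for N synthetic skeleton videos.
    Returns exp(mean_x KL(p(y|x) || p(y))): higher when samples are
    confidently classified (realistic) yet spread across classes (diverse)."""
    p_yx = F.softmax(logits, dim=1)           # per-sample class posteriors
    p_y = p_yx.mean(dim=0, keepdim=True)      # marginal class distribution
    kl = (p_yx * (p_yx.clamp_min(1e-12).log()
                  - p_y.clamp_min(1e-12).log())).sum(dim=1)
    return kl.mean().exp()
```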
  3. Reflectance confocal microscopy (RCM) is a noninvasive optical imaging modality that provides cellular-level resolution, in vivo images of skin without a traditional skin biopsy. RCM image interpretation currently requires specialized training, because the grayscale output images are difficult to correlate with tissue pathology. Here, we use a deep learning-based framework with a convolutional neural network to transform grayscale output images into virtually stained, hematoxylin and eosin (H&E)-like images, allowing visualization of various skin layers, including the epidermis, the dermal-epidermal junction, and the superficial dermis. To train the framework, stacks of at least 7 time-lapsed, successive RCM images of excised tissue were acquired from epidermis to dermis, 1.52 microns apart, to a depth of 60.96 microns using the Vivascope 3000. The tissue was embedded in agarose, and a curette was used to create a tunnel through which drops of 50% acetic acid were applied to stain cell nuclei. These acetic acid-stained images served as ground truth for training a deep convolutional neural network with a conditional generative adversarial network (GAN)-based machine learning algorithm that digitally converts the images into H&E-like digital images. We then retrained the already trained algorithm on new samples to include squamous neoplasms. Through further training and refinement of the algorithm, high-resolution, histological-quality images can be obtained to aid earlier diagnosis and treatment of cutaneous neoplasms. The overall goal is biopsy-free virtual histology: real-time outputs of virtually stained, H&E-like images of skin lesions that decrease the need for invasive diagnostic procedures and enable greater uptake of the technology by the medical community.
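Conditional GANs for image-to-image translation of this kind are commonly trained with an adversarial term plus a pixel-wise fidelity term against the paired ground truth (as in pix2pix); the sketch below assumes that recipe, since the paper's exact objective is not given here.

```python
# Hedged sketch of a pix2pix-style generator objective for grayscale
# RCM -> virtual H&E translation; networks G, D and lam are assumptions.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def generator_loss(D, G, rcm, he_target, lam=100.0):
    """Adversarial term plus L1 fidelity to the acid-stained ground truth."""
    fake_he = G(rcm)
    # Condition the discriminator on the input image (channel concat).
    pred = D(torch.cat([rcm, fake_he], dim=1))
    return bce(pred, torch.ones_like(pred)) + lam * l1(fake_he, he_target)
```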
  4. Robinson, Peter (Ed.)
    Abstract Motivation: Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high-throughput technologies, more comprehensive multi-omics data is now available that can effectively link genotype to phenotype. However, the interactive relations among multi-omics datasets make it particularly challenging to incorporate different biological layers, discover coherent biological signatures, and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model that integrates two omics datasets and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuses them to generate synthetic data with better predictive signals. Results: Large-scale experiments on The Cancer Genome Atlas breast, lung, and ovarian cancer datasets validate that (i) the model can effectively integrate two omics datasets (e.g. mRNA and microRNA expression data) and their interaction network (e.g. the microRNA-mRNA interaction network), and the synthetic omics data generated by the proposed model perform better on cancer outcome classification and patient survival prediction than the original omics datasets; and (ii) the integrity of the interaction network plays a vital role in generating synthetic data with higher predictive quality: using a random interaction network does not allow the framework to learn meaningful information from the omics datasets and therefore results in synthetic data with weaker predictive signals. Availability and implementation: Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. Supplementary information: Supplementary data are available at Bioinformatics online.
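One way an interaction network can enter such a generator is by propagating one omics layer through the interaction matrix before a learned refinement; the sketch below illustrates that idea with assumed dimensions and layers, not omicsGAN's exact architecture (see the linked repository for that).

```python
# Hedged sketch: network-guided generation of synthetic microRNA features
# from mRNA expression. Feature counts and layers are assumptions.
import torch
import torch.nn as nn

N_MRNA, N_MIRNA = 1000, 300  # assumed feature counts

class NetworkGuidedGenerator(nn.Module):
    def __init__(self, adj):
        # adj: (N_MIRNA, N_MRNA) microRNA-mRNA interaction matrix
        super().__init__()
        self.register_buffer("adj", adj)
        self.refine = nn.Sequential(nn.Linear(N_MIRNA, 256), nn.ReLU(),
                                    nn.Linear(256, N_MIRNA))

    def forward(self, mrna):  # mrna: (batch, N_MRNA) expression matrix
        propagated = mrna @ self.adj.t()  # push expression through the network
        return self.refine(propagated)    # synthetic microRNA profile
```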
  5. Significant resources have been spent collecting and storing large, heterogeneous radar datasets during expensive Arctic and Antarctic fieldwork. The vast majority of the available data is unlabeled, and the labeling process is both time-consuming and expensive. One alternative to labeling is the use of synthetically generated data: instead of labeling real images, we can generate synthetic data from arbitrary labels, quickly augmenting the training data with additional images. In this research, we evaluated the performance of synthetic radar images generated with modified cycle-consistent adversarial networks. We conducted several experiments to test the quality of the generated radar imagery, and we also tested a state-of-the-art contour detection algorithm on synthetic data and on different combinations of real and synthetic data. Our experiments show that synthetic radar images generated by a generative adversarial network (GAN) can be used in combination with real images for data augmentation and training of deep neural networks. However, the synthetic images cannot be used alone to train a neural network (training on synthetic, testing on real), because they cannot simulate all radar characteristics, such as noise or Doppler effects. To the best of our knowledge, this is the first work to create radar sounder imagery with generative adversarial networks.
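A modified CycleGAN of this kind pairs adversarial losses in both domains with a cycle-consistency term between the label-map and radar-image domains; the sketch below states that combined objective under assumed networks and weights, not the paper's exact formulation.

```python
# Hedged sketch of a CycleGAN-style objective for label-map <-> radar-image
# translation; G, F, Dx, Dy, and lam are assumptions for illustration.
import torch
import torch.nn as nn

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def full_objective(G, F, Dx, Dy, labels, radar, lam=10.0):
    """G: labels -> radar, F: radar -> labels; adversarial + cycle terms."""
    fake_radar, fake_labels = G(labels), F(radar)
    py, px = Dy(fake_radar), Dx(fake_labels)
    # Generators try to make both discriminators output "real".
    adv = bce(py, torch.ones_like(py)) + bce(px, torch.ones_like(px))
    # Round trips must reconstruct the originals.
    cyc = l1(F(fake_radar), labels) + l1(G(fake_labels), radar)
    return adv + lam * cyc
```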