DNA methylation is a process that can affect gene accessibility and therefore gene expression. In this study, a machine learning pipeline is proposed for the prediction of breast cancer and the identification of significant genes that contribute to the prediction. The study used breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques were used to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data reveals that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogenes/tumor suppressor genes were common to the 27K and 450K feature-selected gene sets: RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, incorporating imputation and data balancing methods, identifies important methylation markers related to functionally important genes in breast cancer with higher accuracy than deep learning or statistical models alone.
Layer-Wise Pre-Training Low-Rank NMF Model for Mammogram-Based Breast Tumor Classification
Image-based breast tumor classification is an active and challenging problem. In this paper, a robust breast tumor classification framework is presented based on deep feature representation learning and exploiting available information in existing samples. Feature representation learning of mammograms is fulfilled by a modified nonnegative matrix factorization model called LPML-LRNMF, which is motivated by hierarchical learning and layer-wise pre-training (LP) strategy in deep learning. Low-rank (LR) constraint is integrated into the feature representation learning model by considering
- Award ID(s):
- 1719932
- PAR ID:
- 10189616
- Date Published:
- Journal Name:
- Journal of the Operations Research Society of China
- ISSN:
- 2194-668X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
A brain tumor is an abnormal growth in the brain that disrupts its functionality and poses a significant threat to human life by damaging neurons. Early detection and classification of brain tumors are crucial to prevent complications and maintain good health. Recent advancements in deep learning techniques have shown immense potential in image classification and segmentation for tumor identification and classification. In this study, we present a platform, BrainView, for detection and segmentation of brain tumors from magnetic resonance images (MRI) using deep learning. We used the pre-trained EfficientNetB7 model to design our proposed DeepBrainNet classification model, which analyzes brain MRI images to classify tumor type. We also propose an EfficientNetB7-based image segmentation model, called EffB7-UNet, for tumor localization. Experimental results show high classification (99.96%) and segmentation (92.734%) accuracies for our proposed models. Finally, we discuss the contours of a cloud application for BrainView using Flask and Flutter to help researchers and clinicians use our machine learning models online for research purposes.
-
Xu, Jinbo (Ed.)
Abstract
Motivation: Motions of transmembrane receptors on cancer cell surfaces can reveal biophysical features of the cancer cells, thus providing a method for characterizing cancer cell phenotypes. While conventional analysis of receptor motions in the cell membrane mostly relies on the mean-squared displacement plots, much information is lost when producing these plots from the trajectories. Here we employ deep learning to classify breast cancer cell types based on the trajectories of epidermal growth factor receptor (EGFR). Our model is an artificial neural network trained on the EGFR motions acquired from six breast cancer cell lines of varying invasiveness and receptor status: MCF7 (hormone receptor positive), BT474 (HER2-positive), SKBR3 (HER2-positive), MDA-MB-468 (triple negative, TN), MDA-MB-231 (TN) and BT549 (TN).
Results: The model successfully classified the trajectories within individual cell lines with 83% accuracy and predicted receptor status with 85% accuracy. To further validate the method, epithelial–mesenchymal transition (EMT) was induced in benign MCF10A cells, noninvasive MCF7 cancer cells and highly invasive MDA-MB-231 cancer cells, and EGFR trajectories from these cells were tested. As expected, after EMT induction, both MCF10A and MCF7 cells showed higher rates of classification as TN cells, but not the MDA-MB-231 cells. Whereas deep learning-based cancer cell classifications are primarily based on the optical transmission images of cell morphology and the fluorescence images of cell organelles or cytoskeletal structures, here we demonstrated an alternative way to classify cancer cells using a dynamic, biophysical feature that is readily accessible.
Availability and implementation: A Python implementation of deep learning-based classification can be found at https://github.com/soonwoohong/Deep-learning-for-EGFR-trajectory-classification.
Supplementary information: Supplementary data are available at Bioinformatics online.
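For readers unfamiliar with the conventional analysis this abstract contrasts itself with, the following is a minimal sketch of a time-averaged mean-squared displacement (MSD) computed from a single 2D trajectory. It is not taken from the authors' repository; the function name `mean_squared_displacement` and the input layout (one row per frame, columns x and y) are assumptions made for illustration.

```python
import numpy as np

def mean_squared_displacement(traj, max_lag):
    """Time-averaged MSD of one trajectory for lags 1..max_lag (in frames).

    traj: array of shape (n_frames, 2) holding x, y positions (assumed layout).
    """
    traj = np.asarray(traj, dtype=float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Displacement between every pair of frames separated by `lag`.
        disp = traj[lag:] - traj[:-lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd

# Hypothetical example: a particle moving one unit per frame along x.
line = np.array([[0, 0], [1, 0], [2, 0], [3, 0]])
msd_line = mean_squared_displacement(line, max_lag=3)
```

For this straight-line example the MSD at lag k is k squared. Because each MSD value averages over all frame pairs at a given lag, the ordering and fine structure of individual steps are collapsed, which is the information loss the abstract refers to.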
-
This paper proposes to enable deep learning for generic machine learning tasks. Our goal is to allow deep learning to be applied to data that are already represented in instance-feature tabular format, for better classification accuracy. Because deep learning relies on spatial/temporal correlation to learn new feature representations, our approach is to convert each instance of the original dataset into a synthetic matrix format to take full advantage of the feature learning power of deep learning methods. To maximize the correlation of the matrix, we use 0/1 optimization to reorder features such that the ones with strong correlations are adjacent to each other. By using a two-dimensional feature reordering, we are able to create a synthetic matrix, as an image, to represent each instance. Because the synthetic image preserves the original feature values and data correlation, existing deep learning algorithms, such as convolutional neural networks (CNN), can be applied to learn effective features for classification. Our experiments on 20 generic datasets, using CNN as the deep learning classifier, confirm that enabling deep learning for generic datasets yields a clear performance gain compared to generic machine learning methods. In addition, the proposed method consistently outperforms simple baselines that apply CNN to generic datasets. As a result, our research allows deep learning to be broadly applied to generic datasets for learning and classification.
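The idea of reordering features so that correlated ones sit next to each other, then reshaping each instance into an image, can be sketched as follows. This is only an illustration under stated assumptions: it swaps the paper's 0/1 optimization for a simple greedy nearest-neighbour ordering along one dimension, and the helper names `greedy_feature_order` and `to_synthetic_image`, the zero padding, and the square image shape are all hypothetical choices. It also assumes no feature is constant, since correlations would then be undefined.

```python
import numpy as np

def greedy_feature_order(X):
    """Greedy stand-in for the paper's 0/1 optimization (an assumption):
    start at feature 0 and repeatedly append the unused feature most
    correlated (in absolute value) with the last one placed."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    order = [0]
    remaining = set(range(1, corr.shape[0]))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)

def to_synthetic_image(row, order, side):
    """Place the reordered feature values row by row into a side x side
    matrix, zero-padding unused cells (padding is an assumption)."""
    padded = np.zeros(side * side, dtype=float)
    padded[: len(order)] = row[order]
    return padded.reshape(side, side)

# Hypothetical data: 7 features, with feature 3 a near copy of feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=50)

order = greedy_feature_order(X)
image = to_synthetic_image(X[0], order, side=3)
```

In this toy dataset the near-duplicate feature 3 is placed directly after feature 0. The resulting matrices keep the original feature values, so they could be passed to any image classifier such as a CNN.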
-
Breast cancer is the leading cancer affecting women globally. Despite deep learning models making significant strides in diagnosing and treating this disease, ensuring fair outcomes across diverse populations presents a challenge, particularly when certain demographic groups are underrepresented in training datasets. Addressing the fairness of AI models across varied demographic backgrounds is crucial. This study analyzes demographic representation within the publicly accessible Emory Breast Imaging Dataset (EMBED), which includes de-identified mammography and clinical data. We spotlight the data disparities among racial and ethnic groups and assess the biases in mammography image classification models trained on this dataset, specifically ResNet-50 and Swin Transformer V2. Our evaluation of classification accuracies across these groups reveals significant variations in model performance, highlighting concerns regarding the fairness of AI diagnostic tools. This paper emphasizes the imperative need for fairness in AI and suggests directions for future research aimed at increasing the inclusiveness and dependability of these technologies in healthcare settings. Code is available at: https://github.com/kuanhuang0624/EMBEDFairModels.

