Title: Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
Background: Long non-coding RNA (lncRNA) plays a vital role in changing the expression profiles of various target genes, which can lead to cancer development. Identifying prognostic lncRNAs related to different cancers might therefore help in developing cancer therapy.
Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose using the concrete autoencoder (CAE), a state-of-the-art deep learning algorithm, in an unsupervised setting, where it efficiently identifies a subset of the most informative features. However, because CAE is stochastic, it does not select the same features from one run to the next. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers.
Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, the standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers.
Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE, even though the AE selects latent, or pseudo-, features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
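The multi-run idea described in the Method can be illustrated with a minimal sketch. This is not the authors' implementation: `select_features_once` is a hypothetical stand-in that draws feature indices at random instead of training a CAE, and the `min_runs` threshold is an assumption, since the abstract only states that features identified by multiple runs are kept.

```python
import random
from collections import Counter


def select_features_once(n_features, k, rng):
    """Hypothetical stand-in for one stochastic CAE run.

    Returns a set of k distinct feature indices. The draw is biased toward
    low indices so that some features recur across runs, loosely mimicking
    informative features being picked repeatedly.
    """
    weights = [1.0 / (i + 1) for i in range(n_features)]
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(range(n_features), weights=weights)[0])
    return chosen


def multi_run_selection(n_features, k, n_runs, min_runs, seed=0):
    """Aggregate several stochastic runs and keep recurring features.

    min_runs (an assumed parameter) is the number of runs a feature must
    appear in to be considered stable.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        # Each run contributes at most one count per feature.
        counts.update(select_features_once(n_features, k, rng))
    stable = sorted(f for f, c in counts.items() if c >= min_runs)
    return stable, counts


if __name__ == "__main__":
    stable, counts = multi_run_selection(
        n_features=50, k=5, n_runs=10, min_runs=2, seed=1
    )
    print("stable features:", stable)
    print("most frequent:", counts.most_common(5))
```

In a real pipeline the stand-in selector would be replaced by a trained CAE, and the per-feature counts would give a natural ranking from which a top-ranking subset could be taken.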
Award ID(s):
1901628
NSF-PAR ID:
10328075
Journal Name:
International Journal of Molecular Sciences
Volume:
22
Issue:
21
ISSN:
1422-0067
Page Range / eLocation ID:
11919
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Long noncoding RNA (lncRNA) plays key roles in tumorigenesis. Misexpression of lncRNA can lead to changes in the expression profiles of various target genes that are involved in cancer initiation and progression. Identifying the key lncRNAs for a cancer would therefore help in developing therapy for it. Usually, identifying key lncRNAs for a cancer requires lncRNA expression profiles for both normal and cancer samples, but such data are not available for all cancers. In the present study, a computational framework is developed to identify cancer-specific key lncRNAs using the lncRNA expression of cancer patients only. The framework consists of two state-of-the-art feature selection techniques, Recursive Feature Elimination (RFE) and Least Absolute Shrinkage and Selection Operator (LASSO), and five machine learning models: Naive Bayes, K-Nearest Neighbor, Random Forest, Support Vector Machine, and Deep Neural Network. For the experiments, expression values of lncRNAs for 8 cancers (BLCA, CESC, COAD, HNSC, KIRP, LGG, LIHC, and LUAD) from TCGA are used. The combined dataset consists of 3,656 patients with expression values of 12,309 lncRNAs. Important features, or key lncRNAs, are identified using the feature selection algorithms RFE and LASSO. The capability of these key lncRNAs to classify 8 different cancers is assessed through the performance of the five classification models. This study identified 37 key lncRNAs that can classify 8 different cancer types with an accuracy ranging from 94% to 97%. Finally, survival analysis supports that the discovered key lncRNAs are capable of differentiating between high-risk and low-risk patients.
  2. Background & Aims: Cancer metastasis into distant organs is an evolutionarily selective process. A better understanding of the driving forces endowing proliferative plasticity of tumor seeds in distant soils is required to develop and adapt better treatment systems for this lethal stage of the disease. To this end, we aimed, first, to use transcript expression profiling features to predict the site-specific metastases of primary tumors and, second, to identify the determinants of tissue-specific progression. Methods: We used statistical machine learning for transcript feature selection to optimize classification and built tree-based classifiers to predict tissue-specific sites of metastatic progression. Results: We developed a novel machine learning architecture that analyzes 33 types of RNA transcriptome profiles from The Cancer Genome Atlas (TCGA) database. Our classifier identifies the tumor type, derives synthetic instances of primary tumors metastasizing to distant organs, and classifies the site-specific metastases in 16 types of cancers metastasizing to 12 locations. Conclusions: We have demonstrated that site-specific metastatic progression is predictable using transcriptomic profiling data from primary tumors and that the overrepresented biological processes in tumors metastasizing to congruent distant loci are highly overlapping. These results indicate that site-specific progression is organotropic and that core features of biological signaling pathways can be identified that may describe proliferative plasticity in distant soils.
  3. The prognosis of hepatocellular carcinoma (HCC) after R0 resection is unsatisfactory due to the high rate of recurrence. In this study, we investigated the recurrence‐related RNAs and the underlying mechanism. The long noncoding RNA (lncRNA), microRNA (miRNA), and messenger RNA (mRNA) expression data and clinical information of 247 patients with HCC who underwent R0 resection were obtained from The Cancer Genome Atlas. Comparing the 1‐year recurrence group (n = 56) with the nonrecurrence group (n = 60), we detected 34 differentially expressed lncRNAs (DElncRNAs), five DEmiRNAs, and 216 DEmRNAs. Of these, three DElncRNAs, hsa‐mir‐150‐5p, and 11 DEmRNAs were selected for constructing the competing endogenous RNA (ceRNA) network. Next, two nomogram models were constructed, based separately on the lncRNAs and the mRNAs that were further selected by Cox and least absolute shrinkage and selection operator regression analysis. The two nomogram models showed high prediction accuracy for disease‐free survival, with concordance indexes of 0.725 and 0.639. Further functional enrichment analysis of DEmRNAs showed that the mRNAs in the ceRNA network and nomogram models were associated with immune pathways. Hence, we constructed a hsa‐mir‐150‐5p‐centric ceRNA network and two effective nomogram prognostic models, and the related RNAs may be useful as potential biomarkers for predicting recurrence in patients with HCC.
  4. Finding the network biomarkers of cancers and the analysis of cancer driving genes that are involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly known as functional units. This work is based on the hypothesis that the dense clusters or communities in the gene co-expression networks of cancer patients may represent functional units regarding cancer initiation and progression. In this study, RNA-seq gene expression data of three cancers - Breast Invasive Carcinoma (BRCA), Colorectal Adenocarcinoma (COAD) and Glioblastoma Multiforme (GBM) - from The Cancer Genome Atlas (TCGA) are used to construct gene co-expression networks using Pearson Correlation. Six well-known community detection algorithms are applied on these networks to identify communities with five or more genes. A permutation test is performed to further mine the communities that are conserved in other cancers, thus calling them conserved communities. Then survival analysis is performed on clinical data of three cancers using the conserved community genes as prognostic co-variates. The communities that could distinguish the cancer patients between high- and low-risk groups are considered as cancer biomarkers. In the present study, 16 such network biomarkers are discovered. 
  5. Uveal melanoma (UM) is a comparatively rare cancer but requires serious consideration, since patients who develop metastatic UM survive only for about 6–12 months. Fortunately, increasingly large multi-omics databases allow us to further understand cancer initiation and development. Moreover, previous studies have observed that associations between copy number aberrations (CNA) or methylation (MET) and messenger RNA (mRNA) expression affect these processes. We therefore explore the effect of these associations in a case study of UM. In addition, the current subtypes of UM are only weakly associated with biological phenotypes and offer no therapy suggestions, so re-identifying molecular subtypes is a pressing need. In this study, we use three omics profiles, CNA, MET, and mRNA, from a UM cohort in The Cancer Genome Atlas (TCGA). First, we identify two sets of genes, CNAexp and METexp, whose CNA and MET, respectively, correlate significantly with their corresponding mRNA. Then, single and integrative analyses of the three data types are performed using the PINSPlus tool. As a result, we discover two novel integrative subgroups, IntSub1 and IntSub2, which could be a useful alternative classification for UM patients in the future. To further explore the molecular events behind each subgroup, we identify their subgroup-specific genes computationally. The most highly expressed IntSub1-specific genes are mostly enriched in immune-related processes. On the other hand, IntSub2-specific genes are highly associated with cellular cation homeostasis, which responds effectively to chemotherapy using ion channel inhibitor drugs. In addition, we find that the two integrative subgroups show different age-related risks and survival rates. These discoveries can influence the frequency of metastatic surveillance and help medical practitioners choose an appropriate treatment regimen.