Long noncoding RNA (lncRNA) plays key roles in tumorigenesis. Misexpression of lncRNA can change the expression profiles of various target genes involved in cancer initiation and progression. Identifying the key lncRNAs for a cancer could therefore help in developing therapies for it. Identifying key lncRNAs usually requires lncRNA expression profiles from both normal and cancer samples, but such data are not available for all cancers. In the present study, a computational framework is developed to identify cancer-specific key lncRNAs using only the lncRNA expression data of cancer patients. The framework consists of two state-of-the-art feature selection techniques, Recursive Feature Elimination (RFE) and Least Absolute Shrinkage and Selection Operator (LASSO), and five machine learning models: Naive Bayes, K-Nearest Neighbor, Random Forest, Support Vector Machine, and Deep Neural Network. For the experiments, lncRNA expression values from TCGA for 8 cancers (BLCA, CESC, COAD, HNSC, KIRP, LGG, LIHC, and LUAD) are used. The combined dataset consists of 3,656 patients with expression values of 12,309 lncRNAs. Important features, i.e., key lncRNAs, are identified with the feature selection algorithms RFE and LASSO. The ability of these key lncRNAs to classify the 8 cancers is assessed by the performance of the five classification models. This study identified 37 key lncRNAs that can classify 8 different cancer types with an accuracy ranging from 94% to 97%. Finally, survival analysis indicates that the discovered key lncRNAs can differentiate between high-risk and low-risk patients.
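LASSO selects features by driving the coefficients of uninformative inputs exactly to zero. The following is a minimal pure-Python sketch of that mechanism, assuming the squared-error LASSO objective (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 solved by cyclic coordinate descent on centred, standardised inputs. It is an illustration, not the authors' code: for an 8-class cancer label, an L1-penalised (multinomial) logistic regression would be the closer analogue, and the abstract does not say which variant or penalty value was used. The toy matrix, the penalty, and the names `lnc_a` to `lnc_c` are hypothetical.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator S(rho, lam) used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n samples, each a list of p feature values; y holds n targets.
    No intercept is fitted, so X and y are assumed to be centred beforehand.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction with feature j left out gives the partial residual.
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


def selected_features(beta, names):
    """Features whose LASSO coefficient is non-zero are the selected ones."""
    return [name for name, b in zip(names, beta) if b != 0.0]


# Toy example: three orthogonal, centred features; the target depends only on the first.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2, -2, 2, -2]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(selected_features(beta, ["lnc_a", "lnc_b", "lnc_c"]))  # ['lnc_a']
```

Raising the penalty shrinks more coefficients to zero, so the size of the selected lncRNA set is controlled by the penalty value. Tested implementations of both selectors in the framework are available as scikit-learn's `Lasso` and `RFE` classes.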
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
Background: Long non-coding RNA plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, CAE does not identify reproducible features in different runs due to its stochastic nature. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, the standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE even though the AE works with latent (pseudo-)features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine.
The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
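The multi-run idea can be sketched as follows, assuming each trained CAE run is summarised by the set of features its selector neurons pick (the argmax of each neuron's learned logits) and that a feature is kept if it appears in at least `min_runs` runs. The threshold `min_runs`, the function names, and the toy lncRNA names are hypothetical; the abstract does not state the exact aggregation rule, and training the logits (which needs a deep learning framework and temperature annealing) is omitted. `concrete_sample` shows how a single selector neuron draws a relaxed one-hot vector from the Concrete (Gumbel-softmax) distribution during training.

```python
import math
import random
from collections import Counter


def concrete_sample(log_alpha, temperature, rng):
    """One relaxed one-hot draw from a Concrete (Gumbel-softmax) distribution.

    log_alpha holds one logit per input feature. As temperature -> 0 the
    sample approaches a one-hot vector that selects a single feature.
    """
    scores = []
    for a in log_alpha:
        u = max(rng.random(), 1e-12)  # u ~ Uniform(0, 1), kept away from 0
        gumbel = -math.log(-math.log(u))  # standard Gumbel noise
        scores.append((a + gumbel) / temperature)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def features_from_run(log_alpha_per_neuron):
    """After training, each selector neuron keeps the feature with its largest logit."""
    return {max(range(len(row)), key=row.__getitem__) for row in log_alpha_per_neuron}


def multi_run_selection(runs, min_runs=2):
    """Keep features selected in at least `min_runs` runs, most frequent first."""
    counts = Counter(f for run in runs for f in set(run))
    kept = [f for f, c in counts.items() if c >= min_runs]
    return sorted(kept, key=lambda f: (-counts[f], f))


# Toy example with three hypothetical runs over lncRNA names.
runs = [{"LNC1", "LNC2", "LNC3"}, {"LNC1", "LNC2", "LNC4"}, {"LNC1", "LNC5", "LNC2"}]
print(multi_run_selection(runs, min_runs=2))  # ['LNC1', 'LNC2']
```

Counting across runs turns the stochastic, run-to-run variation of a single CAE into a stability criterion: features picked by chance in one run fall below the threshold, while features that carry consistent information recur.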
- Award ID(s):
- 1901628
- PAR ID:
- 10328075
- Date Published:
- Journal Name:
- International Journal of Molecular Sciences
- Volume:
- 22
- Issue:
- 21
- ISSN:
- 1422-0067
- Page Range / eLocation ID:
- 11919
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Finding network biomarkers of cancers and analyzing the cancer driver genes involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly regarded as functional units. This work is based on the hypothesis that dense clusters, or communities, in the gene co-expression networks of cancer patients may represent functional units of cancer initiation and progression. In this study, RNA-seq gene expression data from The Cancer Genome Atlas (TCGA) for three cancers, Breast Invasive Carcinoma (BRCA), Colorectal Adenocarcinoma (COAD), and Glioblastoma Multiforme (GBM), are used to construct gene co-expression networks using Pearson correlation. Six well-known community detection algorithms are applied to these networks to identify communities with five or more genes. A permutation test is then performed to identify the communities that are also conserved in the other cancers; these are called conserved communities. Survival analysis is then performed on the clinical data of the three cancers using the conserved community genes as prognostic covariates. Communities that can separate cancer patients into high- and low-risk groups are considered cancer biomarkers. In the present study, 16 such network biomarkers were discovered.
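The network-construction step can be sketched as follows, assuming genes are linked when the absolute Pearson correlation of their expression across patients reaches a chosen threshold (0.8 here, a hypothetical value; the abstract does not give one). To stay dependency-free, the sketch uses connected components with at least five genes as a crude stand-in for the six community detection algorithms used in the study; it is not one of them, and a real analysis would use dedicated implementations such as those in NetworkX's community module. The gene names and expression values are made up.

```python
import math
from collections import deque
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene has no defined correlation; treat it as unlinked
    return sxy / math.sqrt(sxx * syy)


def coexpression_network(expr, threshold):
    """Adjacency sets linking genes whose |Pearson r| >= threshold."""
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def modules(adj, min_size=5):
    """Connected components with at least min_size genes (stand-in for community detection)."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search over linked genes
            g = queue.popleft()
            comp.append(g)
            for nb in adj[g]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(comp) >= min_size:
            result.append(sorted(comp))
    return result


# Hypothetical expression of 7 genes across 6 patients.
expr = {
    "G1": [1, 2, 3, 4, 5, 6],
    "G2": [2, 4, 6, 8, 10, 12],
    "G3": [11, 12, 13, 14, 15, 16],
    "G4": [6, 5, 4, 3, 2, 1],
    "G5": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "N1": [1, -1, 1, -1, 1, -1],
    "N2": [1, -1, -1, 1, 1, -1],
}
print(modules(coexpression_network(expr, 0.8)))  # [['G1', 'G2', 'G3', 'G4', 'G5']]
```

Using the absolute correlation links anti-correlated genes (G4 here) as well; whether the study did so is not stated in the abstract.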
-
Background & Aims: Cancer metastasis into distant organs is an evolutionarily selective process. A better understanding of the driving forces that endow tumor seeds with proliferative plasticity in distant soils is required to develop and adapt better treatments for this lethal stage of the disease. To this end, we aimed, first, to use transcript expression profiling features to predict the site-specific metastases of primary tumors and, second, to identify the determinants of tissue-specific progression. Methods: We used statistical machine learning for transcript feature selection to optimize classification and built tree-based classifiers to predict tissue-specific sites of metastatic progression. Results: We developed a novel machine learning architecture that analyzes 33 types of RNA transcriptome profiles from The Cancer Genome Atlas (TCGA) database. Our classifier identifies the tumor type, derives synthetic instances of primary tumors metastasizing to distant organs, and classifies the site-specific metastases in 16 types of cancers metastasizing to 12 locations. Conclusions: We have demonstrated that site-specific metastatic progression is predictable using transcriptomic profiling data from primary tumors and that the overrepresented biological processes in tumors metastasizing to congruent distant loci are highly overlapping. These results indicate that site-specific progression is organotropic and that core features of biological signaling pathways that may describe proliferative plasticity in distant soils are identifiable.
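The abstract says the classifier derives synthetic instances of primary tumors metastasizing to distant organs, but not how. A common technique for generating synthetic minority-class samples is SMOTE-style interpolation; the sketch below shows that technique under the assumption that something similar applies here, and it should not be read as the authors' method. The function name, the neighbour count `k`, and the toy points are hypothetical; a maintained implementation is `SMOTE` in the imbalanced-learn package.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating minority samples toward near neighbours.

    minority: list of feature vectors from the under-represented class (at least 2).
    Each synthetic point lies on the segment between a sample and one of its
    k nearest minority-class neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy 2-D minority class (e.g. a rare metastatic site); values are made up.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote_like(minority, n_new=3, k=2, seed=1):
    print(point)
```

The synthetic points would then be added to the training set of the tree-based classifiers so that rare metastatic sites are not swamped by common ones.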
-
It has become evident that N6-methyladenosine (m6A)-modified long noncoding RNAs (m6A-lncRNAs) are involved in regulating tumorigenesis, invasion, and metastasis in various cancer types. In this study, we computationally identified a set of 13 hub m6A-lncRNAs using three state-of-the-art tools (WGCNA, iWGCNA, and oCEM) and assessed their prognostic value in brain low-grade gliomas (LGG). Among the 13 hub m6A-lncRNAs, we further identified three as independent prognostic risk factors: HOXB-AS1, ELOA-AS1, and FLG-AS1. We then built the m6ALncSig model from these three hub m6A-lncRNAs. Patients with LGG were then divided into high- and low-risk groups based on the median m6ALncSig score. As expected, the high-risk group was more strongly associated with mortality. The prognostic signature of m6ALncSig was validated using internal and external cohorts. In summary, our work introduces a high-confidence prognostic prediction signature and paves the way for using the m6A-lncRNAs in the signature as new therapeutic targets for LGG.
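The median split and the survival comparison can be sketched as follows. The abstract does not give the m6ALncSig formula; the sketch assumes the common practice of a linear risk score (coefficient times expression, summed over the three lncRNAs), so the coefficients and patient data are hypothetical. It then compares the two groups with a two-group log-rank statistic, which is chi-square distributed with 1 degree of freedom, so values above about 3.84 correspond to p < 0.05. This is an illustration, not the study's pipeline; packages such as lifelines provide tested log-rank implementations.

```python
import statistics


def risk_scores(expr_rows, coefs):
    """Linear risk score per patient: sum of coefficient * expression."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in expr_rows]


def median_split(scores):
    """Label a patient 'high' if the score exceeds the median, else 'low'."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" for s in scores]


def logrank_statistic(times, events, groups, group="high"):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: group label per patient.
    """
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):  # distinct event times
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if groups[i] == group)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and groups[i] == group)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in `group`
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


# Hypothetical expression of three lncRNAs for four patients and made-up coefficients.
expr = [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [2.0, 1.0, 0.5], [0.5, 0.5, 0.5]]
coefs = [0.5, -1.0, 2.0]
print(median_split(risk_scores(expr, coefs)))  # ['high', 'high', 'low', 'low']
```

Splitting at the median keeps the two groups balanced in size, so the log-rank test is not dominated by one small group.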
-
AACR (Ed.) Abstract: Cancer is an intricate disease responsible for nearly 10 million deaths worldwide each year. Several studies have shown that cancer stem cell (CSC) markers have prognostic significance in various cancers and are crucial for designing anticancer drugs to lower cancer deaths. However, methods for the rapid and accurate identification and analysis of prognostic CSC biomarkers across large numbers of cancer patients have been lacking. In our laboratory, we identified and analyzed prognostic lung cancer stem cell markers (LCSCs) using the immunofluorescence microtissue array (IMA) technique on tissue biopsy samples from different lung cancer patients, and observed increased expression of LCSC markers, principally CD44 and CD80, in stage IIIA lung cancer tissues compared with normal lung biopsy tissues. We also investigated the pancreatic cancer stem cell biomarkers (PAN CSCs) CD44 and CD80 with the IMA technique in pancreatic biopsy tissues; CD44 fluorescence showed higher expression than CD80 in pancreatic adenocarcinoma tissues. We also analyzed stage progression using ovarian cancer stem cell biomarkers (OCSCs), chiefly CD54 and CD44, with the IMA technique in ovarian cancer and normal ovarian biopsy tissues. Increased expression of CD44 and CD54 was observed in stage III ovarian cancer tissues compared with normal ovarian tissue, indicating a potential role for these OCSC biomarkers in the prognosis of ovarian cancer. Our results for the prognostic CSC biomarkers of lung, pancreatic, and ovarian cancers were analyzed by one-way ANOVA and with bioinformatics software (Reactome, Cytoscape PSICQUIC services, STRING) to identify the molecular mechanisms of target gene regulation underlying the increased expression of these prognostic CSC markers, which may offer clues for the prevention and treatment of these cancers.
Further research on these lung, pancreatic, and ovarian CSC biomarkers is warranted, as they could be valuable for clinical trials and for early-stage drug discovery targeting them. Citation Format: Madhumita Das, Kymkecia Henry, Djarie Armstrong, Charle Truman, Charlie Kendrick, Maya S. Saunders, Juan E. Anderson, Malcolm J. Lovett, Rose Stiffin, Ayivi Huisso, Donrie Purcell, Marco Ruiz, Paulo Chaves, Jayanta Kumar Das. Immunofluorescence microtissue array (IMA) for detection of prognostic cancer stem cell biomarkers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 7077.
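The one-way ANOVA used to compare marker expression across tissue groups reduces to an F statistic: the ratio of the between-group mean square to the within-group mean square. The following is a minimal sketch assuming per-sample fluorescence intensities grouped by tissue type (for example normal, stage II, and stage III); the function name and the group values are hypothetical, and obtaining a p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within  # no within-group variation
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fluorescence intensities for three tissue groups.
print(one_way_anova_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]))  # (27.0, 2, 6)
```

A large F means the group means differ by much more than the sample-to-sample variation within groups would explain.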