Title: A systematic evaluation of the computational tools for lncRNA identification
Abstract: The computational identification of long non-coding RNAs (lncRNAs) is important to study lncRNAs and their functions. Despite the existence of many computational tools for lncRNA identification, to our knowledge, there is no systematic evaluation of these tools on common datasets and no consensus regarding their performance and the importance of the features used. To fill this gap, in this study, we assessed the performance of 17 tools on several common datasets. We also investigated the importance of the features used by the tools. We found that the deep learning-based tools have the best performance in terms of identifying lncRNAs, and the peptide features do not contribute much to the tool accuracy. Moreover, when the transcripts in a cell type were considered, the performance of all tools significantly dropped, and the deep learning-based tools were no longer as good as other tools. Our study will serve as an excellent starting point for selecting tools and features for lncRNA identification.
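As an illustration of what comparing several tools on a common dataset can look like, the following is a minimal sketch only. The abstract does not say which metrics or data formats the study used, so the metric set (accuracy, sensitivity, specificity, precision, F1), the label encoding (1 = lncRNA, 0 = protein-coding), and the function names `binary_metrics` and `rank_tools` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute common binary-classification metrics.

    Assumed encoding (not from the source): 1 = lncRNA, 0 = protein-coding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def rank_tools(
    y_true: List[int],
    predictions: Dict[str, List[int]],
    key: str = "f1",
) -> List[Tuple[str, Dict[str, float]]]:
    """Score every tool on the same labels and sort by one metric, best first."""
    scores = {name: binary_metrics(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1][key], reverse=True)
```

Scoring every tool against one shared label vector is what makes the numbers comparable; the tool names passed to `rank_tools` are placeholders supplied by the caller.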
Award ID(s):
1661414 2015838
PAR ID:
10328294
Journal Name:
Briefings in Bioinformatics
Volume:
22
Issue:
6
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Long non-coding Ribonucleic Acids (lncRNAs) can be localized to different cellular compartments, such as the nuclear and the cytoplasmic regions. Their biological functions are influenced by the region of the cell where they are located. Compared to the vast number of lncRNAs, only a relatively small proportion have annotations regarding their subcellular localization. It would be helpful if those few annotated lncRNAs could be leveraged to develop predictive models for localization of other lncRNAs. Methods: Conventional computational methods use q-mer profiles from lncRNA sequences and train machine learning models such as support vector machines and logistic regression with the profiles. These methods focus on the exact q-mer. Given possible sequence mutations and other uncertainties in genomic sequences and their role in biological function, a consideration of these variabilities might improve our ability to model lncRNAs and their localization. Thus, we build on inexact q-mers and use machine learning/deep learning techniques to study three specific problems in lncRNA subcellular localization, namely, prediction of lncRNA localization using inexact q-mers, the issue of whether lncRNA localization is cell-type-specific, and the notion of switching (lncRNA) genes. Results: We performed our analysis using data on lncRNA localization across 15 cell lines. Our results showed that using inexact q-mers (with q = 6) can improve the lncRNA localization prediction performance compared to using exact q-mers. Further, we showed that lncRNA localization, in general, is not cell-line-specific. We also identified a category of lncRNAs that switch cellular compartments between different cell lines (we call them switching lncRNAs). These switching lncRNAs complicate the problem of predicting lncRNA localization using machine learning models, showing that lncRNA localization is still a major challenge.
  2. Long noncoding RNAs (lncRNAs) play key roles in tumorigenesis. Misexpression of lncRNAs can change the expression profiles of various target genes involved in cancer initiation and progression, so identifying the key lncRNAs for a cancer would help in developing therapies. Identifying key lncRNAs usually requires lncRNA expression profiles from both normal and cancer samples, but such data are not available for all cancers. In the present study, a computational framework is developed to identify cancer-specific key lncRNAs using the lncRNA expression of cancer patients only. The framework consists of two state-of-the-art feature selection techniques, Recursive Feature Elimination (RFE) and Least Absolute Shrinkage and Selection Operator (LASSO), and five machine learning models: Naive Bayes, K-Nearest Neighbor, Random Forest, Support Vector Machine, and Deep Neural Network. For the experiments, expression values of lncRNAs for 8 cancers (BLCA, CESC, COAD, HNSC, KIRP, LGG, LIHC, and LUAD) from TCGA are used. The combined dataset consists of 3,656 patients with expression values of 12,309 lncRNAs. Important features, or key lncRNAs, are identified using the feature selection algorithms RFE and LASSO. The ability of these key lncRNAs to classify the 8 cancers is assessed by the performance of the five classification models. This study identified 37 key lncRNAs that can classify 8 different cancer types with an accuracy ranging from 94% to 97%. Finally, survival analysis supports that the discovered key lncRNAs are capable of differentiating between high-risk and low-risk patients.
  3. Abstract Accurate identification of microRNA (miRNA) targets at base-pair resolution has been an open problem for over a decade. The recent discovery of miRNA isoforms (isomiRs) adds more complexity to this problem. Despite the existence of many methods, none considers isomiRs, and their performance is still suboptimal. We hypothesize that by taking the isomiR–mRNA interactions into account and applying a deep learning model to study miRNA–mRNA interaction features, we may improve the accuracy of miRNA target predictions. We developed a deep learning tool called DMISO to capture the intricate features of miRNA/isomiR–mRNA interactions. Based on tenfold cross-validation, DMISO showed high precision (95%) and recall (90%). Evaluated on three independent datasets, DMISO had superior performance to five tools, including three popular conventional tools and two recently developed deep learning-based tools. By applying two popular feature interpretation strategies, we demonstrated the importance of the miRNA regions other than their seeds and the potential contribution of the RNA-binding motifs within miRNAs/isomiRs and mRNAs to the miRNA/isomiR–mRNA interactions. 
  4. ABSTRACT Long noncoding RNA (lncRNA) plays important roles in sexual development in eukaryotes. In filamentous fungi, however, little is known about the expression and roles of lncRNAs during fruiting body formation. By profiling developmental transcriptomes during the life cycle of the plant-pathogenic fungus Fusarium graminearum, we identified 547 lncRNAs whose expression was highly dynamic, with about 40% peaking at the meiotic stage. Many lncRNAs were found to be antisense to mRNAs, forming 300 sense-antisense pairs. Although small RNAs were produced from these overlapping loci, antisense lncRNAs appeared not to be involved in gene silencing pathways. Genome-wide analysis of small RNA clusters identified many silenced loci at the meiotic stage. However, we found transcriptionally active small RNA clusters, many of which were associated with lncRNAs. Also, we observed that many antisense lncRNAs and their respective sense transcripts were induced in parallel as the fruiting bodies matured. The nonsense-mediated decay (NMD) pathway is known to determine the fates of lncRNAs as well as mRNAs. Thus, we analyzed mutants defective in NMD and identified a subset of lncRNAs that were induced during sexual development but suppressed by NMD during vegetative growth. These results highlight the developmental stage-specific nature and functional potential of lncRNA expression in shaping the fungal fruiting bodies and provide fundamental resources for studying sexual stage-induced lncRNAs. IMPORTANCE Fusarium graminearum is the causal agent of the head blight on our major staple crops, wheat and corn. The fruiting body formation on the host plants is indispensable for the disease cycle and epidemics. Long noncoding RNA (lncRNA) molecules are emerging as key regulatory components for sexual development in animals and plants. To date, however, there is a paucity of information on the roles of lncRNAs in fungal fruiting body formation. Here we characterized hundreds of lncRNAs that exhibited developmental stage-specific expression patterns during fruiting body formation. Also, we discovered that many lncRNAs were induced in parallel with their overlapping transcripts on the opposite DNA strand during sexual development. Finally, we found a subset of lncRNAs that were regulated by an RNA surveillance system during vegetative growth. This research provides fundamental genomic resources that will spur further investigations on lncRNAs that may play important roles in shaping fungal fruiting bodies.
  5. Simon, Anne E. (Ed.)
    ABSTRACT Long noncoding RNAs (lncRNAs) of virus origin accumulate in cells infected by many positive-strand (+) RNA viruses to bolster viral infectivity. Their biogenesis mostly utilizes exoribonucleases of host cells that degrade viral genomic or subgenomic RNAs in the 5′-to-3′ direction until being stalled by well-defined RNA structures. Here, we report a viral lncRNA that is produced by a novel replication-dependent mechanism. This lncRNA corresponds to the last 283 nucleotides of the turnip crinkle virus (TCV) genome and hence is designated tiny TCV subgenomic RNA (ttsgR). ttsgR accumulated to high levels in TCV-infected Nicotiana benthamiana cells when the TCV-encoded RNA-dependent RNA polymerase (RdRp), also known as p88, was overexpressed. Both (+) and (−) strand forms of ttsgR were produced in a manner dependent on the RdRp functionality. Strikingly, templates as short as ttsgR itself were sufficient to program ttsgR amplification, as long as the TCV-encoded replication proteins p28 and p88 were provided in trans. Consistent with its replicational origin, ttsgR accumulation required a 5′ terminal carmovirus consensus sequence (CCS), a sequence motif shared by genomic and subgenomic RNAs of many viruses phylogenetically related to TCV. More importantly, introducing a new CCS motif elsewhere in the TCV genome was alone sufficient to cause the emergence of another lncRNA. Finally, abolishing ttsgR by mutating its 5′ CCS gave rise to a TCV mutant that failed to compete with wild-type TCV in Arabidopsis. Collectively, our results unveil a replication-dependent mechanism for the biogenesis of viral lncRNAs, thus suggesting that multiple mechanisms, individually or in combination, may be responsible for viral lncRNA production.
    IMPORTANCE Many positive-strand (+) RNA viruses produce long noncoding RNAs (lncRNAs) during the process of cellular infections and mobilize these lncRNAs to counteract antiviral defenses, as well as coordinate the translation of viral proteins. Most viral lncRNAs arise from 5′-to-3′ degradation of longer viral RNAs being stalled at stable secondary structures. Here, we report a viral lncRNA that is produced by the replication machinery of turnip crinkle virus (TCV). This lncRNA, designated ttsgR, shares the terminal characteristics with TCV genomic and subgenomic RNAs and overaccumulates in the presence of moderately overexpressed TCV RNA-dependent RNA polymerase (RdRp). Furthermore, templates that are of similar sizes as ttsgR are readily replicated by TCV replication proteins (p28 and RdRp) provided from nonviral sources. In summary, this study establishes an approach for uncovering low abundance viral lncRNAs, and characterizes a replicating TCV lncRNA. Similar investigations on human-pathogenic (+) RNA viruses could yield novel therapeutic targets.
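Item 1 above contrasts exact q-mer profiles with inexact ones. The sketch below shows one way such profiles could be computed. The abstract does not define "inexact", so the rule used here (each window also credits every q-mer within one substitution of it) is an assumption for illustration, as are the function names, the RNA alphabet `ACGU`, and the choice to skip windows containing ambiguous bases.

```python
from collections import Counter
from typing import Dict, Iterator

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA input is converted (T -> U)


def _windows(seq: str, q: int) -> Iterator[str]:
    """Yield every length-q window that contains only unambiguous bases."""
    seq = seq.upper().replace("T", "U")
    for i in range(len(seq) - q + 1):
        window = seq[i:i + q]
        if all(base in ALPHABET for base in window):
            yield window


def _normalize(counts: Counter) -> Dict[str, float]:
    """Turn raw counts into relative frequencies (empty dict if no counts)."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def exact_qmer_profile(seq: str, q: int) -> Dict[str, float]:
    """Relative frequency of each q-mer that occurs exactly in the sequence."""
    return _normalize(Counter(_windows(seq, q)))


def inexact_qmer_profile(seq: str, q: int) -> Dict[str, float]:
    """Profile in which each window credits itself and all one-substitution neighbors.

    Assumption for this sketch: "inexact" means Hamming distance <= 1.
    """
    counts: Counter = Counter()
    for window in _windows(seq, q):
        counts[window] += 1
        for pos, original in enumerate(window):
            for base in ALPHABET:
                if base != original:
                    counts[window[:pos] + base + window[pos + 1:]] += 1
    return _normalize(counts)
```

With q = 6, the value reported in item 1, each window contributes to 1 + 3 × 6 = 19 q-mers under this assumed rule. The resulting dictionaries could then be expanded into fixed-length feature vectors for a classifier.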