This content will become publicly available on September 1, 2025

Title: Accurate assembly of circular RNAs with TERRACE
Circular RNA (circRNA) is a class of RNA molecules that form a closed loop with their 5′ and 3′ ends covalently bonded. CircRNAs are known to be more stable than linear RNAs, have distinct properties and functions, and are promising biomarkers. Existing methods for assembling circRNAs rely heavily on annotated transcriptomes and therefore exhibit unsatisfactory accuracy when a high-quality transcriptome is not available. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes the splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that “bridge” the three fragments in the splice graph induced by back-spliced reads. We adopt a definition for optimal bridging paths and a dynamic programming algorithm to calculate such optimal paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, contributing to its much-improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to using abundance for scoring. On both simulations and biological data sets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while achieving better or comparable precision. In particular, when the annotations are not provided, TERRACE assembles 123%–413% more correct circRNAs than state-of-the-art methods. TERRACE presents a significant advance in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in future research on circRNAs.
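The bridging step described in the abstract can be illustrated with a small sketch. The abstract does not state how TERRACE defines an optimal bridging path, so the code below assumes one plausible stand-in: among all paths between two fragments in a splice graph, prefer the one whose weakest edge has the highest coverage (a bottleneck objective). The function name `best_bridging_path`, the integer vertex numbering, and the edge-coverage dictionary are hypothetical and are not taken from TERRACE.

```python
from collections import defaultdict

def best_bridging_path(edges, source, target):
    """Find a max-bottleneck path in a DAG by dynamic programming.

    ASSUMPTIONS (not from the paper): vertices are integers numbered in
    topological order (every edge goes from a smaller to a larger number),
    and "optimal" means maximizing the minimum edge coverage on the path.

    edges: dict mapping (u, v) -> coverage of the splice-graph edge u -> v.
    Returns (path, bottleneck), or (None, 0) if target is unreachable.
    """
    adj = defaultdict(list)
    for (u, v), cov in edges.items():
        adj[u].append((v, cov))

    best = {source: float("inf")}  # best bottleneck found so far per vertex
    prev = {}                      # back-pointers for path reconstruction

    # Visiting vertices in increasing order is a valid topological order here.
    for u in range(source, target):
        if u not in best:
            continue  # u is not reachable from source
        for v, cov in adj[u]:
            if v > target:
                continue  # edges past the target cannot lie on a bridging path
            score = min(best[u], cov)
            if score > best.get(v, float("-inf")):
                best[v] = score
                prev[v] = u

    if target not in best:
        return None, 0

    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[target]

# Hypothetical toy splice graph: two alternative routes from vertex 0 to 3.
toy_edges = {(0, 1): 10, (1, 3): 2, (0, 2): 5, (2, 3): 4}
toy_path, toy_bottleneck = best_bridging_path(toy_edges, 0, 3)
```

In this toy graph the route through vertex 2 is chosen because its weakest edge (coverage 4) is better supported than the weakest edge on the route through vertex 1 (coverage 2).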
Award ID(s):
2019797 2145171
PAR ID:
10552874
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Cold Spring Harbor Laboratory Press
Date Published:
Journal Name:
Genome Research
Volume:
34
Issue:
9
ISSN:
1088-9051
Page Range / eLocation ID:
1365 to 1370
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Circular RNAs (circRNAs) are covalently closed single‐stranded RNAs, generated through a back‐splicing process that links a downstream 5′ site to an upstream 3′ end. The only distinction in the sequence between circRNA and their linear cognate RNA is the back splice junction. Their low abundance and sequence similarity with their linear origin RNA have made the discovery and identification of circRNA challenging. We have identified almost 6000 novel circRNAs from Lotus japonicus leaf tissue using different enrichment, amplification, and sequencing methods as well as alternative bioinformatics pipelines. The different methodologies identified different pools of circRNA with little overlap. We validated circRNA identified by the different methods using reverse transcription polymerase chain reaction and characterized sequence variations using nanopore sequencing. We compared validated circRNA identified in L. japonicus to other plant species and showed conservation of high‐confidence circRNA‐expressing genes. This is the first identification of L. japonicus circRNA and provides a resource for further characterization of their function in gene regulation. CircRNAs identified in this study originated from genes involved in all biological functions of eukaryotic cells. The comparison of methodologies and technologies to sequence, identify, analyze, and validate circRNA from plant tissues will enable further research to characterize the function and biogenesis of circRNA in L. japonicus.
  2. Summary Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
  3. The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced two ends in the absence of the reference genome, a problem we refer to here as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads, and benefits downstream RNA-seq analysis such as transcript assembly, expression quantification, and splicing differential analysis. However, de novo bridging is a challenging and complicated task owing to alternative splicing, transcript noises, and sequencing errors. It remains unclear if the data provides sufficient information for accurate bridging, let alone efficient algorithms that determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of a reference genome (called reference-based bridging), but the algorithms are far from scaling to de novo bridging, as the underlying compacted de Bruijn graph (cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra’s algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra’s algorithm from scratch for all vertices, for further speedup. These innovative techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo.
  4. CircRNAs are a category of regulatory RNAs that have garnered significant attention in the field of regulatory RNA research due to their structural stability and tissue-specific expression. Their circular configuration, formed via back-splicing, results in a covalently closed structure that exhibits greater resistance to exonucleases compared to linear RNAs. The distinctive regulation of circRNAs is closely associated with several physiological processes, as well as the advancement of pathophysiological processes in several human diseases. Despite a good understanding of the biogenesis of circular RNA, details of their biological roles are still being explored. With the steady rise in the number of investigations being carried out regarding the involvement of circRNAs in various regulatory pathways, understanding the biological and clinical relevance of circRNA-mediated regulation has become challenging. Given the vast landscape of circRNA research in the development of the heart and vasculature, we evaluated cardiovascular system research as a model to critically review the state-of-the-art understanding of the biologically relevant functions of circRNAs. We conclude the review with a discussion of the limitations of current functional studies and provide potential solutions by which these limitations can be addressed to identify and validate the meaningful and impactful functions of circRNAs in different physiological processes and diseases. 
  5. Pissis, Solon P; Sung, Wing-Kin (Ed.)
    Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol. LoopSeq Solo also achieves ultra-high sequencing depth and high purity of short reads covering the entire captured molecule. Despite the availability of many assembly methods, constructing full-length sequence from these anchor-enabled, ultra-high coverage sequencing data remains challenging due to the complexity of the underlying assembly graphs and the lack of specific algorithms leveraging anchors. We present Anchorage, a novel assembler that performs anchor-guided assembly for ultra-high-depth sequencing data. Anchorage starts with a kmer-based approach for precise estimation of molecule lengths. It then formulates the assembly problem as finding an optimal path that connects the two nodes determined by anchors in the underlying compact de Bruijn graph. The optimality is defined as maximizing the weight of the smallest node while matching the estimated sequence length. Anchorage uses a modified dynamic programming algorithm to efficiently find the optimal path. Through both simulations and real data, we show that Anchorage outperforms existing assembly methods, particularly in the presence of sequencing artifacts. Anchorage fills the gap in assembling anchor-enabled data. We anticipate its broad use as anchor-enabled sequencing technologies become prevalent. Anchorage is freely available at https://github.com/Shao-Group/anchorage; the scripts and documents that can reproduce all experiments in this manuscript are available at https://github.com/Shao-Group/anchorage-test. 
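Item 3 above mentions a truncated Dijkstra’s algorithm for de novo bridging but does not say how the search is truncated. As a rough illustration only, the sketch below assumes the simplest reading: a standard Dijkstra search that stops extending any path once its length would exceed a cutoff (for example, a maximum plausible fragment length). The function name `truncated_dijkstra`, the adjacency-list format, and the `max_dist` parameter are hypothetical and are not taken from rnabridge-denovo.

```python
import heapq

def truncated_dijkstra(graph, source, max_dist):
    """Shortest-path distances from source, limited to distance <= max_dist.

    ASSUMPTION (not from the paper): "truncated" is read as a distance
    cutoff; vertices farther than max_dist are never expanded or reported.

    graph: dict mapping node -> list of (neighbor, non-negative length).
    Returns a dict node -> shortest distance for all nodes within the cutoff.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route to u was found
        for v, length in graph.get(u, []):
            nd = d + length
            if nd > max_dist:
                continue  # truncate: do not explore beyond the cutoff
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical toy graph; edge values stand in for sequence lengths.
toy_graph = {
    "a": [("b", 2), ("c", 10)],
    "b": [("c", 3), ("d", 50)],
    "c": [("d", 4)],
}
near = truncated_dijkstra(toy_graph, "a", 8)
```

Because the cutoff bounds how far each search can spread, the work per source vertex depends on the local neighborhood rather than on the whole graph, which is one reason such a truncation can matter on graphs with millions of vertices.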