


This content will become publicly available on December 13, 2025

Title: Tree polynomials identify a link between co-transcriptional R-loops and nascent RNA folding
R-loops are a class of non-canonical nucleic acid structures that typically form during transcription when the nascent RNA hybridizes to the DNA template strand, leaving the non-template DNA strand unpaired. These structures are abundant in nature and play important physiological and pathological roles. Recent research shows that DNA sequence and topology affect R-loops, yet it remains unclear how these and other factors contribute to R-loop formation. In this work, we investigate the link between nascent RNA folding and the formation of R-loops. We introduce tree-polynomials, a new class of representations of RNA secondary structures. A tree-polynomial representation consists of a rooted tree associated with an RNA secondary structure together with a polynomial that is uniquely identified with the rooted tree. Tree-polynomials enable accurate, interpretable and efficient data analysis of RNA secondary structures without pseudoknots. We develop a computational pipeline for investigating and predicting R-loop formation from a genomic sequence. The pipeline obtains nascent RNA secondary structures from co-transcriptional RNA folding software and computes the tree-polynomial representations of the structures. By applying this pipeline to plasmid sequences that contain R-loop forming genes, we establish a strong correlation between the coefficient sums of tree-polynomials and the experimental probability of R-loop formation. This strong correlation indicates that the pipeline can be used for accurate R-loop prediction. Furthermore, the interpretability of tree-polynomials allows us to characterize the features of RNA secondary structure associated with R-loop formation. In particular, we identify that branches with short stems separated by bulges and interior loops are associated with R-loops.
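To make the idea of a tree-polynomial and its coefficient sum concrete, the following is a minimal sketch and not the authors' pipeline. It rests on two assumptions that the abstract does not state: (1) the rooted tree is built from a pseudoknot-free dot-bracket string by making every base pair an internal node and every unpaired base a leaf, and (2) the polynomial follows a recursive two-variable rule in which a leaf maps to x and an internal node maps to y plus the product of its children's polynomials. The function names (parse_dot_bracket, tree_polynomial, coefficient_sum) are hypothetical.

```python
from collections import defaultdict


def parse_dot_bracket(structure):
    """Build a rooted tree (nested lists) from a pseudoknot-free dot-bracket string.

    Assumed mapping (illustrative only): '(' opens an internal node,
    ')' closes it, '.' adds a leaf. A node with no children is a leaf.
    """
    root = []
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in structure")
            stack.pop()
        elif ch == ".":
            stack[-1].append([])
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")
    return root


def poly_mul(p, q):
    """Multiply two polynomials stored as {(x_power, y_power): coefficient}."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return dict(out)


def tree_polynomial(node):
    """Assumed recursive rule: leaf -> x; internal node -> y + product of children."""
    if not node:
        return {(1, 0): 1}  # the monomial x
    product = {(0, 0): 1}  # the constant 1
    for child in node:
        product = poly_mul(product, tree_polynomial(child))
    product[(0, 1)] = product.get((0, 1), 0) + 1  # add the monomial y
    return product


def coefficient_sum(poly):
    """Sum of all coefficients, i.e. the polynomial evaluated at x = y = 1."""
    return sum(poly.values())


if __name__ == "__main__":
    tree = parse_dot_bracket("(...)")
    poly = tree_polynomial(tree)
    print(poly, coefficient_sum(poly))
```

Under these assumptions, the hairpin "(...)" gives the polynomial x^3 + 2y, whose coefficient sum is 3. The recursion is adequate for short structures; long, deeply nested structures would call for an iterative traversal.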
Award ID(s):
2107267 2054321
PAR ID:
10582451
Editor(s):
Chen, Shi-Jie
Publisher / Repository:
PLOS
Journal Name:
PLOS Computational Biology
Volume:
20
Issue:
12
ISSN:
1553-7358
Page Range / eLocation ID:
e1012669
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Segal, R.; Shtylla, B.; Sindi, S. (Eds.)
    R-loops are nucleic acid structures consisting of a DNA:RNA hybrid and a DNA single strand. They form naturally during transcription when the nascent RNA hybridizes to the template DNA, forcing the coding DNA strand to wrap around the RNA:DNA duplex. Although formation of R-loops can have deleterious effects on genome integrity, there is evidence of their role as potential regulators of gene expression and DNA repair. Here we initiate an abstract model based on formal grammars to describe RNA:DNA interactions and the formation of R-loops. Separately we use a sliding window approach that accounts for properties of the DNA nucleotide sequence, such as C-richness and CG-skew, to identify segments favoring R-loops. We evaluate these properties on two DNA plasmids that are known to form R-loops and compare results with a recent energetics model from the Chédin Lab. Our abstract approach for R-loops is an initial step toward a more sophisticated framework which can take into account the effect of DNA topology on R-loop formation. 
  3. Type V CRISPR-Cas interference proteins use a single RuvC active site to make RNA-guided breaks in double-stranded DNA substrates, an activity essential for both bacterial immunity and genome editing. The best-studied of these enzymes, Cas12a, initiates DNA cutting by forming a 20-nucleotide R-loop in which the guide RNA displaces one strand of a double-helical DNA substrate, positioning the DNase active site for first-strand cleavage. However, crystal structures and biochemical data have not explained how the second strand is cut to complete the double-strand break. Here, we detect intrinsic instability in DNA flanking the RNA-3′ side of R-loops, which Cas12a can exploit to expose second-strand DNA for cutting. Interestingly, DNA flanking the RNA-5′ side of R-loops is not intrinsically unstable. This asymmetry in R-loop structure may explain the uniformity of guide RNA architecture and the single-active-site cleavage mechanism that are fundamental features of all type V CRISPR-Cas systems. 
  4. The CRISPR-associated protein 9 (Cas9) has been engineered as a precise gene editing tool to make double-strand breaks. Cas9 binds the folded guide RNA (gRNA) that serves as a binding scaffold to guide it to the target DNA duplex via a RecA-like strand-displacement mechanism but without ATP binding or hydrolysis. The target search begins with the protospacer adjacent motif or PAM-interacting domain, recognizing it at the major groove of the duplex and melting its downstream duplex where an RNA-DNA heteroduplex is formed at nanomolar affinity. The rate-limiting step is the formation of an R-loop structure where the HNH domain inserts between the target heteroduplex and the displaced non-target DNA strand. Once the R-loop structure is formed, the non-target strand is rapidly cleaved by RuvC and ejected from the active site. This event is immediately followed by cleavage of the target DNA strand by the HNH domain and product release. Within Cas9, the HNH domain is inserted into the RuvC domain near the RuvC active site via two linker loops that provide allosteric communication between the two active sites. Due to the high flexibility of these loops and active sites, biophysical techniques have been instrumental in characterizing the dynamics and mechanism of the Cas9 nucleases, aiding structural studies in the visualization of the complete active sites and relevant linker structures. Here, we review biochemical, structural, and biophysical studies on the underlying mechanism with emphasis on how Cas9 selects the target DNA duplex and rejects non-target sequences.
  5. Flap endonuclease 1 (FEN1) is an essential enzyme that removes RNA primers and base lesions during DNA lagging strand maturation and long-patch base excision repair (BER). It plays a crucial role in maintaining genome stability and integrity. FEN1 is also implicated in RNA processing and biogenesis. A recent study from our group has shown that FEN1 is involved in trinucleotide repeat deletion by processing the RNA strand in R-loops through BER, further suggesting that the enzyme can modulate genome stability by facilitating the resolution of R-loops. However, it remains unknown how FEN1 can process RNA to resolve an R-loop. In this study, we examined the FEN1 cleavage activity on the RNA:DNA hybrid intermediates generated during DNA lagging strand processing and BER in R-loops. We found that both human and yeast FEN1 efficiently cleaved an RNA flap in the intermediates using its endonuclease activity. We further demonstrated that FEN1 was recruited to R-loops in normal human fibroblasts and senataxin-deficient (AOA2) fibroblasts, and its R-loop recruitment was significantly increased by oxidative DNA damage. We showed that FEN1 specifically employed its endonucleolytic cleavage activity to remove the RNA strand in an R-loop during BER. We found that FEN1 coordinated its DNA and RNA endonucleolytic cleavage activity with the 3′-5′ exonuclease of APE1 to resolve the R-loop. Our results further suggest that FEN1 employed its unique tracking mechanism to endonucleolytically cleave the RNA strand in an R-loop by coordinating with other BER enzymes and cofactors during BER. Our study provides the first evidence that FEN1 endonucleolytic cleavage can result in the resolution of R-loops via the BER pathway, thereby maintaining genome integrity. 