Title: RNA fold prediction by Monte Carlo in graph space and the statistical mechanics of tertiary interactions
Using a graph representation of RNA structures, we have studied the ensembles of secondary and tertiary graphs of two sets of RNA with Monte Carlo simulations. The first consisted of 91 target ribozyme and riboswitch sequences of moderate lengths (<150 nt) having a variety of secondary, H-type pseudoknot, and kissing loop interactions. The second set consisted of 71 more diverse sequences across many RNA families. Using a simple empirical energy model for tertiary interactions and only sequence information for each target as input, the simulations examined how tertiary interactions impact the statistical mechanics of the fold ensembles. The results show that the graphs proliferate enormously when tertiary interactions are possible, producing an entropic driving force for the ensemble to access folds having tertiary structures even though they are overall energetically unfavorable in the energy model. For each of the targets in the two test sets, we assessed the quality of the model and the simulations by examining how well the simulated structures were able to predict the native fold, and compared the results to fold predictions from ViennaRNA. Our model generated good or excellent predictions in a large majority of the targets. Overall, this method was able to produce predictions of comparable quality to ViennaRNA, but it outperformed ViennaRNA for structures with H-type pseudoknots. The results suggest that while tertiary interactions are predicated on real-space contacts, their impacts on the folded structure of RNA can be captured by graph space information for sequences of moderate lengths, using a simple tertiary energy model for the loops, the base pairs, and base stacks.
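The entropic argument in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' energy model or simulation. It assumes a two-macrostate ensemble in which every secondary-only graph shares one energy and every tertiary graph shares another, higher energy. The function name `tertiary_fraction`, the graph counts, the 2 kcal/mol penalty, and the temperature are hypothetical values chosen only for illustration.

```python
import math

# Thermal energy in kcal/mol at about 37 C (assumed temperature, R*T with T = 310.15 K).
KT = 0.0019872 * 310.15


def tertiary_fraction(n_secondary, n_tertiary, e_secondary, e_tertiary, kt=KT):
    """Equilibrium population of the tertiary macrostate in a toy ensemble.

    Each macrostate contributes (number of graphs) * exp(-energy / kT)
    to the partition function, so a large graph count can offset an
    unfavorable energy.
    """
    w_secondary = n_secondary * math.exp(-e_secondary / kt)
    w_tertiary = n_tertiary * math.exp(-e_tertiary / kt)
    return w_tertiary / (w_secondary + w_tertiary)


if __name__ == "__main__":
    penalty = 2.0  # hypothetical energetic cost of a tertiary fold, kcal/mol
    for ratio in (1, 10, 100, 1000):
        frac = tertiary_fraction(1, ratio, 0.0, penalty)
        print(f"tertiary/secondary graph count = {ratio:5d}: "
              f"tertiary population = {frac:.3f}")
```

With equal numbers of graphs, the energetic penalty keeps the tertiary population small. Once tertiary graphs outnumber secondary ones by a large enough factor, the multiplicity term outweighs the Boltzmann penalty and the tertiary macrostate dominates, which is the sense in which graph proliferation acts as an entropic driving force.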
Award ID(s):
1664801
PAR ID:
10567457
Author(s) / Creator(s):
;
Publisher / Repository:
RNA Society
Date Published:
Journal Name:
RNA
Volume:
31
Issue:
1
ISSN:
1355-8382
Page Range / eLocation ID:
14 to 31
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human immunodeficiency virus (HIV) continues to be a threat to public health. An emerging technique with promise in the context of fighting HIV type 1 (HIV-1) focuses on targeting ribosomal frameshifting. A crucial –1 programmed ribosomal frameshift (PRF) has been observed in several pathogenic viruses, including HIV-1. Altered folds of the HIV-1 RNA frameshift element (FSE) have been shown to alter frameshifting efficiency. Here, we use RNA-As-Graphs (RAG), a graph-theory based framework for representing and analyzing RNA secondary structures, to perform conformational analysis in motif space to propose how sequence length may influence folding patterns. This combined analysis, along with all-atom modeling and experimental testing of our designed mutants, has already proven valuable for the SARS-CoV-2 FSE. As a first step to launching the same computational/experimental approach for HIV-1, we compare prior experiments and perform SHAPE-guided 2D-fold predictions for the HIV-1 FSE embedded in increasing sequence contexts and predict structure-altering mutations. We find a highly stable upper stem and highly flexible lower stem for the core FSE, with a three-way junction connecting to other motifs at increasing lengths. In particular, we find little support for a pseudoknot or triplex interaction in the core FSE, although pseudoknots can form separately as a connective motif at longer sequences. We also identify sensitive residues in the upper stem and central loop that, when minimally mutated, alter the core stem loop folding. These insights into the FSE fold and structure-altering mutations can be further pursued by all-atom simulations and experimental testing to advance the mechanistic understanding and therapeutic strategies for HIV-1. 
  2. The frameshifting RNA element (FSE) in coronaviruses (CoVs) regulates the programmed −1 ribosomal frameshift (−1 PRF) mechanism common to many viruses. The FSE is of particular interest as a promising drug candidate. Its associated pseudoknot or stem loop structure is thought to play a large role in frameshifting and thus viral protein production. To investigate the FSE structural evolution, we use our graph theory-based methods for representing RNA secondary structures in the RNA-As-Graphs (RAG) framework to calculate conformational landscapes of viral FSEs with increasing sequence lengths for representative 10 Alpha and 13 Beta-CoVs. By following length-dependent conformational changes, we show that FSE sequences encode many possible competing stems which in turn favor certain FSE topologies, including a variety of pseudoknots, stem loops, and junctions. We explain alternative competing stems and topological FSE changes by recurring patterns of mutations. At the same time, FSE topology robustness can be understood by shifted stems within different sequence contexts and base pair coevolution. We further propose that the topology changes reflected by length-dependent conformations contribute to tuning the frameshifting efficiency. Our work provides tools to analyze virus sequence/structure correlations, explains how sequence and FSE structure have evolved for CoVs, and provides insights into potential mutations for therapeutic applications against a broad spectrum of CoV FSEs by targeting key sequence/structural transitions. 
  3. Motivation: RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. This slowness is even more severe than cubic-time free energy minimization due to a substantially larger constant factor in runtime. Results: Inspired by the success of our recent LinearFold algorithm that predicts the approximate minimum free energy structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g. 2.5 days versus 1.3 min on a sequence with length 32 753 nt). More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest length sequences (16S and 23S rRNAs), as well as a substantial improvement on long-distance base pairs (500+ nt apart). Availability and implementation: Code: http://github.com/LinearFold/LinearPartition; Server: http://linearfold.org/partition. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. RNA macromolecules, like proteins, fold to assume shapes that are intimately connected to their broadly recognized biological functions; however, because of their high charge and dynamic nature, RNA structures are far more challenging to determine. We introduce an approach that exploits the high brilliance of x-ray free-electron laser sources to reveal the formation and ready identification of angstrom-scale features in structured and unstructured RNAs. Previously unrecognized structural signatures of RNA secondary and tertiary structures are identified through wide-angle solution scattering experiments. With millisecond time resolution, we observe an RNA fold from a dynamically varying single strand through a base-paired intermediate to assume a triple-helix conformation. While the backbone orchestrates the folding, the final structure is locked in by base stacking. This method may help to rapidly characterize and identify structural elements in nucleic acids in both equilibrium and time-resolved experiments. 
  5. For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.