
Title: LazySampling and LinearSampling: fast stochastic sampling of RNA secondary structure with applications to SARS-CoV-2

Many RNAs fold into multiple structures at equilibrium, and there is a need to sample these structures according to their probabilities in the ensemble. The conventional sampling algorithm suffers from two limitations: (i) the sampling phase is slow due to many repeated calculations; and (ii) the end-to-end runtime scales cubically with the sequence length. These issues make it difficult to apply the conventional algorithm to long RNAs, such as the full genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To address these problems, we devise a new sampling algorithm, LazySampling, which eliminates redundant work via on-demand caching. Based on LazySampling, we further derive LinearSampling, an end-to-end linear-time sampling algorithm. Benchmarked on nine diverse RNA families, the structures sampled by LinearSampling correlate better with the well-established secondary structures than the predictions of Vienna RNAsubopt and RNAplfold. More importantly, LinearSampling is orders of magnitude faster than standard tools, being 428× faster (72 s versus 8.6 h) than RNAsubopt on the full genome of SARS-CoV-2 (29 903 nt). The resulting sample landscape correlates well with the experimentally guided secondary structure models, and is closer to the alternative conformations revealed by experimentally driven analysis. Finally, LinearSampling finds 23 regions of 15 nt with high accessibilities in the SARS-CoV-2 genome, which are potential targets for COVID-19 diagnostics and therapeutics.
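To make the caching idea concrete, here is a minimal sketch in the spirit of LazySampling, not the authors' implementation. It assumes a toy Nussinov-style model in which every allowed pair (AU, GC, GU) contributes a fixed energy `PAIR_ENERGY` and hairpins need at least `MIN_LOOP` unpaired nucleotides; all names (`inside`, `LazySampler`, `accessibility`, `PAIR_ENERGY`, `MIN_LOOP`) are hypothetical. The inside phase computes the partition function `Q[i][j]` of each half-open span; sampling then walks top-down, choosing each decomposition with probability proportional to its Boolean-weighted contribution. The outgoing choices of a span are enumerated only the first time some sample visits it and are cached for all later samples. The sketch keeps the cubic inside phase, so it illustrates only the on-demand caching, not LinearSampling's linear-time behavior. The `accessibility` helper estimates, from the samples, how often a window (e.g. 15 nt, as above) is entirely unpaired.

```python
import bisect
import math
import random

# Toy energy model (assumption, not the Turner model used by real tools):
# every allowed pair contributes PAIR_ENERGY kcal/mol, and hairpin loops
# must contain at least MIN_LOOP unpaired nucleotides.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
PAIR_ENERGY = -1.0
KT = 0.61632  # RT in kcal/mol at 37 °C
PAIR_WEIGHT = math.exp(-PAIR_ENERGY / KT)


def inside(seq):
    """Q[i][j] = partition function of the half-open span seq[i:j]."""
    n = len(seq)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty spans have Q = 1
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            total = Q[i + 1][j]  # case 1: position i is unpaired
            for k in range(i + MIN_LOOP + 1, j):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    total += PAIR_WEIGHT * Q[i + 1][k] * Q[k + 1][j]
            Q[i][j] = total
    return Q


class LazySampler:
    """Top-down stochastic traceback that caches the outgoing choices of a
    span the first time any sample visits it (on-demand caching)."""

    def __init__(self, seq, seed=0):
        self.seq = seq
        self.Q = inside(seq)
        self.cache = {}  # (i, j) -> (choices, cumulative weights)
        self.rng = random.Random(seed)

    def _choices(self, i, j):
        if (i, j) not in self.cache:
            seq, Q = self.seq, self.Q
            acc = Q[i + 1][j]  # weight of "i unpaired" (choice None)
            choices, cum = [None], [acc]
            for k in range(i + MIN_LOOP + 1, j):
                if (seq[i], seq[k]) in PAIRS:
                    acc += PAIR_WEIGHT * Q[i + 1][k] * Q[k + 1][j]
                    choices.append(k)
                    cum.append(acc)
            self.cache[(i, j)] = (choices, cum)
        return self.cache[(i, j)]

    def sample(self):
        """Draw one dot-bracket structure with probability weight / Q[0][n]."""
        structure = ["."] * len(self.seq)
        stack = [(0, len(self.seq))]
        while stack:
            i, j = stack.pop()
            if j - i < 1:
                continue  # empty span: nothing to decide
            choices, cum = self._choices(i, j)
            r = self.rng.random() * cum[-1]
            k = choices[bisect.bisect_right(cum, r)]
            if k is None:
                stack.append((i + 1, j))
            else:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k))
                stack.append((k + 1, j))
        return "".join(structure)


def accessibility(samples, start, length):
    """Fraction of samples in which seq[start:start+length] is fully unpaired."""
    free = sum(all(c == "." for c in s[start:start + length]) for s in samples)
    return free / len(samples)


if __name__ == "__main__":
    seq = "GGGAAAUCCCAGCGAAAGCUGG"
    sampler = LazySampler(seq, seed=42)
    samples = [sampler.sample() for _ in range(1000)]
    for start in range(len(seq) - 15 + 1):
        print(start, round(accessibility(samples, start, 15), 3))
    print("cached spans:", len(sampler.cache))
```

Because the decomposition ("i unpaired" or "i pairs with k") is unambiguous, the product of the per-span choice probabilities equals the structure's Boltzmann weight divided by `Q[0][n]`. A span revisited by many samples pays for enumerating its split points only once, which is the kind of repeated calculation the abstract says LazySampling avoids.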

Award ID(s):
2009071, 1817231
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Nucleic Acids Research
Page Range / eLocation ID:
p. e7-e7
Sponsoring Org:
National Science Foundation
More Like this
  1. The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction is not only close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between the 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
  2. Abstract

    Many RNAs function through RNA–RNA interactions. Fast and reliable RNA structure prediction that accounts for RNA–RNA interaction is useful; however, existing tools are either too simplistic or too slow. To address this issue, we present LinearCoFold, which approximates the complete minimum free energy structure of two strands in linear time, and LinearCoPartition, which approximates the cofolding partition function and base pairing probabilities in linear time. LinearCoFold and LinearCoPartition are orders of magnitude faster than RNAcofold. For example, on a sequence pair with a combined length of 26,190 nt, LinearCoFold is 86.8× faster than RNAcofold MFE mode, and LinearCoPartition is 642.3× faster than RNAcofold partition function mode. Surprisingly, LinearCoFold and LinearCoPartition's predictions have higher PPV and sensitivity for intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA–RNA interaction between SARS-CoV-2 genomic RNA (gRNA) and human U4 small nuclear RNA (snRNA), which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet-lab results than RNAcofold's.

  3. Abstract

    Long-range ribonucleic acid (RNA)–RNA interactions (RRIs) are prevalent in positive-strand RNA viruses, including Beta-coronaviruses, where they take part in regulatory roles such as the regulation of sub-genomic RNA production rates. Crosslinking of interacting RNAs and short read-based deep sequencing of the resulting RNA–RNA hybrids have shown that these long-range structures exist in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) at both the genomic and sub-genomic levels and in dynamic topologies. Furthermore, the co-evolution of coronaviruses with their hosts is shaped by genetic variation made possible by their large genomes, high recombination frequency and high mutation rate. SARS-CoV-2 mutations are known to occur spontaneously during replication, and thousands of aggregate mutations have been reported since the emergence of the virus. Although many long-range RRIs have been experimentally identified using high-throughput methods for the wild-type SARS-CoV-2 strain, the evolutionary trajectory of these RRIs across variants, the impact of mutations on RRIs and the interactions of SARS-CoV-2 RNAs with the host have remained largely open questions in the field. In this review, we summarize recent computational tools and experimental methods that have enabled the mapping of RRIs in viral genomes, with a specific focus on SARS-CoV-2. We also present available informatics resources to navigate the RRI maps and shed light on the impact of mutations on the RRI space in viral genomes. Investigating the evolution of long-range RNA interactions and that of virus–host interactions can contribute to the understanding of new and emerging variants as well as aid in developing improved RNA therapeutics critical for combating future outbreaks.

  4. Remdesivir (RDV) prodrug can be metabolized into a triphosphate nucleotide analogue (RDV-TP) that binds and inserts into the active site of the viral RNA-dependent RNA polymerase (RdRp), thereby interfering with viral genome replication. In this work, we computationally studied how RDV-TP binds and inserts into the SARS-CoV-2 RdRp active site, in comparison with the natural nucleotide substrate adenosine triphosphate (ATP). To do that, we first constructed atomic structural models of an initial binding complex (active site open) and a substrate insertion complex (active site closed), based on high-resolution cryo-EM structures recently determined for SARS-CoV-2 RdRp, or non-structural protein (nsp) 12, in complex with the accessory protein factors nsp7 and nsp8. By conducting all-atom molecular dynamics simulations with umbrella sampling of the nucleotide insertion between the open and closed RdRp complexes, we show that RDV-TP can initially bind to the viral RdRp active site in a comparatively stable state, as it primarily forms base stacking with the template uracil nucleotide (nt +1); while the template nt fluctuates freely, this stacking supports a low free energy barrier to RDV-TP insertion (∼1.5 kcal mol⁻¹). In comparison, the natural substrate ATP initially binds to the RdRp active site in Watson–Crick base pairing with the template nt, and inserts into the active site with a moderately low free energy barrier (∼2.6 kcal mol⁻¹) when the fluctuations of the template nt are well quenched. The simulations also show that the initial base stacking of RDV-TP with the template can be specifically stabilized by interactions of motif C-S759 and S682 (near motif B) with the base, and of motif G-K500 with the template backbone. Although RDV-TP insertion can be hindered by the interaction of motif F-R555/R553 with the triphosphate, ATP insertion appears to be facilitated by such interactions. The inserted RDV-TP and ATP can be further distinguished by specific sugar interactions with motif B-T687 and motif A-D623, respectively.
  5. The s2m, a highly conserved 41-nt hairpin structure in the SARS-CoV-2 genome, is an attractive therapeutic target that may have important roles in the virus life cycle or in interactions with the host. However, the conserved s2m in Delta SARS-CoV-2, a previously dominant variant characterized by high infectivity and disease severity, has received relatively less attention than that of the original SARS-CoV-2 virus. The focus of this work is to identify and define the s2m changes between Delta and the original SARS-CoV-2 and the subsequent impact of those changes on s2m dimerization and interactions with the host microRNA miR-1307-3p. Bioinformatics analysis of the GISAID database targeting the s2m element reveals a >99% correlation between Delta SARS-CoV-2 and a single-nucleotide mutation at the 15th position (G15U). Based on ¹H NMR spectroscopy assignments comparing the imino proton resonance region of s2m and s2m G15U at 19°C, we show that the U15–A29 base pair closes, stabilizing the upper stem without changing the overall secondary structure. Increased stability of the upper stem did not affect the chaperone activity of the viral N protein, which was still able to convert the kissing dimers formed by s2m G15U into a stable duplex conformation, consistent with the s2m reference. However, we show that the s2m G15U mutation drastically impacts the binding of host miR-1307-3p. These findings demonstrate that the observed G15U mutation alters the secondary structure of s2m, with a subsequent impact on viral binding of host miR-1307-3p and potential consequences for immune responses.

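The LinearCoFold abstract (item 2 above) reports PPV and sensitivity restricted to intermolecular base pairs. A rough sketch of how such a score can be computed follows; it is not the authors' evaluation script. It assumes both structures are written in the ViennaRNA cofold dot-bracket convention, where `&` marks the boundary between the two strands, and it counts only exact matches among pairs that cross that boundary: PPV = TP / (predicted intermolecular pairs) and sensitivity = TP / (reference intermolecular pairs). The function names are hypothetical, and returning 0.0 when a set is empty is an assumed convention.

```python
def parse_cofold(db):
    """Parse 'strandA&strandB' dot-bracket into (set of 0-based pairs, cut),
    where cut is the index at which strand B starts in the joined sequence."""
    cut = db.index("&")
    flat = db.replace("&", "")
    stack, pairs = [], set()
    for pos, ch in enumerate(flat):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))  # IndexError if brackets unbalanced
    return pairs, cut


def intermolecular(pairs, cut):
    """Keep only pairs with one end in strand A (< cut) and one in strand B."""
    return {(min(i, j), max(i, j)) for i, j in pairs if (i < cut) != (j < cut)}


def score_intermolecular(predicted_db, reference_db):
    """Return (PPV, sensitivity) over intermolecular base pairs only."""
    pred, cut = parse_cofold(predicted_db)
    ref, ref_cut = parse_cofold(reference_db)
    if cut != ref_cut:
        raise ValueError("strand boundary differs between the two structures")
    pred, ref = intermolecular(pred, cut), intermolecular(ref, cut)
    tp = len(pred & ref)
    ppv = tp / len(pred) if pred else 0.0
    sensitivity = tp / len(ref) if ref else 0.0
    return ppv, sensitivity


if __name__ == "__main__":
    print(score_intermolecular("(((.&.)))", "((((&))))"))
```

Benchmarks sometimes also credit slipped pairs (shifted by one position); this sketch deliberately uses exact matching only.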