skip to main content


Title: LinearCoFold and LinearCoPartition: linear-time algorithms for secondary structure prediction of interacting RNA molecules
Abstract

Many RNAs function through RNA–RNA interactions. Fast and reliable RNA structure prediction with consideration of RNA–RNA interaction is useful, however, existing tools are either too simplistic or too slow. To address this issue, we present LinearCoFold, which approximates the complete minimum free energy structure of two strands in linear time, and LinearCoPartition, which approximates the cofolding partition function and base pairing probabilities in linear time. LinearCoFold and LinearCoPartition are orders of magnitude faster than RNAcofold. For example, on a sequence pair with combined length of 26,190 nt, LinearCoFold is 86.8× faster than RNAcofold MFE mode, and LinearCoPartition is 642.3× faster than RNAcofold partition function mode. Surprisingly, LinearCoFold and LinearCoPartition’s predictions have higher PPV and sensitivity of intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA–RNA interaction between SARS-CoV-2 genomic RNA (gRNA) and human U4 small nuclear RNA (snRNA), which has been experimentally studied, and observe that LinearCoFold’s prediction correlates better with the wet lab results than RNAcofold’s.

 
more » « less
Award ID(s):
2009071
NSF-PAR ID:
10451609
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
51
Issue:
18
ISSN:
0305-1048
Format(s):
Medium: X Size: p. e94-e94
Size(s):
["p. e94-e94"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. This slowness is even more severe than cubic-time free energy minimization due to a substantially larger constant factor in runtime. Results Inspired by the success of our recent LinearFold algorithm that predicts the approximate minimum free energy structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g. 2.5 days versus 1.3 min on a sequence with length 32 753 nt). More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest length sequences (16S and 23S rRNAs), as well as a substantial improvement on long-distance base pairs (500+ nt apart). Availability and implementation Code: http://github.com/LinearFold/LinearPartition; Server: http://linearfold.org/partition. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  2. Abstract

    For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving a similar performance as the traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.

     
    more » « less
  3. Abstract

    Protein‐protein interactions are critical to protein function, but three‐dimensional (3D) arrangements of interacting proteins have proven hard to predict, even given the identities and 3D structures of the interacting partners. Specifically, identifying the relevant pairwise interaction surfaces remains difficult, often relying on shape complementarity with molecular docking while accounting for molecular motions to optimize rigid 3D translations and rotations. However, such approaches can be computationally expensive, and faster, less accurate approximations may prove useful for large‐scale prediction and assembly of 3D structures of multi‐protein complexes. We asked if a reduced representation of protein geometry retains enough information about molecular properties to predict pairwise protein interaction interfaces that are tolerant of limited structural rearrangements. Here, we describe a reduced representation of 3D protein accessible surfaces on which molecular properties such as charge, hydrophobicity, and evolutionary rate can be easily mapped, implemented in the MorphProt package. Pairs of surfaces are compared to rapidly assess partner‐specific potential surface complementarity. On two available benchmarks of 185 overall known protein complexes, we observe predictions comparable to other structure‐based tools at correctly identifying protein interaction surfaces. Furthermore, we examined the effect of molecular motion through normal mode simulation on a benchmark receptor‐ligand pair and observed no marked loss of predictive accuracy for distortions of up to 6 Å Cα‐RMSD. Thus, a shape reduction of protein surfaces retains considerable information about surface complementarity, offers enhanced speed of comparison relative to more complex geometric representations, and exhibits tolerance to conformational changes.

     
    more » « less
  4. Abstract Motivation

    Predicting the secondary structure of an ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms [based on dynamic programming] suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications.

    Results

    We present a novel alternative O(n3)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5′-to-3′) direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models.

    Availability and implementation

    Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100 000nt).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. null (Ed.)
    Abstract

    Flapping insect wings experience appreciable deformation due to aerodynamic and inertial forces. This deformation is believed to benefit the insect’s aerodynamic force production as well as energetic efficiency. However, the fluid-structure interaction (FSI) models used to estimate wing deformations are often computationally demanding and are therefore challenged by parametric studies. Here, we develop a simple FSI model of a flapping wing idealized as a two-dimensional pitching-plunging airfoil. Using the Lagrangian formulation, we derive the reduced-order structural framework governing wing’s elastic deformation. We consider two fluid models: quasi-steady Deformable Blade Element Theory (DBET) and Unsteady Vortex Lattice Method (UVLM). DBET is computationally economical but does not provide insight into the flow structure surrounding the wing, whereas UVLM approximates flows but requires more time to solve. For simple flapping kinematics, DBET and UVLM produce similar estimates of the aerodynamic force normal to the surface of a rigid wing. More importantly, when the wing is permitted to deform, DBET and UVLM agree well in predicting wingtip deflection and aerodynamic normal force. The most notable difference between the model predictions is a roughly 20° phase difference in normal force. DBET estimates wing deformation and force production approximately 15 times faster than UVLM for the parameters considered, and both models solve in under a minute when considering 15 flapping periods. Moving forward, we will benchmark both low-order models with respect to high fidelity computational fluid dynamics coupled to finite element analysis, and assess the agreement between DBET and UVLM over a broader range of flapping kinematics.

     
    more » « less