Title: SamplingDesign: RNA design via continuous optimization with coupled variables and Monte-Carlo sampling
RNA design aims to find a sequence that can fold into a target secondary structure. It can create artificial RNA molecules for specific functions, with wide applications in medicine. It is computationally challenging due to two levels of combinatorial explosion: the exponentially large design space and the exponentially many competing structures per design. Popular methods such as local search cannot keep up with these combinatorial explosions. We instead employ two techniques from machine learning, continuous optimization and Monte-Carlo sampling. We start from a distribution over all valid sequences, and use gradient descent to improve the expectation of an arbitrary objective function. We define novel coupled-variable distributions to model the correlation between nucleotides. We then use sampling to approximate the objective, estimate the gradient, and select the final candidate. Our work consistently outperforms state-of-the-art methods in key metrics including Boltzmann probability and ensemble defect, especially on long and hard-to-design structures. more »« less
Liu, J; Duan, I; Santichaivekin, S; Libeskind-Hadas, R
(, Bioinformatics Research and Applications: 18th International Symposium, ISBRA 2022, Haifa, Israel, November 14–17, 2022, Proceedings)
Bansal, M
(Ed.)
Predicting the secondary structure of RNA is an important problem in molecular biology, providing insights into the function of non-coding Rn As and with broad applications in understanding disease, the development of new drugs, among others. Combinatorial algorithms for predicting RNA foldings can generate an exponentially large number of equally optimal foldings with respect to a given optimization criterion, making it difficult to determine how well any single folding represents the entire space. We provide efficient new algorithms for providing insights into this large space of optimal RNA foldings and a research software tool, toRNAdo, that implements these algorithms.
Curtis, Frank E.; Li, Minhan
(, INFORMS Journal on Optimization)
Gradient sampling (GS) methods for the minimization of objective functions that may be nonconvex and/or nonsmooth are proposed, analyzed, and tested. One of the most computationally expensive components of contemporary GS methods is the need to solve a convex quadratic subproblem in each iteration. By contrast, the methods proposed in this paper allow the use of inexact solutions of these subproblems, which, as proved in the paper, can be incorporated without the loss of theoretical convergence guarantees. Numerical experiments show that, by exploiting inexact subproblem solutions, one can consistently reduce the computational effort required by a GS method. Additionally, a strategy is proposed for aggregating gradient information after a subproblem is solved (potentially inexactly) as has been exploited in bundle methods for nonsmooth optimization. It is proved that the aggregation scheme can be introduced without the loss of theoretical convergence guarantees. Numerical experiments show that incorporating this gradient aggregation approach can also reduce the computational effort required by a GS method.
Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of the potential function of target distribution to explore sample space efficiently. However, computing gradients can often be computationally expensive for large scale applications, such as those in contemporary machine learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by stochastic ones, commonly via uniformly subsampled data points, and achieve improved computational efficiency, however at the price of introducing sampling error. We propose a non-uniform subsampling scheme to improve the sampling accuracy. The proposed exponentially weighted stochastic gradient (EWSG) is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method, and hence the inaccuracy due to SG approximation is reduced. EWSG differs from classical variance reduction (VR) techniques as it focuses on the entire distribution instead of just the variance; nevertheless, its reduced local variance is also proved. EWSG can also be viewed as an extension of the importance sampling idea, successful for stochastic-gradient-based optimizations, to sampling tasks. In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC algorithm. Numerical experiments are provided, not only to demonstrate EWSG's effectiveness, but also to guide hyperparameter choices, and validate our non-asymptotic global error bound despite of approximations in the implementation. Notably, while statistical accuracy is improved, convergence speed can be comparable to the uniform version, which renders EWSG a practical alternative to VR (but EWSG and VR can be combined too).
Abstract MotivationRNA design is the search for a sequence or set of sequences that will fold to desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the MFE criterion can be found by each run of design. These drawbacks limit their use cases. ResultsWe propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure level and ensemble level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority on designing long sequences for structures adapted from the database of 16S Ribosomal RNAs. Availability and implementationOur source code and data used in this article is available at https://github.com/shanry/SAMFEO.
RNA design is the search for a sequence or set of sequences that will fold into predefined structures, also known as the inverse problem of RNA folding. While numerous RNA design methods have been invented to find sequences capable of folding into a target structure, little attention has been given to the identification of undesignable structures according to the minimum free energy ( ) criterion under the Turner model. In this paper, we address this gap by first introducing mathematical theorems outlining sufficient conditions for recognizing undesignable structures, then proposing efficient algorithms, guided by these theorems, to verify the undesignability of RNA structures. Through the application of these theorems and algorithms to the Eterna100 puzzles, we demonstrate the ability to efficiently establish that 15 of the puzzles indeed fall within the category of undesignable structures. In addition, we provide specific insights from the study of undesignability, in the hope that it will enable more understanding of RNA folding and RNA design. Availability: Our source code is available at https://github.com/shanry/RNA-Undesign.
Tang, Wei Yu, Dai, Ning, Zhou, Tianshuo, Mathews, David H, and Huang, Liang.
"SamplingDesign: RNA design via continuous optimization with coupled variables and Monte-Carlo sampling". Nature Communications (). Country unknown/Code not available: Springer Nature. https://doi.org/10.1038/s41467-025-67901-3.https://par.nsf.gov/biblio/10673079.
@article{osti_10673079,
place = {Country unknown/Code not available},
title = {SamplingDesign: RNA design via continuous optimization with coupled variables and Monte-Carlo sampling},
url = {https://par.nsf.gov/biblio/10673079},
DOI = {10.1038/s41467-025-67901-3},
abstractNote = {RNA design aims to find a sequence that can fold into a target secondary structure. It can create artificial RNA molecules for specific functions, with wide applications in medicine. It is computationally challenging due to two levels of combinatorial explosion: the exponentially large design space and the exponentially many competing structures per design. Popular methods such as local search cannot keep up with these combinatorial explosions. We instead employ two techniques from machine learning, continuous optimization and Monte-Carlo sampling. We start from a distribution over all valid sequences, and use gradient descent to improve the expectation of an arbitrary objective function. We define novel coupled-variable distributions to model the correlation between nucleotides. We then use sampling to approximate the objective, estimate the gradient, and select the final candidate. Our work consistently outperforms state-of-the-art methods in key metrics including Boltzmann probability and ensemble defect, especially on long and hard-to-design structures.},
journal = {Nature Communications},
publisher = {Springer Nature},
author = {Tang, Wei Yu and Dai, Ning and Zhou, Tianshuo and Mathews, David H and Huang, Liang},
}
Warning: Leaving National Science Foundation Website
You are now leaving the National Science Foundation website to go to a non-government website.
Website:
NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.