Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can be designed to alter RNA processing and stability. However, the rational design of ribozyme activity remains challenging, and many ribozyme-based systems are engineered or improved by random mutagenesis and selection (in vitro evolution). Improving a ribozyme-based system often requires several mutations to achieve the desired function, but extensive pairwise and higher-order epistasis prevents a simple prediction of the effect of multiple mutations that is needed for rational design. Recently, high-throughput sequencing-based approaches have produced data sets on the effects of numerous mutations in different ribozymes (RNA fitness landscapes). Here we used such high-throughput experimental data from variants of the CPEB3 self-cleaving ribozyme to train a predictive model through machine learning approaches. We trained models using either a random forest or long short-term memory (LSTM) recurrent neural network approach. We found that models trained on a comprehensive set of pairwise mutant data could predict active sequences at higher mutational distances, but the correlation between predicted and experimentally observed self-cleavage activity decreased with increasing mutational distance. Adding sequences with increasingly higher numbers of mutations to the training data improved the correlation at increasing mutational distances. Systematically reducing the size of the training data set suggests that a wide distribution of ribozyme activity may be the key to accurate predictions.
Because the model predictions are based only on sequence and activity data, the results demonstrate that this machine learning approach allows readily obtainable experimental data to be used for RNA design efforts even for RNA molecules with unknown structures. The accurate prediction of RNA functions will enable a more comprehensive understanding of RNA fitness landscapes for studying evolution and for guiding RNA-based engineering efforts.
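As a rough illustration of what "sequence and activity data" can look like as model input, the sketch below one-hot encodes an RNA sequence and counts its mutational distance from a reference. The function names, the `ACGU` alphabet ordering, and the short example sequences are assumptions made for illustration only; they are not taken from the study, and the encoding actually used there may differ.

```python
from typing import List

# Assumed nucleotide ordering for the one-hot encoding (illustrative choice).
ALPHABET = "ACGU"


def one_hot(seq: str) -> List[int]:
    """Flatten an RNA sequence into a vector with four 0/1 entries per position."""
    vec: List[int] = []
    for base in seq:
        if base not in ALPHABET:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def mutational_distance(seq: str, reference: str) -> int:
    """Count substitutions relative to a reference of equal length (Hamming distance)."""
    if len(seq) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq, reference))


if __name__ == "__main__":
    reference = "GGUC"  # hypothetical toy sequence, not the CPEB3 ribozyme
    variant = "GGAC"
    print(one_hot(variant))
    print(mutational_distance(variant, reference))
```

Feature vectors of this kind, paired with measured activities, are the sort of input a random forest or recurrent network could be trained on; grouping test sequences by mutational distance is one way to examine how prediction quality changes with distance from the reference.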
Discovering Pathways Through Ribozyme Fitness Landscapes Using Information Theoretic Quantification of Epistasis
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary ‘bridge’ between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
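To make "deviations from additivity" concrete, the following minimal sketch computes pairwise epistasis for a double mutant under one common convention: mutational effects are assumed to add on a log-activity scale, and epistasis is the difference between the observed and expected log activity of the double mutant. This convention, the function name, and the numbers are assumptions for illustration; the information-theoretic formulation described in the abstract is not reproduced here.

```python
import math


def pairwise_epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Epistasis of a double mutant, assuming effects add on a log-activity scale.

    f_wt, f_a, f_b, f_ab are positive activities of the wild type, the two
    single mutants, and the double mutant. A value of zero means the double
    mutant behaves as the single mutants predict; a positive value means it is
    more active than expected, a negative value less active.
    """
    if min(f_wt, f_a, f_b, f_ab) <= 0:
        raise ValueError("activities must be positive to take logarithms")
    expected_log = math.log(f_a) + math.log(f_b) - math.log(f_wt)
    return math.log(f_ab) - expected_log


if __name__ == "__main__":
    # Hypothetical activities relative to a wild type of 1.0.
    print(pairwise_epistasis(1.0, 0.5, 0.5, 0.25))  # single-mutant effects combine as expected
    print(pairwise_epistasis(1.0, 0.5, 0.5, 0.5))   # double mutant more active than expected
```

Combinations with positive epistasis are the kind of "potentially exploitable" pairs that a landscape model could prioritize when extrapolating to triple and quadruple mutants.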
- PAR ID: 10469152
- Publisher / Repository: Cold Spring Harbor Laboratory Press
- Date Published:
- Journal Name: RNA
- ISSN: 1355-8382
- Page Range / eLocation ID: rna.079541.122
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Small nucleolytic ribozymes are RNAs that cleave their own phosphodiester backbone. While proteinaceous enzymes are regulated by a variety of known mechanisms, methods of regulation for ribozymes remain unclear. Twister is one ribozyme class for which many structural and catalytic properties have been elucidated. However, few studies have analyzed the activity of twister ribozymes in the context of native flanking sequence, even though ribozymes as transcribed in nature do not exist in isolation. Interactions between the ribozyme and its neighboring sequences can induce conformational changes that inhibit self-cleavage, providing a regulatory mechanism that could naturally determine ribozyme activity in vivo and in synthetic applications. To date, eight twister ribozymes have been identified within the staple crop rice (Oryza sativa). Herein, we select several twister ribozymes from rice and show that they are differentially regulated by their flanking sequence using published RNA-seq datasets, structure probing, and co-transcriptional cleavage assays. We found that the Osa 1-2 ribozyme does not interact with its flanking sequences. However, sequences flanking the Osa 1-3 and Osa 1-8 ribozymes form inactive conformations, referred to here as “ribozymogens”, that attenuate ribozyme self-cleavage activity. For the Osa 1-3 ribozyme, we show that activity can be rescued upon addition of a complementary antisense oligonucleotide, suggesting ribozymogens can be controlled via external signals. In all, our data provide a plausible mechanism wherein flanking sequence differentially regulates ribozyme activity in vivo. More broadly, the ability to regulate ribozyme behavior locally has potential applications in control of gene expression and synthetic biology.
Zhang, Jianzhi (Ed.) Fitness landscapes of protein and RNA molecules can be studied experimentally using high-throughput techniques to measure the functional effects of numerous combinations of mutations. The rugged topography of these molecular fitness landscapes is important for understanding and predicting natural and experimental evolution. Mutational effects are also dependent upon environmental conditions, but the effects of environmental changes on fitness landscapes remain poorly understood. Here, we investigate the changes to the fitness landscape of a catalytic RNA molecule while changing a single environmental variable that is critical for RNA structure and function. Using high-throughput sequencing of in vitro selections, we mapped a fitness landscape of the Azoarcus group I ribozyme under eight different concentrations of magnesium ions (1–48 mM MgCl2). The data revealed the magnesium dependence of 16,384 mutational neighbors, and from this, we investigated the magnesium-induced changes to the topography of the fitness landscape. The results showed that increasing magnesium concentration improved the relative fitness of sequences at higher mutational distances while also reducing the ruggedness of the mutational trajectories on the landscape. As a result, as magnesium concentration was increased, simulated populations evolved toward higher fitness faster. Curve-fitting of the magnesium dependence of individual ribozymes demonstrated that deep sequencing of in vitro reactions can be used to evaluate the structural stability of thousands of sequences in parallel. Overall, the results highlight how environmental changes that stabilize structures can also alter the ruggedness of fitness landscapes and alter evolutionary processes.
One of the long-standing holy grails of molecular evolution has been the ability to predict an organism’s fitness directly from its genotype. With such predictive abilities in hand, researchers would be able to more accurately forecast how organisms will evolve and how proteins with novel functions could be engineered, leading to revolutionary advances in medicine and biotechnology. In this work, we assemble the largest reported set of experimental TEM-1 β-lactamase folding free energies and use this data in conjunction with previously acquired fitness data and computational free energy predictions to determine how much of the fitness of β-lactamase can be directly predicted by thermodynamic folding and binding free energies. We focus upon β-lactamase because of its long history as a model enzyme and its central role in antibiotic resistance. Based upon a set of 21 β-lactamase single and double mutants expressly designed to influence protein folding, we first demonstrate that modeling software designed to compute folding free energies such as FoldX and PyRosetta can meaningfully, although not perfectly, predict the experimental folding free energies of single mutants. Interestingly, while these techniques also yield sensible double mutant free energies, we show that they do so for the wrong physical reasons. We then go on to assess how well both experimental and computational folding free energies explain single mutant fitness. We find that folding free energies account for, at most, 24% of the variance in β-lactamase fitness values according to linear models and, somewhat surprisingly, complementing folding free energies with computationally-predicted binding free energies of residues near the active site only increases the folding-only figure by a few percent. This strongly suggests that the majority of β-lactamase’s fitness is controlled by factors other than free energies.
Overall, our results shed a bright light on to what extent the community is justified in using thermodynamic measures to infer protein fitness as well as how applicable modern computational techniques for predicting free energies will be to the large data sets of multiply-mutated proteins forthcoming.
Various self-cleaving ribozymes appearing in nature catalyze the sequence-specific intramolecular cleavage of RNA and can be engineered to catalyze cleavage of appropriate substrates in an intermolecular fashion, thus acting as true catalysts. The mechanisms of the small, self-cleaving ribozymes have been extensively studied and reviewed previously. Self-cleaving ribozymes can possess high catalytic activity and high substrate specificity; however, substrate specificity is also engineerable within the constraints of the ribozyme structure. While these ribozymes share a common fundamental catalytic mechanism, each ribozyme family has a unique overall architecture and active site organization, indicating that several distinct structures yield this chemical activity. The multitude of catalytic structures, combined with some flexibility in substrate specificity within each family, suggests that such catalytic RNAs, taken together, could access a wide variety of substrates. Here, we give an overview of 10 classes of self-cleaving ribozymes and capture what is understood about their substrate specificity and synthetic applications. Evolution of these ribozymes in an RNA world might be characterized by the emergence of a new ribozyme family followed by rapid adaptation or diversification for specific substrates.