Abstract Highly selective C−H functionalization remains an ongoing challenge in organic synthetic methodologies. Biocatalysts are robust tools for achieving these difficult chemical transformations. Biocatalyst engineering has often required directed evolution or structure‐based rational design campaigns to improve their activities. In recent years, machine learning has been integrated into these workflows to improve the discovery of beneficial enzyme variants. In this work, we combine a structure‐based self‐supervised machine learning framework, MutComputeX, with classical molecular dynamics simulations to down‐select mutations for rational design of a non‐heme iron‐dependent lysine dioxygenase, LDO. This approach consistently resulted in functional LDO mutants and circumvented the need for extensive study of mutational activity beforehand. Our rationally designed single mutants purified with up to 2‐fold higher expression yields than WT and displayed higher total turnover numbers (TTN). Combining five such single mutations into a pentamutant variant, LPNYI LDO, led to a 40 % improvement in the TTN (218±3) as compared to WT LDO (TTN=160±2). Overall, this work offers a low‐barrier approach for those seeking to synergize machine learning algorithms with pre‐existing protein engineering strategies.
Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering
Abstract The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY’s potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.
- PAR ID: 10528377
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 15
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Heterologous tRNAs used for noncanonical amino acid (ncAA) mutagenesis in mammalian cells typically show poor activity. We recently introduced a virus‐assisted directed evolution strategy (VADER) that can enrich improved tRNA mutants from naïve libraries in mammalian cells. However, VADER was limited to processing only a few thousand mutants; the inability to screen a larger sequence space precluded the identification of highly active variants with distal synergistic mutations. Here, we report VADER2.0, which can process significantly larger mutant libraries. It also employs a novel library design, which maintains base‐pairing between distant residues in the stem regions, allowing us to pack a higher density of functional mutants within a fixed sequence space. VADER2.0 enabled simultaneous engineering of the entire acceptor stem of M. mazei pyrrolysyl tRNA (tRNAPyl), leading to a remarkably improved variant, which facilitates more efficient incorporation of a wider range of ncAAs, and enables facile development of viral vectors and stable cell‐lines for ncAA mutagenesis.
-
Abstract Directed evolution generates novel biomolecules with desired functions by iteratively diversifying the genetic sequence of wildtype biomolecules, relaying the genetic information to the molecule with function, and selecting the variants that progress towards the properties of interest. While traditional directed evolution consumes significant labor and time for each step, continuous evolution seeks to automate all steps so directed evolution can proceed with minimum human intervention and dramatically shortened time. A major application of continuous evolution is the generation of novel enzymes, which catalyze reactions under conditions that are not favorable to their wildtype counterparts, or on altered substrates. The challenge in continuously evolving enzymes lies in automating sufficient, unbiased gene diversification, providing selection for a wide array of reaction types, and linking the genetic information to the phenotypic function. Over years of development, continuous evolution has accumulated versatile strategies to address these challenges, enabling its use as a general tool for enzyme engineering. As the capability of continuous evolution continues to expand, its impact will increase across various industries. In this review, we summarize the working mechanisms of recently developed continuous evolution strategies, discuss examples of their applications focusing on enzyme evolution, and point out their limitations and future directions.
-
Abstract Improved prodrug‐activating enzymes have the potential to increase the therapeutic efficacy of gene‐directed enzyme prodrug therapy (GDEPT). Yeast cytosine deaminase (yCD) is commonly used to convert the prodrug 5‐fluorocytosine (5‐FC) to the chemotherapeutic 5‐fluorouracil for GDEPT. Mutagenesis studies on yCD aimed at improving its application in GDEPT have been limited to subsets of residues or have sought to improve a single property of the enzyme. We performed comprehensive site‐saturation mutagenesis (CSM) on yCD designed to create all 2,983 possible unique protein mutants with a single amino acid substitution. We identified active variants through Escherichia coli genetic complementation and screened these mutants, and combinations thereof, for increased ability to sensitize E. coli and HT1080 fibrosarcoma cells to 5‐FC. Several mutants identified in this study showed increased sensitization ability for both E. coli and HT1080 cells, indicating that CSM is an effective directed evolution tool for identifying unexpectedly beneficial mutations.
-
Laboratory evolution combined with computational enzyme design provides the opportunity to generate novel biocatalysts. Nevertheless, it has been challenging to understand how laboratory evolution optimizes designer enzymes by introducing seemingly random mutations. A typical enzyme optimized with laboratory evolution is the abiological Kemp eliminase, initially designed by grafting active site residues into a natural protein scaffold. Here, we relate the catalytic power of laboratory-evolved Kemp eliminases to the statistical energy (E_MaxEnt) inferred from their natural homologous sequences using the maximum entropy model. The E_MaxEnt of designs generated by directed evolution is correlated with enhanced activity and reduced stability, thus displaying a stability-activity trade-off. In contrast, the E_MaxEnt for mutants in catalytically active remote regions (in which remote residues are important for catalysis) is strongly anticorrelated with the activity. These findings provide insight into the role of protein scaffolds in the adaptation to new enzymatic functions. They also indicate that the valley in the E_MaxEnt landscape can guide enzyme design for abiological catalysis. Overall, the connection between laboratory and natural evolution contributes to understanding what is optimized in the laboratory and how new enzymatic function emerges in nature, and provides guidance for computational enzyme design.
