Title: Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering
ABSTRACT

Enzymes, as paramount protein catalysts, occupy a central role in fostering remarkable progress across numerous fields. However, the intricacy of sequence-function relationships continues to obscure our grasp of enzyme behaviors and curtails our capabilities in rational enzyme engineering. Generative artificial intelligence (AI), known for its proficiency in handling intricate data distributions, holds the potential to offer novel perspectives in enzyme research. Generative models could discern elusive patterns within the vast sequence space and uncover new functional enzyme sequences. This review highlights the recent advancements in employing generative AI for enzyme sequence analysis. We delve into the impact of generative AI in predicting mutation effects on enzyme fitness, catalytic activity and stability, rationalizing the laboratory evolution of de novo enzymes, and decoding protein sequence semantics and their application in enzyme engineering. Notably, the prediction of catalytic activity and stability of enzymes using natural protein sequences serves as a vital link, indicating how enzyme catalysis shapes enzyme evolution. Overall, we foresee that the integration of generative AI into enzyme studies will remarkably enhance our knowledge of enzymes and expedite the creation of superior biocatalysts.

 
NSF-PAR ID:
10488700
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
National Science Review
Volume:
10
Issue:
12
ISSN:
2095-5138
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Laboratory evolution combined with computational enzyme design provides the opportunity to generate novel biocatalysts. Nevertheless, it has been challenging to understand how laboratory evolution optimizes designer enzymes by introducing seemingly random mutations. A typical enzyme optimized with laboratory evolution is the abiological Kemp eliminase, initially designed by grafting active site residues into a natural protein scaffold. Here, we relate the catalytic power of laboratory-evolved Kemp eliminases to the statistical energy (E_MaxEnt) inferred from their natural homologous sequences using the maximum entropy model. The E_MaxEnt of designs generated by directed evolution is correlated with enhanced activity and reduced stability, thus displaying a stability-activity trade-off. In contrast, the E_MaxEnt for mutants in catalytic-active remote regions (in which remote residues are important for catalysis) is strongly anticorrelated with the activity. These findings provide insight into the role of protein scaffolds in the adaptation to new enzymatic functions. They also indicate that the valley in the E_MaxEnt landscape can guide enzyme design for abiological catalysis. Overall, the connection between laboratory and natural evolution contributes to understanding what is optimized in the laboratory and how new enzymatic function emerges in nature, and provides guidance for computational enzyme design.
  2. Although computational enzyme design is of great importance, the advances utilizing physics-based approaches have been slow, and further progress is urgently needed. One promising direction is using machine learning, but such strategies have not been established as effective tools for predicting the catalytic power of enzymes. Here, we show that the statistical energy inferred from homologous sequences with the maximum entropy (MaxEnt) principle significantly correlates with enzyme catalysis and stability at the active site region and the more distant region, respectively. This finding decodes enzyme architecture and offers a connection between enzyme evolution and the physical chemistry of enzyme catalysis, and it deepens our understanding of the stability–activity trade-off hypothesis for enzymes. Overall, the strong correlations found here provide a powerful way of guiding enzyme design. 
  3. Gram-positive bacteria are some of the earliest known life forms, diverging from gram-negative bacteria 2 billion years ago. These organisms utilize sortase enzymes to attach proteins to their peptidoglycan cell wall, a structural feature that distinguishes the two types of bacteria. The transpeptidase activity of sortases makes them an important tool in protein engineering applications, e.g., in sortase-mediated ligations or sortagging. However, due to relatively low catalytic efficiency, there are ongoing efforts to create better sortase variants for these uses. Here, we use bioinformatics tools, principal component analysis and ancestral sequence reconstruction, in combination with protein biochemistry, to analyze natural sequence variation in these enzymes. Principal component analysis on the sortase superfamily distinguishes previously described classes and identifies regions of relatively high sequence variation in structurally conserved loops within each sortase family, including those near the active site. Using ancestral sequence reconstruction, we determined sequences of ancestral Staphylococcus and Streptococcus Class A sortase proteins. Enzyme assays revealed that the ancestral Streptococcus enzyme is relatively active and shares similar sequence variation with other Class A Streptococcus sortases. Taken together, we highlight how natural sequence variation can be utilized to investigate this important protein family, arguing that these and similar techniques may be used to discover or design sortases with increased catalytic efficiency and/or selectivity for sortase-mediated ligation experiments.
  4. Phosphotriesterases (PTEs) represent a class of enzymes capable of efficient neutralization of organophosphates (OPs), a dangerous class of neurotoxic chemicals. PTEs suffer from low catalytic activity, particularly at higher temperatures, due to low thermostability and low solubility. Supercharging, a protein engineering approach via selective mutation of surface residues to charged residues, has been successfully employed to generate proteins with increased solubility and thermostability by promoting charge–charge repulsion between proteins. We set out to overcome the challenges in improving PTE activity against OPs by employing a computational protein supercharging algorithm in Rosetta. Here, we discover two supercharged PTE variants, one negatively supercharged (with −14 net charge) and one positively supercharged (with +12 net charge), and characterize them for their thermodynamic stability and catalytic activity. We find that positively supercharged PTE possesses slight but significant losses in thermostability, which correlate with losses in catalytic efficiency at all temperatures, whereas negatively supercharged PTE possesses increased catalytic activity across 25°C–55°C while offering thermostability characteristics similar to those of the parent PTE. The impact of supercharging on catalytic efficiency will inform the design of shelf-stable PTE and criteria for enzyme engineering.
  5. Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.
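
Several of the abstracts above refer to a statistical energy inferred from homologous sequences with the maximum entropy model. As a rough illustration only, the sketch below assumes the commonly used pairwise (Potts-type) form, E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), in which lower energy corresponds to higher model probability. The function name `statistical_energy` and every parameter value are hypothetical toy inputs, not fitted parameters or results from any of the listed papers; in practice the fields and couplings would be inferred from a multiple sequence alignment.

```python
from itertools import combinations


def statistical_energy(seq, fields, couplings):
    """Pairwise MaxEnt (Potts-type) statistical energy of one sequence.

    Assumed form: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    fields[i] maps residue -> h_i; couplings[(i, j)] maps a residue
    pair -> J_ij. Missing entries are treated as zero.
    """
    energy = 0.0
    # Single-site terms: how favorable each residue is at its position.
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    # Pairwise terms: co-variation between positions i < j.
    for i, j in combinations(range(len(seq)), 2):
        pair_table = couplings.get((i, j), {})
        energy -= pair_table.get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical toy parameters for a three-residue "protein".
fields = [
    {"A": 1.0, "G": 0.2},
    {"K": 0.5, "E": 0.1},
    {"D": 0.8},
]
couplings = {(0, 2): {("A", "D"): 0.5}}

wild_type = "AKD"
mutant = "GKD"  # single substitution A -> G at position 0

e_wt = statistical_energy(wild_type, fields, couplings)
e_mut = statistical_energy(mutant, fields, couplings)
# A positive difference means the mutant is less probable under the model.
delta = e_mut - e_wt
```

Under this sign convention, the energy difference between a mutant and its parent is the quantity one would compare against measured activity or stability changes; whether the relationship is a correlation or an anticorrelation depends on the region of the protein, as the abstracts above describe.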