Abstract Motivation: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates continues to be largely unexplored and under-documented. Providing computational tools for the exploration of the enzyme–substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying products of metabolism on ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which are currently unexplored for the enzyme–substrate interaction prediction problem, can be utilized to provide enzyme recommendations for substrates, and vice versa. The performance of collaborative-filtering (CF) RSs, however, hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge. Results: We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by ‘boosting’ embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks. Boost-RS utilizes contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme–substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
Availability and implementation: A Python implementation for Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme–substrate interaction data is available from the KEGG database (https://www.genome.jp/kegg/).
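As a rough illustration of the collaborative-filtering idea this abstract refers to, the sketch below learns one embedding vector per enzyme and per substrate from observed interaction labels and ranks enzymes for a substrate by dot-product score. It is a plain matrix-factorization baseline, not the Boost-RS method: it uses no auxiliary tasks and no contrastive learning. The toy interaction triples, the function names (`train`, `score`, `recommend`) and the hyperparameters are all assumptions made for illustration.

```python
import random

# Hypothetical toy data: (enzyme index, substrate index, label),
# where label 1 means "interaction observed" and 0 means "no interaction".
TOY_INTERACTIONS = [
    (0, 0, 1), (0, 1, 1), (0, 2, 0),
    (1, 0, 0), (1, 1, 0), (1, 2, 1),
    (2, 0, 0), (2, 2, 1),
]


def score(enzyme_vec, substrate_vec):
    """Predicted interaction score: dot product of the two embeddings."""
    return sum(a * b for a, b in zip(enzyme_vec, substrate_vec))


def train(interactions, n_enzymes, n_substrates, dim=8, lr=0.05,
          epochs=500, seed=0):
    """Fit embeddings with stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    E = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_enzymes)]
    S = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_substrates)]
    for _ in range(epochs):
        for i, j, label in interactions:
            err = score(E[i], S[j]) - label
            for k in range(dim):
                e, s = E[i][k], S[j][k]
                # Gradient step on 0.5 * err**2 for both embeddings.
                E[i][k] -= lr * err * s
                S[j][k] -= lr * err * e
    return E, S


def recommend(E, S, substrate, top_k=2):
    """Return the indices of the top_k highest-scoring enzymes for a substrate."""
    ranked = sorted(range(len(E)),
                    key=lambda i: score(E[i], S[substrate]),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    E, S = train(TOY_INTERACTIONS, n_enzymes=3, n_substrates=3)
    print(recommend(E, S, substrate=2))
```

In this toy setting the pair (enzyme 2, substrate 1) is left unobserved, so its score is a prediction rather than a fit; the quality of such predictions depends entirely on the learned embeddings, which is the dependence the abstract describes.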
Using graph neural networks for site-of-metabolism prediction and its applications to ranking promiscuous enzymatic products
Abstract Motivation: While traditionally utilized for identifying site-specific metabolic activity within a compound to alter its interaction with a metabolizing enzyme, predicting the site-of-metabolism (SOM) is also essential in analyzing the promiscuity of enzymes on substrates. The successful prediction of SOMs and the relevant promiscuous products has a wide range of applications that include creating extended metabolic models (EMMs) that account for enzyme promiscuity and the construction of novel heterologous synthesis pathways. There is therefore a need to develop generalized methods that can predict molecular SOMs for a wide range of metabolizing enzymes. Results: This article develops a Graph Neural Network (GNN) model for the classification of an atom (or a bond) being an SOM. Our model, GNN-SOM, is trained on enzymatic interactions, available in the KEGG database, that span all enzyme commission numbers. We demonstrate that GNN-SOM consistently outperforms baseline machine learning models, when trained on all enzymes, on Cytochrome P450 (CYP) enzymes, or on non-CYP enzymes. We showcase the utility of GNN-SOM in prioritizing predicted enzymatic products due to enzyme promiscuity for two biological applications: the construction of EMMs and the construction of synthesis pathways. Availability and implementation: A Python implementation of the trained SOM predictor model can be found at https://github.com/HassounLab/GNN-SOM. Supplementary information: Supplementary data are available at Bioinformatics online.
- Award ID(s):
- 1909536
- PAR ID:
- 10400645
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 39
- Issue:
- 3
- ISSN:
- 1367-4811
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Martelli, Pier Luigi (Ed.) Abstract Motivation: As experimental efforts are costly and time-consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme–substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme’s natural substrates. The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities. Results: We frame this ‘enzyme promiscuity prediction’ problem as a multi-label classification task. We maximally utilize inhibitor and unlabeled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine-learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split than under a random data split, and worse on non-natural substrates than on natural substrates. Availability and implementation: We provide Python code and data for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP. Supplementary information: Supplementary data are available at Bioinformatics online.
-
Kamerlin, Lynn (Ed.) Abstract Millions of years of evolution have optimized many biosynthetic pathways by use of multi‐step catalysis. In addition, multi‐step metabolic pathways are commonly found in and on membrane‐bound organelles in eukaryotic biochemistry. The fundamental mechanisms that facilitate these reaction processes provide strategies to bioengineer metabolic pathways in synthetic chemistry. Using Brownian dynamics simulations, here we modeled intermediate substrate transportation of colocalized yeast–ester biosynthesis enzymes on the membrane. The substrate acetate ion traveled from the pocket of aldehyde dehydrogenase to its target enzyme acetyl‐CoA synthetase, then the substrate acetyl CoA diffused from Acs1 to the active site of the next enzyme, alcohol‐O‐acetyltransferase. Arranging two enzymes with the smallest inter‐enzyme distance of 60 Å gave the fastest average substrate association time compared with anchoring enzymes at larger inter‐enzyme distances. When the off‐target side reactions were turned on, most substrates were lost, which suggests that native localization is necessary for efficient final product synthesis. We also evaluated the effects of intermolecular interactions, local substrate concentrations, and membrane environment to bring mechanistic insights into the colocalization pathways. The computational work demonstrates that creating spatially organized multi‐enzymes on membranes can be an effective strategy to increase final product synthesis in bioengineering systems.
-
Abstract Transient plant enzyme complexes formed via protein-protein interactions (PPIs) play crucial regulatory roles in secondary metabolism. Complexes assembled on cytochrome P450s (CYPs) are challenging to characterize metabolically due to difficulties in decoupling the PPIs’ metabolic impacts from the CYPs’ catalytic activities. Here, we developed a yeast-based synthetic biology approach to elucidate the metabolic roles of PPIs between a soybean-derived CYP, isoflavone synthase (GmIFS2), and other enzymes in isoflavonoid metabolism. By reconstructing multiple complex variants with an inactive GmIFS2 in yeast, we found that GmIFS2-mediated PPIs can regulate metabolic flux between two competing pathways producing deoxyisoflavonoids and isoflavonoids. Specifically, GmIFS2 can recruit chalcone synthase (GmCHS7) and chalcone reductase (GmCHR5) to enhance deoxyisoflavonoid production or GmCHS7 and chalcone isomerase (GmCHI1B1) to enhance isoflavonoid production. Additionally, we identified and characterized two novel isoflavone O-methyltransferases interacting with GmIFS2. This study highlights the potential of yeast synthetic biology for characterizing CYP-mediated complexes.
-
Specialized metabolites are structurally diverse and cell‐ or tissue‐specific molecules produced in restricted plant lineages. In contrast, primary metabolic pathways are highly conserved in plants and produce metabolites essential for all of life, such as amino acids and nucleotides. Substrate promiscuity – the capacity to accept non‐native substrates – is a common characteristic of enzymes, and its impact is especially apparent in generating specialized metabolite variation. However, promiscuity only leads to metabolic diversity when alternative substrates are available; thus, enzyme cellular and subcellular localization directly influence chemical phenotypes. We review a variety of mechanisms that modulate substrate availability for promiscuous plant enzymes. We focus on examples where evolution led to modification of the ‘cellular context’ through changes in cell‐type expression, subcellular relocalization, pathway sequestration, and cellular mixing via tissue damage. These varied mechanisms contributed to the emergence of structurally diverse plant specialized metabolites and inform future metabolic engineering approaches.
