Abstract
Motivation: Carbohydrate-active enzymes (CAZymes) are extremely important to research and industry in bioenergy, the human gut microbiome, and plant pathogens. Here we developed a new amino acid k-mer-based tool for CAZyme classification, motif identification and genome annotation using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies, each with distinguishing k-mer peptides. These k-mers represent the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and were therefore used to annotate new genomes for CAZymes. The idea was also generalized to extract characteristic k-mer peptides for all Swiss-Prot enzymes classified by EC (Enzyme Commission) numbers, and applied to enzyme EC prediction.
Results: This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (including PPR-Hotpep, CUPP and eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes.
Availability and implementation: https://github.com/yinlabniu/eCAMI and https://github.com/zhanglabNKU/eCAMI.
Supplementary information: Supplementary data are available at Bioinformatics online.
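The idea of characteristic k-mer peptides described in this abstract can be illustrated with a minimal sketch. This is not eCAMI's bipartite network algorithm: it is a simple set-based stand-in, and the function names (`kmers`, `characteristic_kmers`, `annotate`), the toy sequences and the choice of k = 4 are all assumptions made for illustration only.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of all overlapping k-mer peptides in an amino acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def characteristic_kmers(groups, k):
    """For each group, keep k-mers present in every member and in no other group.

    Hypothetical simplification: eCAMI itself clusters with a bipartite network.
    """
    # k-mers conserved across all sequences of a group
    shared = {name: set.intersection(*(kmers(s, k) for s in seqs))
              for name, seqs in groups.items()}
    # record which groups each k-mer occurs in
    seen_in = defaultdict(set)
    for name, seqs in groups.items():
        for s in seqs:
            for km in kmers(s, k):
                seen_in[km].add(name)
    return {name: {km for km in cons if seen_in[km] == {name}}
            for name, cons in shared.items()}

def annotate(seq, char_kmers, k):
    """Assign a query to the group with the most characteristic k-mer hits, or None."""
    query = kmers(seq, k)
    hits = {name: len(query & kms) for name, kms in char_kmers.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

# Toy sequences (invented, not real CAZymes)
groups = {
    "subfamA": ["MKTAYIAK", "GGTAYIAQ"],
    "subfamB": ["MKWLPEQR", "AAWLPEQD"],
}
char = characteristic_kmers(groups, k=4)
print(char)
print(annotate("QQWLPEQQ", char, k=4))
```

A query is assigned to whichever toy subfamily contributes the most matching characteristic peptides, and the matched peptides themselves indicate which short motifs supported the assignment.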
Evidential deep learning for trustworthy prediction of enzyme commission number
Abstract: The rapid growth in the number of uncharacterized enzymes and their functional diversity calls for accurate and trustworthy computational tools for functional annotation. However, current state-of-the-art models lack trustworthiness when predicting in a multilabel classification problem with thousands of classes. Here, we demonstrate that a novel evidential deep learning model (named ECPICK) makes trustworthy predictions of enzyme commission (EC) numbers with data-driven, domain-relevant evidence, which results in significantly enhanced predictive power and the capability to discover potential new motif sites. ECPICK learns complex sequential patterns of amino acids and their hierarchical structures from 20 million enzyme data. ECPICK identifies significant amino acids that contribute to the prediction without multiple sequence alignment. Our intensive assessment showed not only outstanding enhancement of predictive performance on the largest databases of UniProt, Protein Data Bank (PDB) and Kyoto Encyclopedia of Genes and Genomes (KEGG), but also a capability to discover new motif sites in microorganisms. ECPICK is a reliable EC number prediction tool for identifying the protein functions of an increasing number of uncharacterized enzymes.
- Award ID(s):
- 2117941
- PAR ID:
- 10548060
- Publisher / Repository:
- Oxford
- Date Published:
- Journal Name:
- Briefings in Bioinformatics
- Volume:
- 25
- Issue:
- 1
- ISSN:
- 1467-5463
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Photoenzymatic catalysts are attractive for stereoselective radical reactions because the transformation occurs within tunable enzyme active sites. When using flavoproteins for non-natural photoenzymatic reactions, reductive mechanisms are often used for radical initiation. Oxidative mechanisms for radical formation would enable abundant functional groups, such as amines and carboxylic acids, to serve as radical precursors. However, excited state flavin is short-lived in many proteins because of rapid quenching by the protein scaffold. Here we report that adding an exogenous Ru(bpy)₃²⁺ cofactor to flavin-dependent ‘ene’-reductases enables the redox-neutral decarboxylative coupling of amino acids with vinylpyridines with high yield and enantioselectivity. Additionally, stereo-complementary enzymes are found to provide access to both enantiomers of the product. Mechanistic studies indicate that Ru(bpy)₃²⁺ binds to the protein, helping to localize radical formation to the enzyme’s active site. This work expands the types of transformation that can be rendered asymmetric using photoenzymatic catalysis and provides an intriguing mechanism of radical initiation.
- Abstract: The absence of orthogonal aminoacyl-transfer RNA (tRNA) synthetases that accept non-l-α-amino acids is a primary bottleneck hindering the in vivo translation of sequence-defined hetero-oligomers and biomaterials. Here we report that pyrrolysyl-tRNA synthetase (PylRS) and certain PylRS variants accept α-hydroxy, α-thio and N-formyl-l-α-amino acids, as well as α-carboxy acid monomers that are precursors to polyketide natural products. These monomers are accommodated and accepted by the translation apparatus in vitro; those with reactive nucleophiles are incorporated into proteins in vivo. High-resolution structural analysis of the complex formed between one PylRS enzyme and a m-substituted 2-benzylmalonic acid derivative revealed an active site that discriminates prochiral carboxylates and accommodates the large size and distinct electrostatics of an α-carboxy substituent. This work emphasizes the potential of PylRS-derived enzymes for acylating tRNA with monomers whose α-substituent diverges substantially from the α-amine of proteinogenic amino acids. These enzymes or derivatives thereof could synergize with natural or evolved ribosomes and/or translation factors to generate diverse sequence-defined non-protein heteropolymers.
- Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a growing class of natural products biosynthesized from a genetically encoded precursor peptide. The enzymes that install the post-translational modifications on these peptides have the potential to be useful catalysts in the production of natural-product-like compounds and can install non-proteinogenic amino acids in peptides and proteins. However, engineering these enzymes has been somewhat limited, due in part to limited structural information on enzymes in the same families that nonetheless exhibit different substrate selectivities. Despite AlphaFold2’s superior performance in single-chain protein structure prediction, its multimer version lacks accuracy and requires high-end GPUs, which are not typically available to most research groups. Additionally, the default parameters of AlphaFold2 may not be optimal for predicting complex structures like RiPP biosynthetic enzymes, due to their dynamic binding and substrate-modifying mechanisms. This study assessed the efficacy of the structure prediction program ColabFold (a variant of AlphaFold2) in modeling RiPP biosynthetic enzymes in both monomeric and dimeric forms. After extensive benchmarking, it was found that there were no statistically significant differences in the accuracy of the predicted structures, regardless of the various possible prediction parameters that were examined, and that with the default parameters, ColabFold was able to produce accurate models. We then generated additional structural predictions for select RiPP biosynthetic enzymes from multiple protein families and biosynthetic pathways. Our findings can serve as a reference for future enzyme engineering complemented by AlphaFold-related tools.
- Enzymes have evolved to catalyse challenging chemical transformations with high efficiency and selectivity. Although a number of artificial systems have been developed to recapitulate the catalytic activity of natural enzymes, they are mostly limited to catalysing relatively simple reactions owing to their ability to mimic only the active metal centres of natural enzymes, without incorporating the proximal amino acids or cofactors. Here we report a metal–organic framework-based artificial enzyme (metal–organic–zyme, MOZ) by integrating active metal centres, proximal amino acids and other cofactors into a tunable metal–organic framework monolayer. We design two libraries of MOZs to perform photocatalytic CO2 reduction and water oxidation reactions. Through tuning the incorporated amino acids in the MOZs, we systematically optimize the activity and selectivity of these libraries. Combining these optimized MOZs into a single system realizes complete artificial photosynthesis in the reaction of (1 + n) CO2 + 2H2O → CH4 + nCO + (2 + n/2)O2.