Title: Evidential deep learning for trustworthy prediction of enzyme commission number
Abstract The rapid growth of uncharacterized enzymes and their functional diversity urge accurate and trustworthy computational functional annotation tools. However, current state-of-the-art models lack trustworthiness on the prediction of the multilabel classification problem with thousands of classes. Here, we demonstrate that a novel evidential deep learning model (named ECPICK) makes trustworthy predictions of enzyme commission (EC) numbers with data-driven domain-relevant evidence, which results in significantly enhanced predictive power and the capability to discover potential new motif sites. ECPICK learns complex sequential patterns of amino acids and their hierarchical structures from 20 million enzyme data. ECPICK identifies significant amino acids that contribute to the prediction without multiple sequence alignment. Our intensive assessment showed not only outstanding enhancement of predictive performance on the largest databases of UniProt, Protein Data Bank (PDB) and Kyoto Encyclopedia of Genes and Genomes (KEGG), but also a capability to discover new motif sites in microorganisms. ECPICK is a reliable EC number prediction tool to identify protein functions of an increasing number of uncharacterized enzymes.
Award ID(s):
2117941
PAR ID:
10548060
Publisher / Repository:
Oxford
Date Published:
Journal Name:
Briefings in Bioinformatics
Volume:
25
Issue:
1
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation: Carbohydrate-active enzymes (CAZymes) are extremely important to bioenergy, human gut microbiome, and plant pathogen research and industries. Here we developed a new amino acid k-mer-based CAZyme classification, motif identification and genome annotation tool using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies each with distinguishing k-mer peptides. These k-mers represented the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and thus were further used to annotate new genomes for CAZymes. This idea was also generalized to extract characteristic k-mer peptides for all the Swiss-Prot enzymes classified by the EC (enzyme commission) numbers and applied to enzyme EC prediction. Results: This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against the state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (including PPR-Hotpep, CUPP and eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes. Availability and implementation: https://github.com/yinlabniu/eCAMI and https://github.com/zhanglabNKU/eCAMI. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed. However, most of these tools cannot accurately predict functional annotations, such as enzyme commission (EC) number, for less-studied proteins or those with previously uncharacterized functions or multiple activities. We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation) to assign EC numbers to enzymes with better accuracy, reliability, and sensitivity compared with the state-of-the-art tool BLASTp. The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes, (ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions that we demonstrate by systematic in silico and in vitro experiments. We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes, thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis. 
  3. Abstract Bacterial sortases are a family of cysteine transpeptidases in Gram‐positive bacteria of which sortase A (SrtA) enzymes are responsible for ligating proteins to the peptidoglycan layer of the cell surface. Engineered versions of sortases are also used in sortase‐mediated ligation (SML) strategies for a variety of protein engineering applications. Although a versatile tool, substrate recognition by Staphylococcus aureus SrtA (saSrtA), the most commonly utilized enzyme in SML, is stringent and relies on an LPXTG pentapeptide motif. Previous structural studies revealed that the requirement of a glycine in the binding motif may be due to potential steric hindrance of amino acids possessing a β‐carbon by W194, a tryptophan located in the β7‐β8 loop of the enzyme. Here, we measured the effect of seven single point mutants of W194 (A, D, F, G, N, S, Y) saSrtA using a FRET‐based activity assay. We found that while the LPXTG motif remains a requirement for initial proteolytic cleavage, the nucleophile specificity of our variants is altered. In particular, W194A and W194S saSrtA recognize a D‐Ala nucleophile and are able to perform ligation reactions. Notably, an LPXT(D‐Ala) peptide was not cleaved by either mutant enzyme. We hypothesize that these variants may potentially be utilized to develop an irreversible sortase‐mediated reaction. Taken together, this experiment reveals new insight into sortase specificity and possible future SML strategies.
  4. Photoenzymatic catalysts are attractive for stereoselective radical reactions because the transformation occurs within tunable enzyme active sites. When using flavoproteins for non-natural photoenzymatic reactions, reductive mechanisms are often used for radical initiation. Oxidative mechanisms for radical formation would enable abundant functional groups, such as amines and carboxylic acids, to serve as radical precursors. However, excited state flavin is short-lived in many proteins because of rapid quenching by the protein scaffold. Here we report that adding an exogenous Ru(bpy)₃²⁺ cofactor to flavin-dependent ‘ene’-reductases enables the redox-neutral decarboxylative coupling of amino acids with vinylpyridines with high yield and enantioselectivity. Additionally, stereo-complementary enzymes are found to provide access to both enantiomers of the product. Mechanistic studies indicate that Ru(bpy)₃²⁺ binds to the protein, helping to localize radical formation to the enzyme’s active site. This work expands the types of transformation that can be rendered asymmetric using photoenzymatic catalysis and provides an intriguing mechanism of radical initiation.
  5. Abstract The absence of orthogonal aminoacyl-transfer RNA (tRNA) synthetases that accept non-l-α-amino acids is a primary bottleneck hindering the in vivo translation of sequence-defined hetero-oligomers and biomaterials. Here we report that pyrrolysyl-tRNA synthetase (PylRS) and certain PylRS variants accept α-hydroxy, α-thio and N-formyl-l-α-amino acids, as well as α-carboxy acid monomers that are precursors to polyketide natural products. These monomers are accommodated and accepted by the translation apparatus in vitro; those with reactive nucleophiles are incorporated into proteins in vivo. High-resolution structural analysis of the complex formed between one PylRS enzyme and a m-substituted 2-benzylmalonic acid derivative revealed an active site that discriminates prochiral carboxylates and accommodates the large size and distinct electrostatics of an α-carboxy substituent. This work emphasizes the potential of PylRS-derived enzymes for acylating tRNA with monomers whose α-substituent diverges substantially from the α-amine of proteinogenic amino acids. These enzymes or derivatives thereof could synergize with natural or evolved ribosomes and/or translation factors to generate diverse sequence-defined non-protein heteropolymers.