Title: DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction
Protein N-linked glycosylation is a post-translational modification that plays an important role in many biological processes. Computational prediction approaches complement experimental methods for characterizing glycosylation sites. Most existing N-linked glycosylation predictors exploit the fact that glycosylation occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated, so the sequon is a necessary but not sufficient determinant of protein glycosylation. Computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is therefore an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in a human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped dipeptides), predicted structural features, and evolutionary information. On the N-GlyDE independent test set, DeepNGlyPred achieves a sensitivity (SN) of 88.62%, specificity (SP) of 73.92%, Matthews correlation coefficient (MCC) of 0.60, and accuracy (ACC) of 79.41%, outperforming the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
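The candidate-site definition and one of the sequence features named in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not DeepNGlyPred's implementation: the helper names `find_sequons`, `window`, and `gapped_dipeptide_counts`, the window half-width, the gap range, the padding character, and the use of raw (unnormalized) counts are all hypothetical. The predicted structural features and evolutionary information that the model also uses are not shown.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# N-X-[S/T] with X != P. A zero-width lookahead lets overlapping sequons
# (e.g. "NNST" contains two) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """0-based indices of Asn residues that begin an N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]

def window(seq, center, half_width=10, pad="-"):
    """Residues center-half_width .. center+half_width, padded past the termini.

    half_width=10 is an assumed default, not a value from the paper.
    """
    padded = pad * half_width + seq + pad * half_width
    return padded[center:center + 2 * half_width + 1]

def gapped_dipeptide_counts(win, max_gap=3):
    """Count residue pairs (a, b) with exactly g residues between them, g = 0..max_gap.

    max_gap=3 is an assumed default; g=0 is the ordinary dipeptide.
    """
    features = {}
    for g in range(max_gap + 1):
        pairs = Counter(
            win[i] + win[i + g + 1]
            for i in range(len(win) - g - 1)
            if win[i] in AMINO_ACIDS and win[i + g + 1] in AMINO_ACIDS  # skip padding
        )
        for a in AMINO_ACIDS:
            for b in AMINO_ACIDS:
                features[f"{a}{b}_g{g}"] = pairs[a + b]
    return features

seq = "MKNGSAPNPTLLNVTQ"  # hypothetical toy sequence
sites = find_sequons(seq)
print(sites)  # [2, 12]; the N at index 7 is skipped because X is proline
feats = gapped_dipeptide_counts(window(seq, sites[0], half_width=3), max_gap=1)
print(len(feats), feats["NG_g0"], feats["NS_g1"])  # 800 1 1
```

Because every N-X-[S/T] sequon is a candidate but only some are glycosylated, a classifier in this setting only has to separate glycosylated from non-glycosylated sequons rather than score every asparagine.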
Award ID(s):
1901793, 2210356
NSF-PAR ID:
10336442
Journal Name:
Molecules
Volume:
26
Issue:
23
ISSN:
1420-3049
Page Range / eLocation ID:
7314
Sponsoring Org:
National Science Foundation
More Like this
  2. Abstract

    Protein N-linked glycosylation is an important post-translational modification in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the construction of negative sets and the use of information distilled in protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, specificity of 75.36%, Matthews correlation coefficient of 0.49, precision of 60.99%, and accuracy of 75.74%. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.

     
  3. Abstract

    The spike (S) glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for binding to permissive cells. The receptor-binding domain (RBD) of the SARS-CoV-2 S protein directly interacts with human angiotensin-converting enzyme 2 (ACE2) on the host cell membrane. In this study, we used computational saturation mutagenesis approaches, including structure-based energy calculations and sequence-based pathogenicity predictions, to quantify the systemic effects of missense mutations on SARS-CoV-2 S protein structure and function. A total of 18 354 mutations in the S protein were analyzed, and we found that most of these mutations could destabilize the entire S protein and its RBD. Specifically, residues G431 and S514 in the SARS-CoV-2 RBD are important for S protein stability. We analyzed 384 experimentally verified S missense variants and found that the dominant pandemic form, D614G, can stabilize the entire S protein. Moreover, many mutations at N-linked glycosylation sites can increase the stability of the S protein. In addition, we investigated 3705 mutations in the SARS-CoV-2 RBD and 11 324 mutations in human ACE2 and found that SARS-CoV-2 neighbor residues G496 and F497 and ACE2 residues D355 and Y41 are critical for the RBD–ACE2 interaction. Together, these findings identify potential target sites for the development of drugs and vaccines against COVID-19.
  4. Asparagine-linked glycosylation is an essential and highly conserved protein modification reaction that occurs in the endoplasmic reticulum of cells during protein synthesis at the ribosome. In the central reaction, a pre-assembled high-mannose sugar is transferred from a lipid-linked donor substrate to the side chain of an asparagine residue in an -N-X-T/S- sequence (where X is any residue except proline). This reaction is carried out by a membrane-bound multi-subunit enzyme complex, oligosaccharyltransferase (OST). In humans, genetic defects in OST lead to a group of rare metabolic diseases collectively known as congenital disorders of glycosylation (CDG). Certain mutations are lethal in all organisms. In yeast, OST is composed of nine non-identical protein subunits; the functional enzyme complex contains eight subunits, with either Ost3 or Ost6 present at any given time. Ost4, an unusually small protein, plays a very important role in stabilizing the OST complex: it bridges the catalytic subunit Stt3 with Ost3 (or Ost6) in the Stt3-Ost4-Ost3 (or Ost6) sub-complex. Mutation of any residue from M18 to I24 in the transmembrane helix of yeast Ost4 negatively impacts N-linked glycosylation and yeast growth. Indeed, mutation of valine 23 to aspartate impairs OST function in vivo, resulting in a lethal phenotype in yeast. To understand the structural mechanism by which Ost4 stabilizes the enzyme complex, we have initiated a detailed investigation of Ost4 and its functionally important mutant, Ost4V23D. Here, we report the backbone ¹H, ¹³C, and ¹⁵N resonance assignments for Ost4 and Ost4V23D in DPC micelles.
  5. Abstract

    N-linked protein glycosylation is a post-translational modification that exists in all domains of life. It involves two consecutive steps: (i) biosynthesis of a lipid-linked oligosaccharide (LLO), and (ii) transfer of the glycan from the LLO to asparagine residues in secretory proteins, catalyzed by the integral membrane enzyme oligosaccharyltransferase (OST). Over the last decade, structural and functional studies of the N-glycosylation machinery have advanced our mechanistic understanding of the pathway. Structures of bacterial and eukaryotic glycosyltransferases involved in LLO elongation provided insight into the mechanism of LLO biosynthesis, whereas structures of OST enzymes revealed the molecular basis of sequon recognition and catalysis. In this review, we discuss the approaches used and the insights obtained from these studies, with special emphasis on the design and preparation of substrate analogs.

     
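The performance figures quoted for DeepNGlyPred (SN, SP, MCC, ACC) and LMNglyPred (sensitivity, specificity, MCC, precision, accuracy) follow the standard confusion-matrix definitions. The Python sketch below computes them from hypothetical counts that are not taken from either paper; the `binary_metrics` helper is illustrative, and returning 0.0 for MCC when a marginal total is zero is a common convention, not something the abstracts specify.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall, true-positive rate)
    sp = tn / (tn + fp)                    # specificity (true-negative rate)
    pre = tp / (tp + fp)                   # precision (positive predictive value)
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; 0.0 when any marginal total is zero (assumed convention).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "PRE": pre, "ACC": acc, "MCC": mcc}

# Hypothetical counts for illustration only.
m = binary_metrics(tp=80, fp=50, tn=150, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# SN: 0.8000, SP: 0.7500, PRE: 0.6154, ACC: 0.7667, MCC: 0.5232
```

MCC ranges from -1 to 1 and is reported as a coefficient (0.60 for DeepNGlyPred and 0.49 for LMNglyPred), not as a percentage.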
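The SARS-CoV-2 spike study listed above relies on computational saturation mutagenesis, in which every position of a protein is substituted with each of the 19 alternative standard amino acids and each mutant is then scored. The Python sketch below shows only the enumeration step, plus a check for substitutions that abolish an N-X-[S/T] sequon, which is one way to pick out mutations at N-linked glycosylation sites. It does not estimate stability or binding energy, which the study obtained from structure-based energy calculations and sequence-based pathogenicity predictors. The helper names and the toy sequence are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-[S/T], X != P; lookahead allows overlaps

def saturation_mutants(seq, start=1):
    """Yield every single-residue missense substitution, labeled like 'D614G'."""
    for offset, wt in enumerate(seq):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield f"{wt}{start + offset}{mut}"

def apply_mutation(seq, label, start=1):
    """Return seq with the substitution in `label` applied (numbering begins at `start`)."""
    wt, pos, mut = label[0], int(label[1:-1]), label[-1]
    idx = pos - start
    if not 0 <= idx < len(seq) or seq[idx] != wt:
        raise ValueError(f"{label} does not match the reference sequence")
    return seq[:idx] + mut + seq[idx + 1:]

def sequon_starts(seq):
    """Set of 0-based indices where an N-X-[S/T] sequon begins."""
    return {m.start() for m in SEQUON.finditer(seq)}

def abolishes_sequon(seq, label, start=1):
    """True if the substitution removes at least one sequon present in seq."""
    return bool(sequon_starts(seq) - sequon_starts(apply_mutation(seq, label, start)))

ref = "ANGT"  # hypothetical toy fragment containing one sequon (N2-G3-T4)
mutants = list(saturation_mutants(ref))
print(len(mutants))  # 76 (4 positions x 19 substitutions)
print(sum(abolishes_sequon(ref, m) for m in mutants))  # 38
```

In the toy fragment, the 38 sequon-abolishing substitutions are all 19 at N2, G3P (X becomes proline), and the 18 changes at T4 other than T4S.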