LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model
Abstract Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In this regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the creation of negative sets and the use of distilled information from protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach, to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained pLM. On a benchmark independent test set, LMNglyPred achieves a sensitivity, specificity, precision, and accuracy of 76.50%, 75.36%, 60.99%, and 75.74%, respectively, and a Matthews Correlation Coefficient of 0.49. These results demonstrate that LMNglyPred is a robust computational tool to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon.
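The two ideas the abstract relies on, restricting candidates to N-X-[S/T] sequons and scoring a binary predictor, can be illustrated with a minimal Python sketch. This is not the LMNglyPred code: the function names `find_sequons` and `binary_metrics`, the example sequence, and the confusion-matrix counts are hypothetical, and the sketch assumes sequences are given in one-letter amino acid codes.

```python
import math
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
# Assumption: the input contains only one-letter amino acid codes.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [match.start() for match in SEQUON.finditer(sequence.upper())]


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (SN)
    specificity = tn / (tn + fp)          # true negative rate (SP)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC ranges from -1 to 1, so it is not a percentage.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical sequence and counts, for illustration only.
    print(find_sequons("MNGTANPSNNTS"))
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

Every position returned by `find_sequons` is only a candidate site; deciding which candidates are actually glycosylated is the prediction task the abstract describes.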
- Award ID(s): 1901793
- PAR ID: 10412869
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Glycobiology
- ISSN: 1460-2423
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation utilize the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus, the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
- Site specific N- and O-glycosylation mapping of the spike proteins of SARS-CoV-2 variants of concern
  Abstract The glycosylation on the spike (S) protein of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, modulates the viral infection by altering conformational dynamics, receptor interaction and host immune responses. Several variants of concern (VOCs) of SARS-CoV-2 have evolved during the pandemic, and crucial mutations on the S protein of the virus have led to increased transmissibility and immune escape. In this study, we compare the site-specific glycosylation and overall glycomic profiles of the wild type Wuhan-Hu-1 strain (WT) S protein and five VOCs of SARS-CoV-2: Alpha, Beta, Gamma, Delta and Omicron. Interestingly, both N- and O-glycosylation sites on the S protein are highly conserved among the spike mutant variants, particularly at the sites on the receptor-binding domain (RBD). The conservation of glycosylation sites is noteworthy, as over 2 million SARS-CoV-2 S protein sequences have been reported with various amino acid mutations. Our detailed profiling of the glycosylation at each of the individual sites of the S protein across the variants revealed an intriguing possible association between the glycosylation pattern of the variants and their previously reported infectivity. While the sites are conserved, we observed changes in the N- and O-glycosylation profile across the variants. The newly emerged variants, which showed higher resistance to neutralizing antibodies and vaccines, displayed a decrease in the overall abundance of complex-type glycans with both fucosylation and sialylation and an increase in the oligomannose-type glycans across the sites. Among the variants, the glycosylation sites with significant changes in glycan profile were observed at both the N-terminal domain and RBD of the S protein, with Omicron showing the highest deviation. The increase in oligomannose-type glycans occurs sequentially from Alpha through Delta. Interestingly, Omicron does not contain more oligomannose-type glycans than Delta but does contain more than the WT and other VOCs. O-glycosylation at the RBD showed lower occupancy in the VOCs in comparison to the WT. Our study on the sites and pattern of glycosylation on the SARS-CoV-2 S proteins across the VOCs may help to understand how the virus evolved to trick the host immune system. Our study also highlights how the SARS-CoV-2 virus has conserved both N- and O-glycosylation sites on the S protein of the most successful variants even after undergoing extensive mutations, suggesting a correlation between infectivity/transmissibility and glycosylation.
- Membrane transporters of the solute carrier 6 (SLC6) family mediate various physiological processes by facilitating the translocation of amino acids, neurotransmitters, and other metabolites. In the body, the activity of these transporters is tightly controlled through various post-translational modifications with implications for protein expression, stability, membrane trafficking, and dynamics. While N-linked glycosylation is a universal regulatory mechanism among eukaryotes, a consistent mechanism of how glycosylation affects the SLC6 transporter family remains elusive. It is generally believed that glycans influence transporter stability and membrane trafficking; however, the role of glycosylation in transporter dynamics remains disputed, with differing conclusions among individual transporters across the SLC6 family. In this study, we collected over 1 ms of aggregated all-atom molecular dynamics (MD) simulation data to systematically identify the impact of N-glycans on SLC6 transporter dynamics. We modeled four human SLC6 transporters, the serotonin, dopamine, glycine, and B0AT1 transporters, by first simulating all possible combinations of a glycan attached to each glycosylation site and then investigating the effect of larger, oligo-N-linked glycans on each transporter. The simulations reveal that glycosylation does not significantly affect the transporter structure but alters the dynamics of the glycosylated extracellular loop and surrounding regions. The structural consequences of glycosylation on the loop dynamics are further emphasized with larger glycan molecules attached. However, no apparent differences in ligand stability or movement of the gating helices were observed, and as such, the simulations suggest that glycosylation does not have a profound effect on conformational dynamics associated with substrate transport.
- Asparagine-linked glycosylation is an essential and highly conserved protein modification reaction that occurs in the endoplasmic reticulum of cells during protein synthesis at the ribosome. In the central reaction, a pre-assembled high-mannose sugar is transferred from a lipid-linked donor substrate to the side-chain of an asparagine residue in an -N-X-T/S- sequence (where X is any residue except proline). This reaction is carried out by a membrane-bound multi-subunit enzyme complex, oligosaccharyltransferase (OST). In humans, genetic defects in OST lead to a group of rare metabolic diseases collectively known as congenital disorders of glycosylation (CDG). Certain mutations are lethal for all organisms. In yeast, the OST is composed of nine non-identical protein subunits. The functional enzyme complex contains eight subunits with either Ost3 or Ost6 at any given time. Ost4, an unusually small protein, plays a very important role in the stabilization of the OST complex. It bridges the catalytic subunit Stt3 with Ost3 (or Ost6) in the Stt3-Ost4-Ost3 (or Ost6) sub-complex. Mutation of any residue from M18-I24 in the trans-membrane helix of yeast Ost4 negatively impacts N-linked glycosylation and the growth of yeast. Indeed, mutation of valine23 to an aspartate impairs OST function in vivo, resulting in a lethal phenotype in yeast. To understand the structural mechanism of Ost4 in the stabilization of the enzyme complex, we have initiated a detailed investigation of Ost4 and its functionally important mutant, Ost4V23D. Here, we report the backbone 1H, 13C and 15N resonance assignments for Ost4 and Ost4V23D in DPC micelles.
