Title: Machine learning for the identification of respiratory viral attachment machinery from sequences data
At the outset of an emergent viral respiratory pandemic, sequence data is among the first molecular information available. As viral attachment machinery is a key target for therapeutic and prophylactic interventions, rapid identification of viral “spike” proteins from sequence can significantly accelerate the development of medical countermeasures. For six families of respiratory viruses, covering the vast majority of airborne and droplet-transmitted diseases, host cell entry is mediated by the binding of viral surface glycoproteins that interact with a host cell receptor. In this report it is shown that sequence data for an unknown virus belonging to one of the six families above provides sufficient information to identify the protein(s) responsible for viral attachment. Random forest models that take as input a set of respiratory viral sequences can classify the protein as “spike” vs. non-spike based on predicted secondary structure elements alone (with 97.3% correctly classified) or in combination with N-glycosylation related features (with 97.0% correctly classified). Models were validated through 10-fold cross-validation, bootstrapping on a class-balanced set, and an out-of-sample extra-familial validation set. Surprisingly, we showed that secondary structural elements and N-glycosylation features were sufficient for model generation. The ability to rapidly identify viral attachment machinery directly from sequence data holds the potential to accelerate the design of medical countermeasures for future pandemics. Furthermore, this approach may be extendable for the identification of other potential viral targets and for viral sequence annotation in general in the future.
Award ID(s):
2200052
PAR ID:
10460887
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Krishnan, Viswanathan V.
Date Published:
Journal Name:
PLOS ONE
Volume:
18
Issue:
3
ISSN:
1932-6203
Page Range / eLocation ID:
e0281642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly contagious, and transmission involves a series of processes that may be targeted by vaccines and therapeutics. During transmission, host cell invasion is controlled by a large-scale (200–300 Å) conformational change of the Spike protein. This conformational rearrangement leads to membrane fusion, which creates transmembrane pores through which the viral genome is passed to the host. During Spike-protein-mediated fusion, the fusion peptides must be released from the core of the protein and associate with the host membrane. While infection relies on this transition between the prefusion and postfusion conformations, there has yet to be a biophysical characterization reported for this rearrangement. That is, structures are available for the endpoints, though the intermediate conformational processes have not been described. Interestingly, the Spike protein possesses many post-translational modifications, in the form of branched glycans that flank the surface of the assembly. With the current lack of data on the pre-to-post transition, the precise role of glycans during cell invasion has also remained unclear. To provide an initial mechanistic description of the pre-to-post rearrangement, an all-atom model with simplified energetics was used to perform thousands of simulations in which the protein transitions between the prefusion and postfusion conformations. These simulations indicate that the steric composition of the glycans can induce a pause during the Spike protein conformational change. We additionally show that this glycan-induced delay provides a critical opportunity for the fusion peptides to capture the host cell. In contrast, in the absence of glycans, the viral particle would likely fail to enter the host. 
This analysis reveals how the glycosylation state can regulate infectivity, while providing a much-needed structural framework for studying the dynamics of this pervasive pathogen. 
  2. The glycosylation on the spike (S) protein of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, modulates the viral infection by altering conformational dynamics, receptor interaction and host immune responses. Several variants of concern (VOCs) of SARS-CoV-2 have evolved during the pandemic, and crucial mutations on the S protein of the virus have led to increased transmissibility and immune escape. In this study, we compare the site-specific glycosylation and overall glycomic profiles of the wild type Wuhan-Hu-1 strain (WT) S protein and five VOCs of SARS-CoV-2: Alpha, Beta, Gamma, Delta and Omicron. Interestingly, both N- and O-glycosylation sites on the S protein are highly conserved among the spike mutant variants, particularly at the sites on the receptor-binding domain (RBD). The conservation of glycosylation sites is noteworthy, as over 2 million SARS-CoV-2 S protein sequences have been reported with various amino acid mutations. Our detailed profiling of the glycosylation at each of the individual sites of the S protein across the variants revealed intriguing possible association of glycosylation pattern on the variants and their previously reported infectivity. While the sites are conserved, we observed changes in the N- and O-glycosylation profile across the variants. The newly emerged variants, which showed higher resistance to neutralizing antibodies and vaccines, displayed a decrease in the overall abundance of complex-type glycans with both fucosylation and sialylation and an increase in the oligomannose-type glycans across the sites. Among the variants, the glycosylation sites with significant changes in glycan profile were observed at both the N-terminal domain and RBD of S protein, with Omicron showing the highest deviation. The increase in oligomannose-type happens sequentially from Alpha through Delta. 
Interestingly, Omicron does not contain more oligomannose-type glycans compared to Delta but does contain more compared to the WT and other VOCs. O-glycosylation at the RBD showed lower occupancy in the VOCs in comparison to the WT. Our study on the sites and pattern of glycosylation on the SARS-CoV-2 S proteins across the VOCs may help to understand how the virus evolved to trick the host immune system. Our study also highlights how the SARS-CoV-2 virus has conserved both N- and O-glycosylation sites on the S protein of the most successful variants even after undergoing extensive mutations, suggesting a correlation between infectivity/transmissibility and glycosylation. 
  3. Severe acute respiratory syndrome coronavirus (SARS-CoV-1) attaches to the host cell surface to initiate the interaction between the receptor-binding domain (RBD) of its spike glycoprotein (S) and the human angiotensin-converting enzyme 2 (hACE2) receptor. SARS-CoV-1 mutates frequently because of its RNA genome, which challenges antiviral development. Here, we performed computational saturation mutagenesis of the S protein of SARS-CoV-1 to identify the residues crucial for its functions. We used structure-based energy calculations to analyze the effects of missense mutations on SARS-CoV-1 S stability and on its binding affinity with hACE2. The sequence and structure alignment showed similarities between the S proteins of SARS-CoV-1 and SARS-CoV-2. Interestingly, we found that target mutations of S protein amino acids generate similar effects on their stabilities between SARS-CoV-1 and SARS-CoV-2. For example, G839W of SARS-CoV-1 corresponds to G857W of SARS-CoV-2, and both decrease the stability of their S glycoproteins. The viral mutation analysis of the two different SARS-CoV-1 isolates showed that the mutations T487S and L472P weakened the S-hACE2 binding of the 2003–2004 SARS-CoV-1 isolate. In addition, the mutations L472P and F360S destabilized the 2003–2004 viral isolate. We further predicted that many mutations on N-linked glycosylation sites would increase the stability of the S glycoprotein. Our results can be of therapeutic importance in the design of antivirals or vaccines against SARS-CoV-1 and SARS-CoV-2. 
  4. We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike’s full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems. 
  5. Prasad, Vinayaka R. (Ed.)
    The ongoing coronavirus disease 2019 (COVID-19) pandemic demonstrates the threat posed by novel coronaviruses to human health. Coronaviruses share a highly conserved cell entry mechanism mediated by the spike protein, the sole product of the S gene. The structural dynamics by which the spike protein orchestrates infection illuminate how antibodies neutralize virions and how S mutations contribute to viral fitness. Here, we review the process by which spike engages its proteinaceous receptor, angiotensin converting enzyme 2 (ACE2), and how host proteases prime and subsequently enable efficient membrane fusion between virions and target cells. We highlight mutations common among severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern and discuss implications for cell entry. Ultimately, we provide a model by which sarbecoviruses are activated for fusion competency and offer a framework for understanding the interplay between humoral immunity and the molecular evolution of the SARS-CoV-2 Spike. In particular, we emphasize the relevance of the Canyon Hypothesis (M. G. Rossmann, J Biol Chem 264:14587–14590, 1989) for understanding evolutionary trajectories of viral entry proteins during sustained intraspecies transmission of a novel viral pathogen. 