Title: SARS-Arena: Sequence and Structure-Guided Selection of Conserved Peptides from SARS-related Coronaviruses for Novel Vaccine Development
The pandemic caused by the SARS-CoV-2 virus, the agent responsible for the COVID-19 disease, has affected millions of people worldwide. There is a constant search for new therapies to either prevent or mitigate the disease. Fortunately, we have observed the successful development of multiple vaccines. Most of them are focused on one viral envelope protein, the spike protein. However, such focused approaches may contribute to the rise of new variants, fueled by the constant selection pressure on envelope proteins and the widespread dispersion of coronaviruses in nature. Therefore, it is important to examine other proteins, preferably those that are less susceptible to selection pressure, such as the nucleocapsid (N) protein. Even though the N protein is less accessible to the humoral response, peptides from its conserved regions can be presented by class I Human Leukocyte Antigen (HLA) molecules, eliciting an immune response mediated by T-cells. Given the increasing number of protein sequences deposited in biological databases daily and the N protein conservation among viral strains, computational methods can be leveraged to discover potential new targets for SARS-CoV-2 and SARS-CoV-related viruses. Here we developed SARS-Arena, a user-friendly computational pipeline that can be used by practitioners of different levels of expertise for novel vaccine development. SARS-Arena combines sequence-based methods and structure-based analyses to (i) perform multiple sequence alignment (MSA) of SARS-CoV-related N protein sequences, (ii) recover candidate peptides of different lengths from conserved protein regions, and (iii) model the 3D structure of the conserved peptides in the context of different HLAs. We present two main Jupyter Notebook workflows that can help in the identification of new T-cell targets against SARS-CoV viruses. In fact, in a cross-reactive case study, our workflows identified a conserved N protein peptide (SPRWYFYYL) recognized by CD8+ T-cells in the context of HLA-B7+. SARS-Arena is available at https://github.com/KavrakiLab/SARS-Arena .
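Step (ii) of the abstract, recovering candidate peptides of different lengths from conserved regions of an alignment, can be illustrated with a minimal sketch. This is not the SARS-Arena implementation: the function names, the per-column identity score, the threshold, and the three toy aligned sequences are all assumptions made for illustration (the toy sequences are invented and merely embed the SPRWYFYYL peptide mentioned above).

```python
from collections import Counter


def column_conservation(aligned):
    """Return, per alignment column, the fraction of sequences that share
    the most common residue (0.0 if the most common symbol is a gap)."""
    n = len(aligned)
    scores = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        scores.append(0.0 if residue == "-" else count / n)
    return scores


def conserved_peptides(aligned, lengths=(8, 9, 10, 11), threshold=1.0):
    """Slide windows of each requested length over the first (reference)
    sequence and keep those whose columns all meet the threshold."""
    reference = aligned[0]
    scores = column_conservation(aligned)
    peptides = set()
    for k in lengths:
        for start in range(len(reference) - k + 1):
            peptide = reference[start:start + k]
            if "-" in peptide:
                continue  # skip windows that span a gap in the reference
            if min(scores[start:start + k]) >= threshold:
                peptides.add(peptide)
    # Sort by length, then alphabetically, for a stable result
    return sorted(peptides, key=lambda p: (len(p), p))


# Hypothetical, hand-made alignment (NOT real N protein sequences)
toy_alignment = [
    "MKSPRWYFYYLGA",
    "MRSPRWYFYYLGT",
    "MKSPRWYFYYLAA",
]

if __name__ == "__main__":
    print(conserved_peptides(toy_alignment, lengths=(8, 9)))
```

In a real workflow the aligned sequences would come from an MSA tool rather than a hand-written list, and a threshold below 1.0 could be used to tolerate occasional substitutions.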
Award ID(s):
2033262
NSF-PAR ID:
10359084
Journal Name:
Frontiers in Immunology
Volume:
13
ISSN:
1664-3224
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The COVID-19 pandemic caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spurred unprecedented and concerted worldwide research to curtail and eradicate this pathogen. SARS-CoV-2 has four structural proteins: Envelope (E), Membrane (M), Nucleocapsid (N), and Spike (S), which self-assemble along with its RNA into the infectious virus by budding from intracellular lipid membranes. In this paper, we develop a model to explore the mechanisms of RNA condensation by structural proteins, protein oligomerization, and cellular membrane–protein interactions that control the budding process and the ultimate virus structure. Using molecular dynamics simulations, we have deciphered how the positively charged N proteins interact with and condense the very long genomic RNA, resulting in its packaging by a lipid envelope decorated with structural proteins inside a host cell. Furthermore, considering the length of the RNA and the size of the virus, we find that the intrinsic curvature of M proteins is essential for virus budding. Most current research has focused on the S protein, which is responsible for viral entry, motivated by the need to develop efficacious vaccines. However, the development of resistance through mutations in this crucial protein makes it essential to elucidate the details of the viral life cycle to identify other drug targets for future therapy. Our simulations will provide insight into the viral life cycle through the assembly of viral particles de novo and potentially identify therapeutic targets for future drug development.
  2. Coronavirus disease (COVID-19) is a contagious respiratory disease caused by the SARS-CoV-2 virus. The clinical phenotypes are variable, ranging from spontaneous recovery to serious illness and death. In March 2020, a global COVID-19 pandemic was declared by the World Health Organization (WHO). As of February 2023, almost 670 million cases and 6.8 million deaths have been confirmed worldwide. Coronaviruses, including SARS-CoV-2, contain a single-stranded RNA genome enclosed in a viral capsid consisting of four structural proteins: the nucleocapsid (N) protein, in the ribonucleoprotein core, and the spike (S) protein, the envelope (E) protein, and the membrane (M) protein, embedded in the surface envelope. In particular, the E protein is a poorly characterized viroporin with high identity amongst all the β-coronaviruses (SARS-CoV-2, SARS-CoV, MERS-CoV, HCoV-OC43) and a low mutation rate. Here, we focused our attention on the study of SARS-CoV-2 E and M proteins, and we found a general perturbation of the host cell calcium (Ca²⁺) homeostasis and a selective rearrangement of the interorganelle contact sites. In vitro and in vivo biochemical analyses revealed that the binding of specific nanobodies to soluble regions of SARS-CoV-2 E protein reversed the observed phenotypes, suggesting that the E protein might be an important therapeutic candidate not only for vaccine development, but also for the clinical management of COVID-19 through the design of drug regimens, which so far are very limited.
  3. SARS-CoV-2, the cause of COVID-19, is a new, highly pathogenic coronavirus, which is the third coronavirus to emerge in the past 2 decades and the first to become a global pandemic. The virus has demonstrated itself to be extremely transmissible and deadly. Recent data suggest that a targeted approach is key to mitigating infectivity. Owing to the proliferation of cataloged protein and nucleic acid sequences in databases, predictions about the function of nucleic acids and the proteins they encode can be made by simply aligning sequences and exploring their homology. Thus, similar amino acid sequences in a protein usually confer similar biochemical function, even in distant or unrelated organisms. To understand viral transmission and adhesion, it is key to elucidate the structural, surface, and functional properties of each viral protein. This is typically first modeled in highly pathogenic species by exploring folding, hydrophobicity, and isoelectric point (IEP). Recent evidence from viral RNA sequence modeling and protein crystals has been inadequate, which prevents a full understanding of the IEP and other viral properties of SARS-CoV-2. We have thus experimentally determined the IEP of SARS-CoV-2. Our findings suggest that for enveloped viruses, such as SARS-CoV-2, estimates of IEP by the amino acid sequence alone may be unreliable. We compared the experimental IEP of SARS-CoV-2 to variants of interest (VOIs) using their amino acid sequence, thus providing a qualitative comparison of the IEP of VOIs.
  4. Specific lipid–protein interactions are key for cellular processes, and even more so for the replication of pathogens. The COVID-19 pandemic has drastically changed our lives and caused the death of nearly four million people worldwide, as of this writing. SARS-CoV-2 is the virus that causes the disease and has been at the center of scientific research over the past year. Most of the research on the virus is focused on key players during its initial attack and entry into the cellular host; namely the S protein, its glycan shield, and its interactions with the ACE2 receptors of human cells. As cases continue to rise around the globe, and new mutants are identified, there is an urgent need to understand the mechanisms of this virus during different stages of its life cycle. Here, we consider two integral membrane proteins of SARS-CoV-2 known to be important for viral assembly and infectivity. We have used microsecond-long all-atom molecular dynamics to examine the lipid–protein and protein–protein interactions of the membrane (M) and envelope (E) structural proteins of SARS-CoV-2 in a complex membrane model. We contrast the two proposed protein complexes for each of these proteins, and quantify their effect on their local lipid environment. This ongoing work also aims to provide molecular-level understanding of the mechanisms of action of this virus to possibly aid in the design of novel treatments. 
  5. Viruses such as the novel coronavirus, SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of the human host cells. Relatively small changes in sequence, such as between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins to human proteins and compare it to that of the other viruses mentioned above. We build supervised machine learning models, using Generalized Additive Models to predict interactions based on sequence features, and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize what combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola, and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a’s interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.