Title: SARSNTdb database: Factors affecting SARS-CoV-2 sequence conservation
SARSNTdb offers a curated, nucleotide-centric database for users with varying levels of SARS-CoV-2 knowledge. Its user-friendly interface enables querying coding regions and coordinate intervals to identify the functional and selective constraints that act upon the corresponding nucleotides and amino acids. Users can easily obtain information about viral genes and proteins, functional domains, repeats, secondary structure formation, intragenomic interactions, and mutation prevalence. Currently, many databases focus on phylogeny and amino acid substitutions, mainly in the spike protein. We took a novel, more nucleotide-focused approach, as RNA does more than just code for proteins and many insights can be gleaned from its study. For example, RNA-targeted drug therapies for SARS-CoV-2 are currently being developed, and it is essential to understand the features visible only at that level. This database enables the user to identify regions that are more prone to forming secondary structures that drugs can target. SARSNTdb also provides illustrative mutation data from a subset of ~25,000 patient samples with reliable read coverage across the whole genome (from different locations and time points in the pandemic). Finally, the database allows for comparing SARS-CoV-2 and SARS-CoV domains and sequences. SARSNTdb can serve the research community as a curated repository of information that gives a jump start to analyzing a mutation’s effect far beyond just determining synonymous/non-synonymous substitutions in protein sequences.
Award ID(s):
2027611
NSF-PAR ID:
10410325
Journal Name:
Frontiers in Virology
Volume:
2
ISSN:
2673-818X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The pandemic caused by the SARS-CoV-2 virus, the agent responsible for the COVID-19 disease, has affected millions of people worldwide. There is a constant search for new therapies to either prevent or mitigate the disease. Fortunately, we have observed the successful development of multiple vaccines. Most of them are focused on one viral envelope protein, the spike protein. However, such focused approaches may contribute to the rise of new variants, fueled by the constant selection pressure on envelope proteins and the widespread dispersion of coronaviruses in nature. Therefore, it is important to examine other proteins, preferably those that are less susceptible to selection pressure, such as the nucleocapsid (N) protein. Even though the N protein is less accessible to the humoral response, peptides from its conserved regions can be presented by class I Human Leukocyte Antigen (HLA) molecules, eliciting an immune response mediated by T-cells. Given the increasing number of protein sequences deposited in biological databases daily and the N protein conservation among viral strains, computational methods can be leveraged to discover potential new targets for SARS-CoV-2 and SARS-CoV-related viruses. Here we developed SARS-Arena, a user-friendly computational pipeline that can be used by practitioners of different levels of expertise for novel vaccine development. SARS-Arena combines sequence-based methods and structure-based analyses to (i) perform multiple sequence alignment (MSA) of SARS-CoV-related N protein sequences, (ii) recover candidate peptides of different lengths from conserved protein regions, and (iii) model the 3D structure of the conserved peptides in the context of different HLAs. We present two main Jupyter Notebook workflows that can help in the identification of new T-cell targets against SARS-CoV viruses. In fact, in a cross-reactive case study, our workflows identified a conserved N protein peptide (SPRWYFYYL) recognized by CD8+ T-cells in the context of HLA-B7+. SARS-Arena is available at https://github.com/KavrakiLab/SARS-Arena.
  2. Abstract

    PhyloFisher is a software package written primarily in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of protein sequences from eukaryotic organisms. Unlike many existing phylogenomic pipelines, PhyloFisher comes with a manually curated database of 240 protein‐coding genes, a subset of a previous phylogenetic dataset sampled from 304 eukaryotic taxa. The software package can also utilize a user‐created database of eukaryotic proteins, which may be more appropriate for shallow evolutionary questions. PhyloFisher is also equipped with a set of utilities to aid in running routine analyses, such as the prediction of alternative genetic codes, removal of genes and/or taxa based on occupancy/completeness of the dataset, testing for amino acid compositional heterogeneity among sequences, removal of heterotachious and/or fast‐evolving sites, removal of fast‐evolving taxa, supermatrix creation from randomly resampled genes, and supermatrix creation from nucleotide sequences. © 2024 Wiley Periodicals LLC.

    Basic Protocol 1: Constructing a phylogenomic dataset

    Basic Protocol 2: Performing phylogenomic analyses

    Support Protocol 1: Installing PhyloFisher

    Support Protocol 2: Creating a custom phylogenomic database

     
  3. Abstract Coronavirus disease (COVID-19) is a contagious respiratory disease caused by the SARS-CoV-2 virus. The clinical phenotypes are variable, ranging from spontaneous recovery to serious illness and death. In March 2020, a global COVID-19 pandemic was declared by the World Health Organization (WHO). As of February 2023, almost 670 million cases and 6.8 million deaths have been confirmed worldwide. Coronaviruses, including SARS-CoV-2, contain a single-stranded RNA genome enclosed in a viral capsid consisting of four structural proteins: the nucleocapsid (N) protein, in the ribonucleoprotein core, and the spike (S) protein, the envelope (E) protein, and the membrane (M) protein, embedded in the surface envelope. In particular, the E protein is a poorly characterized viroporin with high identity amongst all the β-coronaviruses (SARS-CoV-2, SARS-CoV, MERS-CoV, HCoV-OC43) and a low mutation rate. Here, we focused our attention on the study of SARS-CoV-2 E and M proteins, and we found a general perturbation of the host cell calcium (Ca²⁺) homeostasis and a selective rearrangement of the interorganelle contact sites. In vitro and in vivo biochemical analyses revealed that the binding of specific nanobodies to soluble regions of SARS-CoV-2 E protein reversed the observed phenotypes, suggesting that the E protein might be an important therapeutic candidate not only for vaccine development but also for the clinical management of COVID through the design of drug regimens, which so far are very limited.
  4. Due to the emergence of new variants of the SARS-CoV-2 coronavirus, the question of how the viral genomes evolved, leading to the formation of highly infectious strains, becomes particularly important. Three major emergent strains, Alpha, Beta and Delta, characterized by a significant number of missense mutations, provide a natural test field. We accumulated and aligned 4.7 million SARS-CoV-2 genomes from the GISAID database and carried out a comprehensive set of analyses. This collection covers the period until the end of October 2021, i.e., the beginnings of the Omicron variant. First, we explored the combinatorial complexity of the emerging genomic variants and their timing, which indicates very strong, albeit hidden, selection forces. Our analyses show that the mutations that define variants of concern did not arise gradually but rather co-evolved rapidly, leading to the emergence of the full variant strain. To explore in more detail the evolutionary forces at work, we developed time trajectories of mutations at all 29,903 sites of the SARS-CoV-2 genome, week by week, and stratified them into trends related to (i) point substitutions, (ii) deletions and (iii) non-sequenceable regions. We focused on classifying the genetic forces active at different ranges of the mutational spectrum. We observed agreement of the lowest-frequency mutation spectrum with the Griffiths–Tavaré theory, under the Infinite Sites Model and neutrality. If we widen the frequency range, we observe site frequency spectra that are much more consistent with the Tung–Durrett model, which assumes clone competition and selection. The coefficients of the fitting model indicate the possibility of selection acting to promote gradual growth slowdown, as observed in the history of the variants of concern. These results add up to a model of genomic evolution, which partly fits into the classical drift barrier ideas. Certain observations, such as mutation “bands” persistent over the epidemic history, suggest a contribution of genetic forces different from mutation, drift and selection, including recombination or other genome transformations. In addition, we show that a “toy” mathematical model can qualitatively reproduce how new variants (clones) stem from rare advantageous driver mutations and then acquire neutral or disadvantageous passenger mutations, which gradually reduce their fitness so that they can then be outcompeted by new variants arising from other driver mutations.
  5. The s2m, a highly conserved 41-nt hairpin structure in the SARS-CoV-2 genome, serves as an attractive therapeutic target that may have important roles in the virus life cycle or interactions with the host. However, the conserved s2m in Delta SARS-CoV-2, a previously dominant variant characterized by high infectivity and disease severity, has received less attention than that of the original SARS-CoV-2 virus. The focus of this work is to identify and define the s2m changes between Delta and the original SARS-CoV-2 and the subsequent impact of those changes upon the s2m dimerization and interactions with the host microRNA miR-1307-3p. Bioinformatics analysis of the GISAID database targeting the s2m element reveals a >99% correlation of a single nucleotide mutation at the 15th position (G15U) in Delta SARS-CoV-2. Based on ¹H NMR spectroscopy assignments comparing the imino proton resonance region of s2m and the s2m G15U at 19°C, we show that the U15–A29 base pair closes, resulting in a stabilization of the upper stem without overall secondary structure deviation. Increased stability of the upper stem did not affect the chaperone activity of the viral N protein, as it was still able to convert the kissing dimers formed by s2m G15U into a stable duplex conformation, consistent with the s2m reference. However, we show that the s2m G15U mutation drastically impacts the binding of host miR-1307-3p. These findings demonstrate that the observed G15U mutation alters the secondary structure of s2m with subsequent impact on viral binding of host miR-1307-3p, with potential consequences on immune responses.

     