Title: Epidemiological associations with genomic variation in SARS-CoV-2
Abstract

SARS-CoV-2 (CoV) is the etiological agent of the COVID-19 pandemic and evolves to evade both host immune systems and intervention strategies. We divided the CoV genome into 29 constituent regions and applied novel analytical approaches to identify associations between CoV genomic features and epidemiological metadata. Our results show that nonstructural protein 3 (nsp3) and Spike protein (S) have the highest variation and greatest correlation with the viral whole-genome variation. S protein variation is correlated with nsp3, nsp6, and 3′-to-5′ exonuclease variation. Country of origin and time since the start of the pandemic were the most influential metadata associated with genomic variation, while host sex and age were the least influential. We define a novel statistic—coherence—and show its utility in identifying geographic regions (populations) with unusually high (many new variants) or low (isolated) viral phylogenetic diversity. Interestingly, at both global and regional scales, we identify geographic locations with high coherence neighboring regions of low coherence; this emphasizes the utility of this metric to inform public health measures for disease spread. Our results provide a direction to prioritize genes associated with outcome predictors (e.g., health, therapeutic, and vaccine outcomes) and to improve DNA tests for predicting disease status.

 
Award ID(s):
2028280 2109688
NSF-PAR ID:
10360441
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Volume:
11
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cimarelli, Andrea (Ed.)
    The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection causes Coronavirus Disease 2019 (COVID-19), a pandemic that seriously threatens global health. SARS-CoV-2 propagates by packaging its RNA genome into membrane enclosures in host cells. The packaging of the viral genome into the nascent virion is mediated by the nucleocapsid (N) protein, but the underlying mechanism remains unclear. Here, we show that the N protein forms biomolecular condensates with viral genomic RNA both in vitro and in mammalian cells. While the N protein forms spherical assemblies with homopolymeric RNA substrates that do not form base pairing interactions, it forms asymmetric condensates with viral RNA strands. Cross-linking mass spectrometry (CLMS) identified a region that drives interactions between N proteins in condensates, and deletion of this region disrupts phase separation. We also identified small molecules that alter the size and shape of N protein condensates and inhibit the proliferation of SARS-CoV-2 in infected cells. These results suggest that the N protein may utilize biomolecular condensation to package the SARS-CoV-2 RNA genome into a viral particle. 
  2. Abstract

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein‐coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR‐based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease‐regulating functions (e.g. Ovar‐DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene‐targeted SNP discovery and subsequent SNP chip genotyping using low‐quality samples in a nonmodel species.

     
  3. Abstract Coronavirus disease (COVID-19) is a contagious respiratory disease caused by the SARS-CoV-2 virus. The clinical phenotypes are variable, ranging from spontaneous recovery to serious illness and death. In March 2020, a global COVID-19 pandemic was declared by the World Health Organization (WHO). As of February 2023, almost 670 million cases and 6.8 million deaths have been confirmed worldwide. Coronaviruses, including SARS-CoV-2, contain a single-stranded RNA genome enclosed in a viral capsid consisting of four structural proteins: the nucleocapsid (N) protein, in the ribonucleoprotein core, and the spike (S) protein, the envelope (E) protein, and the membrane (M) protein, embedded in the surface envelope. In particular, the E protein is a poorly characterized viroporin with high identity amongst all the β-coronaviruses (SARS-CoV-2, SARS-CoV, MERS-CoV, HCoV-OC43) and a low mutation rate. Here, we focused our attention on the study of SARS-CoV-2 E and M proteins, and we found a general perturbation of the host cell calcium (Ca²⁺) homeostasis and a selective rearrangement of the interorganelle contact sites. In vitro and in vivo biochemical analyses revealed that the binding of specific nanobodies to soluble regions of SARS-CoV-2 E protein reversed the observed phenotypes, suggesting that the E protein might be an important therapeutic candidate not only for vaccine development, but also for the clinical management of COVID-19 through the design of drug regimens, which so far are very limited.
  4. Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture’s interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron’s reduced risk of severe disease, in accord with epidemiological and experimental data.
  5. Lee, Benhur (Ed.)
    ABSTRACT Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected over 40 million people worldwide, with over 1 million deaths as of October 2020 and with multiple efforts in the development and testing of antiviral drugs and vaccines under way. In order to gain insights into SARS-CoV-2 evolution and drug targets, we investigated how and to what extent the SARS-CoV-2 genome sequence differs from those of other well-characterized human and animal coronavirus genomes, as well as how polymorphic SARS-CoV-2 genomes are generally. We ultimately sought to identify features in the SARS-CoV-2 genome that may contribute to its viral replication, host pathogenicity, and vulnerabilities. Our analyses suggest the presence of unique sequence signatures in the 3′ untranslated region (3′-UTR) of betacoronavirus lineage B, which phylogenetically encompasses SARS-CoV-2 and SARS-CoV as well as multiple groups of bat and animal coronaviruses. In addition, we identified genome-wide patterns of variation across different SARS-CoV-2 strains that likely reflect the effects of selection. Finally, we provide evidence for a possible host-microRNA-mediated interaction between the 3′-UTR and human microRNA hsa-miR-1307-3p based on the results of multiple computational target prediction analyses and an assessment of similar interactions involving the influenza A H1N1 virus. This interaction also suggests a possible survival mechanism, whereby a mutation in the SARS-CoV-2 3′-UTR leads to a weakened host immune response. The potential roles of host microRNAs in SARS-CoV-2 replication and infection and the exploitation of conserved features in the 3′-UTR as therapeutic targets warrant further investigation. IMPORTANCE The coronavirus disease 2019 (COVID-19) outbreak is having a dramatic global effect on public health and the economy. 
    As of October 2020, SARS-CoV-2 has been detected in over 189 countries, has infected over 40 million people, and is responsible for more than 1 million deaths. The genome of SARS-CoV-2 is small but complex, and its functions and interactions with human host factors are being studied extensively. The significance of our study is that, using extensive SARS-CoV-2 genome analysis techniques, we identified potential interacting human host microRNA targets that share similarity with those of influenza A virus H1N1. Our study results will allow the development of virus-host interaction models that will enhance our understanding of SARS-CoV-2 pathogenesis and motivate the exploitation of both the interacting viral and host factors as therapeutic targets.