Title: Analysis of SARS-CoV-2 mutations in the United States suggests presence of four substrains and novel variants
Abstract

SARS-CoV-2 has been mutating since it was first sequenced in early January 2020. Here, we analyze 45,494 complete SARS-CoV-2 genome sequences from around the world to understand their mutations. Among them, 12,754 sequences are from the United States. Our analysis suggests the presence of four substrains and eleven top mutations in the United States. These eleven top mutations belong to 3 disconnected groups. The first and second groups, consisting of 5 and 8 concurrent mutations, are prevailing, while the other group, with three concurrent mutations, is gradually fading out. Moreover, we reveal that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. One of the top mutations, 27964C > T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we find that two of the four SARS-CoV-2 substrains in the United States have potentially become more infectious.

 
Award ID(s):
1761320 1900473
NSF-PAR ID:
10214238
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Communications Biology
Volume:
4
Issue:
1
ISSN:
2399-3642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features.

    Results

    We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern.

    Availability and implementation

    TopHap is available at https://github.com/SayakaMiura/TopHap.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Yeager, Meredith (Ed.)
    Abstract Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/). 
  3. Abstract Background

    Controlling the spread of infectious diseases (even when safe, transmission-blocking vaccines are available) may require the effective use of non-pharmaceutical interventions (NPIs), e.g., mask wearing, testing, limits on group sizes, venue closure. During the SARS-CoV-2 pandemic, many countries implemented NPIs inconsistently in space and time. This inconsistency was especially pronounced for policies in the United States of America (US) related to venue closure.

    Methods

    Here, we investigate the impact of inconsistent policies associated with venue closure using mathematical modeling and high-resolution human mobility, Google search, and county-level SARS-CoV-2 incidence data from the USA. Specifically, we look at high-resolution location data and perform a US-county-level analysis of nearly 8 million SARS-CoV-2 cases and 150 million location visits, including 120 million church visitors across 184,677 churches, 14 million grocery visitors across 7662 grocery stores, and 13.5 million gym visitors across 5483 gyms.

    Results

    Analyzing the interaction between venue closure and changing mobility using a mathematical model shows that, across a broad range of model parameters, inconsistent or partial closure can be worse in terms of disease transmission than scenarios with no closures at all. Importantly, changes in mobility patterns due to epidemic control measures can lead to an increase in the future number of cases. In the most severe cases, individuals traveling to neighboring jurisdictions with different closure policies can cause an outbreak that would otherwise have been contained. To motivate our mathematical models, we turn to mobility data and find that while stay-at-home orders and closures decreased contacts in most areas of the USA, some specific activities and venues saw an increase in attendance and an increase in the distance visitors traveled to attend. We support this finding using search query data, which clearly show a shift in information-seeking behavior concurrent with the changing mobility patterns.

    Conclusions

    While coarse-grained observations are not sufficient to validate our models, taken together, they highlight the potential unintended consequences of inconsistent epidemic control policies related to venue closure and stress the importance of balancing the societal needs of a population with the risk of an outbreak growing into a large epidemic.

     
  4. Abstract

    The glycosylation on the spike (S) protein of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, modulates the viral infection by altering conformational dynamics, receptor interaction and host immune responses. Several variants of concern (VOCs) of SARS-CoV-2 have evolved during the pandemic, and crucial mutations on the S protein of the virus have led to increased transmissibility and immune escape. In this study, we compare the site-specific glycosylation and overall glycomic profiles of the wild type Wuhan-Hu-1 strain (WT) S protein and five VOCs of SARS-CoV-2: Alpha, Beta, Gamma, Delta and Omicron. Interestingly, both N- and O-glycosylation sites on the S protein are highly conserved among the spike mutant variants, particularly at the sites on the receptor-binding domain (RBD). The conservation of glycosylation sites is noteworthy, as over 2 million SARS-CoV-2 S protein sequences have been reported with various amino acid mutations. Our detailed profiling of the glycosylation at each of the individual sites of the S protein across the variants revealed an intriguing possible association between the glycosylation pattern of the variants and their previously reported infectivity. While the sites are conserved, we observed changes in the N- and O-glycosylation profile across the variants. The newly emerged variants, which showed higher resistance to neutralizing antibodies and vaccines, displayed a decrease in the overall abundance of complex-type glycans with both fucosylation and sialylation and an increase in the oligomannose-type glycans across the sites. Among the variants, the glycosylation sites with significant changes in glycan profile were observed at both the N-terminal domain and RBD of the S protein, with Omicron showing the highest deviation. The increase in oligomannose-type glycans happens sequentially from Alpha through Delta. Interestingly, Omicron does not contain more oligomannose-type glycans compared to Delta but does contain more compared to the WT and other VOCs. O-glycosylation at the RBD showed lower occupancy in the VOCs in comparison to the WT. Our study on the sites and pattern of glycosylation on the SARS-CoV-2 S proteins across the VOCs may help to understand how the virus evolved to trick the host immune system. Our study also highlights how the SARS-CoV-2 virus has conserved both N- and O-glycosylation sites on the S protein of the most successful variants even after undergoing extensive mutations, suggesting a correlation between infectivity/transmissibility and glycosylation.

     
  5. Abstract

    The omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), characterized by 30 mutations in its spike protein, has rapidly spread worldwide since November 2021, significantly exacerbating the ongoing COVID‐19 pandemic. In order to investigate the relationship between these mutations and the variant's high transmissibility, we conducted a systematic analysis of the mutational effect on spike–angiotensin‐converting enzyme‐2 (ACE2) interactions and explored the structural/energy correlation of key mutations, utilizing a reliable coarse‐grained model. Our study extended beyond the receptor‐binding domain (RBD) through comprehensive modeling of the full‐length spike trimer. Our free‐energy calculation revealed that the enhanced binding affinity between the spike protein and the ACE2 receptor is correlated with the increased structural stability of the isolated spike protein, thus explaining the omicron variant's heightened transmissibility. The conclusion was supported by our experimental analyses involving the expression and purification of the full‐length spike trimer. Furthermore, the energy decomposition analysis established that electrostatic interactions make major contributions to this effect. We categorized the mutations into four groups and established an analytical framework that can be employed in studying future mutations. Additionally, our calculations rationalized the reduced affinity of the omicron variant towards most available therapeutic neutralizing antibodies, when compared with the wild type. By providing concrete experimental data and offering a solid explanation, this study contributes to a better understanding of the relationship between theories and observations and lays the foundation for future investigations.

     