
Title: An Evolutionary Portrait of the Progenitor SARS-CoV-2 and Its Dominant Offshoots in COVID-19 Pandemic
Abstract: Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread.
Award ID(s):
2027196 2034228 1934848
Author(s) / Creator(s):
; ; ; ; ; ; ;
Yeager, Meredith
Journal Name:
Molecular Biology and Evolution
Page Range / eLocation ID:
3046 to 3059
Sponsoring Org:
National Science Foundation
More Like this
  1. Due to the emergence of new variants of the SARS-CoV-2 coronavirus, the question of how the viral genomes evolved, leading to the formation of highly infectious strains, becomes particularly important. Three major emergent strains, Alpha, Beta and Delta, characterized by a significant number of missense mutations, provide a natural test field. We accumulated and aligned 4.7 million SARS-CoV-2 genomes from the GISAID database and carried out a comprehensive set of analyses. This collection covers the period until the end of October 2021, i.e., the beginnings of the Omicron variant. First, we explored the combinatorial complexity of the emerging genomic variants and their timing, indicating very strong, albeit hidden, selection forces. Our analyses show that the mutations that define variants of concern did not arise gradually but rather co-evolved rapidly, leading to the emergence of the full variant strain. To explore in more detail the evolutionary forces at work, we developed time trajectories of mutations at all 29,903 sites of the SARS-CoV-2 genome, week by week, and stratified them into trends related to (i) point substitutions, (ii) deletions and (iii) non-sequenceable regions. We focused on classifying the genetic forces active at different ranges of the mutational spectrum. We observed the agreement of the lowest-frequency mutation spectrum with the Griffiths–Tavaré theory, under the Infinite Sites Model and neutrality. If we widen the frequency range, the site frequency spectra are much more consistent with the Tung–Durrett model, which assumes clone competition and selection. The coefficients of the fitting model indicate the possibility of selection acting to promote gradual growth slowdown, as observed in the history of the variants of concern. These results add up to a model of genomic evolution, which partly fits into the classical drift barrier ideas.
Certain observations, such as mutation “bands” persistent over the epidemic history, suggest a contribution of genetic forces different from mutation, drift and selection, including recombination or other genome transformations. In addition, we show that a “toy” mathematical model can qualitatively reproduce how new variants (clones) stem from rare advantageous driver mutations, and then acquire neutral or disadvantageous passenger mutations which gradually reduce their fitness, so that they can then be outcompeted by new variants arising from other driver mutations.
  2. Abstract

    The emergence of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) variants of concern (VOC) has raised questions regarding vaccine protection against SARS‐CoV‐2 infection, transmission, and ongoing virus evolution. Twenty‐three mildly symptomatic “vaccination breakthrough” infections were identified as early as January 2021 in Alachua County, Florida, among individuals fully vaccinated with either the BNT162b2 (Pfizer) or the Ad26 (Janssen/J&J) vaccines. SARS‐CoV‐2 genomes were successfully generated for 11 of the vaccine breakthroughs and for 878 individuals in the surrounding area, and were included in a reference‐based phylogenetic investigation. These 11 individuals were characterized by infection with VOCs, but also with low‐frequency variants present within the surrounding population. Low‐frequency mutations were observed that have more recently been identified as mutations of interest owing to their location within targeted immune epitopes (P812L) and their association with increased replicative capacity (L18F). We present these results to posit the nature of the efficacy of vaccines in reducing symptoms as both a blessing and a curse: as vaccination becomes more widespread and self‐motivated testing is reduced owing to the absence of severe symptoms, we face the challenge of early recognition of novel mutations of potential concern. This case study highlights the critical need for continued testing and monitoring of infection and transmission among individuals regardless of vaccination status.

  3. The pandemic of SARS-CoV-2/COVID-19 was reported in December 2019 in Wuhan, China. Owing to its high transmissibility and wide host adaptability, this unique human coronavirus spread across the planet, afflicting 115 million people and causing 2.5 million deaths (as of March 3rd, 2021). Limited or negligible pre-existing immunity to multiple SARS-CoV-2 variants has resulted in severe morbidity and mortality worldwide, as well as a record-breaking surge in the use of medical-surgical supplies and personal protective equipment. In response to the global need for effective sterilization techniques, this study evaluated the virucidal efficacy of FATHHOME’s self-contained, ozone-based dry-sanitizing device, by dose and time response assessment. We tested inactivation of human coronavirus, HCoV-OC43, a close genetic model of SARS-CoV-2, on porous (N95 filtering facepiece respirator/FFR) and nonporous (glass) surfaces. We started our assays with 20 ppm-10 min ozone exposure, and effectively reduced 99.8% and 99.9% of virus from glass and N95 FFR surfaces, respectively. Importantly, the virus was completely inactivated, below the detection limit (over 6-log10 reduction) with 25 ppm-15 min ozone exposure on both tested surfaces. As expected, a higher ozone exposure (50 ppm-10 min) resulted in faster inactivation of HCoV-OC43 with 100% inactivation from both surfaces, with no residual ozone present after completion of the 5-min post exposure recapture cycle and no measurable increase in ambient ozone levels. These results confirmed that FATHHOME’s device is suitable for rapid decontamination of SARS-CoV-2 from worn items, frequently touched items, and PPE including N95 FFRs, face shields, and other personal items.
  4. Abstract

    The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, has resulted in the loss of millions of lives and severe global economic consequences. Every time SARS-CoV-2 replicates, the viruses acquire new mutations in their genomes. Mutations in SARS-CoV-2 genomes have led to increased transmissibility, severe disease outcomes, evasion of the immune response, changes in clinical manifestations and reduced efficacy of vaccines or treatments. To date, multiple resources provide lists of detected mutations without key functional annotations. There is a lack of research examining the relationship between mutations and various factors such as disease severity, pathogenicity, patient age, patient gender, cross-species transmission, viral immune escape, immune response level, viral transmission capability, viral evolution, host adaptability, viral protein structure, viral protein function, viral protein stability and concurrent mutations. A deep understanding of the relationship between mutation sites and these factors is crucial for advancing our knowledge of SARS-CoV-2 and for developing effective responses. To fill this gap, we built COV2Var, a function annotation database of SARS-CoV-2 genetic variation. COV2Var aims to identify common mutations in SARS-CoV-2 variants and assess their effects, providing a valuable resource for intensive functional annotations of common mutations among SARS-CoV-2 variants.

  5. Abstract Motivation

    Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features.


    We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern.

    Availability and implementation

    TopHap is available at

    Supplementary information

    Supplementary data are available at Bioinformatics online.
