Title: An Evolutionary Portrait of the Progenitor SARS-CoV-2 and Its Dominant Offshoots in COVID-19 Pandemic
Abstract Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread.
Yeager, Meredith
Award ID(s):
2027196 2034228 1934848
Journal Name:
Molecular Biology and Evolution
Page Range or eLocation-ID:
3046 to 3059
Sponsoring Org:
National Science Foundation
More Like this
  1. The pandemic of SARS-CoV-2/COVID-19 was reported in December 2019 in Wuhan, China. Owing to its high transmissibility and wide host adaptability, this unique human coronavirus spread across the planet, afflicting 115 million people and causing 2.5 million deaths (as of March 3rd, 2021). Limited or negligible pre-existing immunity to multiple SARS-CoV-2 variants has resulted in severe morbidity and mortality worldwide, as well as a record-breaking surge in the use of medical-surgical supplies and personal protective equipment. In response to the global need for effective sterilization techniques, this study evaluated the virucidal efficacy of FATHHOME’s self-contained, ozone-based dry-sanitizing device, by dose and time response assessment. We tested inactivation of human coronavirus, HCoV-OC43, a close genetic model of SARS-CoV-2, on porous (N95 filtering facepiece respirator/FFR) and nonporous (glass) surfaces. We started our assays with 20 ppm-10 min ozone exposure, and effectively reduced 99.8% and 99.9% of virus from glass and N95 FFR surfaces, respectively. Importantly, the virus was completely inactivated, below the detection limit (over 6-log10 reduction) with 25 ppm-15 min ozone exposure on both tested surfaces. As expected, a higher ozone exposure (50 ppm-10 min) resulted in faster inactivation of HCoV-OC43 with 100% inactivation from both surfaces, with no residual ozone present after completion of the 5-min post exposure recapture cycle and no measurable increase in ambient ozone levels. These results confirmed that FATHHOME’s device is suitable for rapid decontamination of SARS-CoV-2 from worn items, frequently touched items, and PPE including N95 FFRs, face shields, and other personal items.
  2. Background The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. Objective The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. Methods We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. Results Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. Conclusions At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
  3. Abstract Motivation

    Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features.


    Results

    We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern.

    Availability and implementation

    TopHap is available at

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. Lee, Benhur (Ed.)
    ABSTRACT Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected over 40 million people worldwide, with over 1 million deaths as of October 2020 and with multiple efforts in the development and testing of antiviral drugs and vaccines under way. In order to gain insights into SARS-CoV-2 evolution and drug targets, we investigated how and to what extent the SARS-CoV-2 genome sequence differs from those of other well-characterized human and animal coronavirus genomes, as well as how polymorphic SARS-CoV-2 genomes are generally. We ultimately sought to identify features in the SARS-CoV-2 genome that may contribute to its viral replication, host pathogenicity, and vulnerabilities. Our analyses suggest the presence of unique sequence signatures in the 3′ untranslated region (3′-UTR) of betacoronavirus lineage B, which phylogenetically encompasses SARS-CoV-2 and SARS-CoV as well as multiple groups of bat and animal coronaviruses. In addition, we identified genome-wide patterns of variation across different SARS-CoV-2 strains that likely reflect the effects of selection. Finally, we provide evidence for a possible host-microRNA-mediated interaction between the 3′-UTR and human microRNA hsa-miR-1307-3p based on the results of multiple computational target prediction analyses and an assessment of similar interactions involving the influenza A H1N1 virus. This interaction also suggests a possible survival mechanism, whereby a mutation in the SARS-CoV-2 3′-UTR leads to a weakened host immune response. The potential roles of host microRNAs in SARS-CoV-2 replication and infection and the exploitation of conserved features in the 3′-UTR as therapeutic targets warrant further investigation. IMPORTANCE The coronavirus disease 2019 (COVID-19) outbreak is having a dramatic global effect on public health and the economy. As of October 2020, SARS-CoV-2 has been detected in over 189 countries, has infected over 40 million people, and is responsible for more than 1 million deaths. The genome of SARS-CoV-2 is small but complex, and its functions and interactions with human host factors are being studied extensively. The significance of our study is that, using extensive SARS-CoV-2 genome analysis techniques, we identified potential interacting human host microRNA targets that share similarity with those of influenza A virus H1N1. Our study results will allow the development of virus-host interaction models that will enhance our understanding of SARS-CoV-2 pathogenesis and motivate the exploitation of both the interacting viral and host factors as therapeutic targets.
  5. Abstract Since its global emergence in 2020, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused multiple epidemics in the United States. When medical treatments for the virus were still emerging and a vaccine was not yet available, state and local governments sought to limit its spread by enacting various social-distancing interventions, such as school closures and lockdowns; however, the effectiveness of these interventions was unknown. We applied an established, semimechanistic Bayesian hierarchical model of these interventions to the spread of SARS-CoV-2 from Europe to the United States, using case fatalities from February 29, 2020, up to April 25, 2020, when some states began reversing their interventions. We estimated the effects of interventions across all states, contrasted the estimated reproduction numbers before and after lockdown for each state, and contrasted the predicted number of future fatalities with the actual number of fatalities as a check of the model’s validity. Overall, school closures and lockdowns were the only interventions modeled that had a reliable impact on the time-varying reproduction number, and lockdown appears to have played a key role in reducing that number to below 1.0. We conclude that reversal of lockdown without implementation of additional, equally effective interventions will enable continued, sustained transmission of SARS-CoV-2 in the United States.