Abstract We investigated SARS-CoV-2 transmission dynamics in Italy, one of the countries hit hardest by the pandemic, using phylodynamic analysis of viral genetic and epidemiological data. We observed the co-circulation of multiple SARS-CoV-2 lineages over time, which were linked to multiple importations and characterized by large transmission clusters concomitant with a high number of infections. Subsequent implementation of a three-phase nationwide lockdown strategy greatly reduced infection numbers and hospitalizations. Yet we present evidence of sustained viral spread among sporadic clusters acting as “hidden reservoirs” during summer 2020. Mathematical modelling shows that increased mobility among residents eventually catalyzed the coalescence of such clusters, thus driving up the number of infections and initiating a new epidemic wave. Our results suggest that the efficacy of public health interventions is, ultimately, limited by the size and structure of epidemic reservoirs, which may warrant prioritization during vaccine deployment.
Transmission cluster characteristics of global, regional, and lineage-specific SARS-CoV-2 phylogenies
The SARS-CoV-2 pandemic has presented in periodic waves and multiple variants, some of which came to dominate over time through increased transmissibility. SARS-CoV-2 is still adapting in the human population, so it is crucial to understand its evolutionary patterns and dynamics ahead of time. In this work, we analyzed transmission clusters and topology of SARS-CoV-2 phylogenies at the global, regional (North America), and clade-specific (Delta and Omicron) epidemic scales. We used Nextstrain’s nCov open global all-time phylogeny (September 2022; 2,698 strains, 2,243 for North America, 499 for Delta21A, and 543 for Omicron20M), with Nextstrain’s clade annotation and Pango lineages. Transmission clusters were identified using Phylopart and DYNAMITE, and several tree imbalance measures were calculated, including staircase-ness and the Sackin and Colless indices. We found that the phylogenetic clustering profiles of the global epidemic show the highest diversification at a distance threshold of 3% (divergence of 10, where the tree sampled median is 49). Phylopart and DYNAMITE clusters show moderate-to-high agreement with the Pango nomenclature and Nextstrain’s clades. At the regional and clade-specific scales, transmission clustering profiles tend to flatten, and similar clusters are found at distance thresholds between 0.05% and 25%. All the considered phylogenies exhibit high tree imbalance relative to what is expected in random phylogenies, suggesting short infection times and antigenic drift, perhaps due to a progressive transition from innate to adaptive immunity in the population.
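To make the imbalance measures named in the abstract concrete, the following is a minimal sketch of the Sackin and Colless indices on a toy tree. It is not the authors' pipeline: the nested-tuple tree encoding, the function name `imbalance`, and the two example trees are assumptions made for illustration, and the sketch handles strictly bifurcating rooted trees only (real phylogenies may contain polytomies). The Sackin index is the sum of leaf depths; the Colless index is the sum, over internal nodes, of the absolute difference in leaf counts between the two child subtrees.

```python
def imbalance(tree):
    """Return (n_leaves, sackin, colless) for a rooted binary tree.

    Hypothetical encoding: a leaf is any non-tuple value (e.g. a string),
    an internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        # A single leaf: one tip, depth 0, nothing to compare.
        return 1, 0, 0
    left, right = tree
    n_left, sackin_left, colless_left = imbalance(left)
    n_right, sackin_right, colless_right = imbalance(right)
    n = n_left + n_right
    # Every leaf below this node sits one level deeper than in its subtree.
    sackin = sackin_left + sackin_right + n
    # This node contributes the size difference of its two child clades.
    colless = colless_left + colless_right + abs(n_left - n_right)
    return n, sackin, colless


# Two illustrative four-tip trees (made-up labels, not real strains).
balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(imbalance(balanced))     # (4, 8, 0)
print(imbalance(caterpillar))  # (4, 9, 3)
```

The ladder-like (caterpillar) tree scores higher on both indices than the balanced tree with the same number of tips, which is the sense in which a phylogeny is said to be imbalanced compared with a random expectation.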
- Award ID(s): 2028221
- PAR ID: 10407035
- Date Published:
- Journal Name: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- Page Range / eLocation ID: 2940 to 2944
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Background The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. Objective The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. Methods We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. Results Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. Conclusions At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best.
Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
-
Abstract Background When three SARS-CoV-2 vaccines came to market in Europe and North America in the winter of 2020–2021, distribution networks were in a race against a major epidemiological wave of SARS-CoV-2 that began in autumn 2020. Rapid and optimized vaccine allocation was critical during this time. With 95% efficacy reported for two of the vaccines, near-term public health needs likely require that distribution is prioritized to the elderly, health care workers, teachers, essential workers, and individuals with comorbidities putting them at risk of severe clinical progression. Methods We evaluate various age-based vaccine distributions using a validated mathematical model based on current epidemic trends in Rhode Island and Massachusetts. We allow for varying waning efficacy of vaccine-induced immunity, as this has not yet been measured. We account for the fact that known COVID-positive cases may not have been included in the first round of vaccination. We also account for age-specific immune patterns in both states at the time of the start of the vaccination program. Our analysis assumes that health systems during winter 2020–2021 had equal staffing and capacity to previous phases of the SARS-CoV-2 epidemic; we do not consider the effects of understaffed hospitals or unvaccinated medical staff. Results We find that allocating a substantial proportion (>75%) of vaccine supply to individuals over the age of 70 is optimal in terms of reducing total cumulative deaths through mid-2021. This result is robust to different profiles of waning vaccine efficacy and several different assumptions on age mixing during and after lockdown periods. As we do not explicitly model other high-mortality groups, our results on vaccine allocation apply to all groups at high risk of mortality if infected. A median of 327 to 340 deaths can be avoided in Rhode Island (3444 to 3647 in Massachusetts) by optimizing vaccine allocation and vaccinating the elderly first.
The vaccination campaigns are expected to save a median of 639 to 664 lives in Rhode Island and 6278 to 6618 lives in Massachusetts in the first half of 2021 when compared to a scenario with no vaccine. A policy of vaccinating only seronegative individuals avoids redundancy in vaccine use on individuals that may already be immune, and would result in 0.5% to 1% reductions in cumulative hospitalizations and deaths by mid-2021. Conclusions Assuming high vaccination coverage (>28%) and no major changes in distancing, masking, gathering size, hygiene guidelines, and virus transmissibility between 1 January 2021 and 1 July 2021, a combination of vaccination and population immunity may lead to low or near-zero transmission levels by the second quarter of 2021.
-
Abstract Motivation Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. Results We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes.
An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. Availability and implementation TopHap is available at https://github.com/SayakaMiura/TopHap. Supplementary information Supplementary data are available at Bioinformatics online.
-
Yeager, Meredith (Ed.) Abstract Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).

