Title: Transmission cluster characteristics of global, regional, and lineage-specific SARS-CoV-2 phylogenies
The SARS-CoV-2 pandemic has presented in periodic waves and multiple variants, some of which came to dominate over time through increased transmissibility. SARS-CoV-2 is still adapting in the human population, so it is crucial to understand its evolutionary patterns and dynamics ahead of time. In this work, we analyzed transmission clusters and the topology of SARS-CoV-2 phylogenies at the global, regional (North America), and clade-specific (Delta and Omicron) epidemic scales. We used Nextstrain’s nCov open global all-time phylogeny (September 2022; 2,698 strains, 2,243 for North America, 499 for Delta21A, and 543 for Omicron20M), with Nextstrain’s clade annotation and Pango lineages. Transmission clusters were identified using Phylopart and DYNAMITE, and several tree imbalance measures were calculated, including staircase-ness and the Sackin and Colless indices. We found that the phylogenetic clustering profiles of the global epidemic have the highest diversification at a distance threshold of 3% (divergence of 10, where the tree sampled median is 49). Phylopart and DYNAMITE clusters agree moderately to highly with the Pango nomenclature and the Nextstrain clades. At the regional and clade-specific scales, transmission clustering profiles tend to flatten, and similar clusters are found at distance thresholds between 0.05% and 25%. All the considered phylogenies exhibit high tree imbalance relative to what is expected in random phylogenies, suggesting short infection times and antigenic drift, perhaps due to a progressive transition from innate to adaptive immunity in the population.
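As an illustration of two of the tree imbalance measures named in the abstract, the following minimal sketch computes the Sackin and Colless indices using their standard textbook definitions. It is not the authors' pipeline: the nested-tuple tree representation, the function names, and the two toy trees are assumptions made for this example only, and staircase-ness is left out.

```python
from typing import Tuple, Union

# Hypothetical toy representation (not from the paper): a leaf is a string
# label, an internal node is a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def sackin_index(tree: Tree, depth: int = 0) -> int:
    """Sum, over all leaves, of the number of edges between root and leaf."""
    if isinstance(tree, str):
        return depth
    left, right = tree
    return sackin_index(left, depth + 1) + sackin_index(right, depth + 1)


def colless_index(tree: Tree) -> int:
    """Sum, over internal nodes, of |leaves on the left - leaves on the right|."""

    def walk(node: Tree) -> Tuple[int, int]:
        # Returns (number of leaves below node, Colless sum below node).
        if isinstance(node, str):
            return 1, 0
        (n_left, c_left), (n_right, c_right) = walk(node[0]), walk(node[1])
        return n_left + n_right, c_left + c_right + abs(n_left - n_right)

    return walk(tree)[1]


# Two four-leaf toy trees: a fully imbalanced "caterpillar" and a balanced one.
caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))

print(sackin_index(caterpillar), colless_index(caterpillar))  # 9 3
print(sackin_index(balanced), colless_index(balanced))        # 8 0
```

Both indices are larger for the caterpillar tree than for the balanced tree with the same number of leaves, which is the sense in which a phylogeny can be called highly imbalanced.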
Award ID(s):
2028221
NSF-PAR ID:
10407035
Journal Name:
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Page Range / eLocation ID:
2940 to 2944
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    When three SARS-CoV-2 vaccines came to market in Europe and North America in the winter of 2020–2021, distribution networks were in a race against a major epidemiological wave of SARS-CoV-2 that began in autumn 2020. Rapid and optimized vaccine allocation was critical during this time. With 95% efficacy reported for two of the vaccines, near-term public health needs likely require that distribution be prioritized to the elderly, health care workers, teachers, essential workers, and individuals with comorbidities putting them at risk of severe clinical progression.

    Methods

    We evaluate various age-based vaccine distributions using a validated mathematical model based on current epidemic trends in Rhode Island and Massachusetts. We allow for varying waning efficacy of vaccine-induced immunity, as this has not yet been measured. We account for the fact that known COVID-positive cases may not have been included in the first round of vaccination. We also account for age-specific immune patterns in both states at the start of the vaccination program. Our analysis assumes that health systems during winter 2020–2021 had staffing and capacity equal to those of previous phases of the SARS-CoV-2 epidemic; we do not consider the effects of understaffed hospitals or unvaccinated medical staff.

    Results

    We find that allocating a substantial proportion (>75%) of vaccine supply to individuals over the age of 70 is optimal in terms of reducing total cumulative deaths through mid-2021. This result is robust to different profiles of waning vaccine efficacy and several different assumptions on age mixing during and after lockdown periods. As we do not explicitly model other high-mortality groups, our results on vaccine allocation apply to all groups at high risk of mortality if infected. A median of 327 to 340 deaths can be avoided in Rhode Island (3444 to 3647 in Massachusetts) by optimizing vaccine allocation and vaccinating the elderly first. The vaccination campaigns are expected to save a median of 639 to 664 lives in Rhode Island and 6278 to 6618 lives in Massachusetts in the first half of 2021 when compared to a scenario with no vaccine. A policy of vaccinating only seronegative individuals avoids redundancy in vaccine use on individuals that may already be immune, and would result in 0.5% to 1% reductions in cumulative hospitalizations and deaths by mid-2021.

    Conclusions

    Assuming high vaccination coverage (>28%) and no major changes in distancing, masking, gathering size, hygiene guidelines, and virus transmissibility between 1 January 2021 and 1 July 2021, a combination of vaccination and population immunity may lead to low or near-zero transmission levels by the second quarter of 2021.
  2. Abstract Background

    Controlling the spread of infectious diseases, even when safe, transmission-blocking vaccines are available, may require the effective use of non-pharmaceutical interventions (NPIs), e.g., mask wearing, testing, limits on group sizes, venue closure. During the SARS-CoV-2 pandemic, many countries implemented NPIs inconsistently in space and time. This inconsistency was especially pronounced for policies in the United States of America (US) related to venue closure.

    Methods

    Here, we investigate the impact of inconsistent policies associated with venue closure using mathematical modeling and high-resolution human mobility, Google search, and county-level SARS-CoV-2 incidence data from the USA. Specifically, we look at high-resolution location data and perform a US-county-level analysis of nearly 8 million SARS-CoV-2 cases and 150 million location visits, including 120 million church visitors across 184,677 churches, 14 million grocery visitors across 7662 grocery stores, and 13.5 million gym visitors across 5483 gyms.

    Results

    Analyzing the interaction between venue closure and changing mobility using a mathematical model shows that, across a broad range of model parameters, inconsistent or partial closure can be worse for disease transmission than scenarios with no closures at all. Importantly, changes in mobility patterns due to epidemic control measures can lead to an increase in the future number of cases. In the most severe cases, individuals traveling to neighboring jurisdictions with different closure policies can result in an outbreak that would otherwise have been contained. To motivate our mathematical models, we turn to mobility data and find that while stay-at-home orders and closures decreased contacts in most areas of the USA, some specific activities and venues saw an increase in attendance and an increase in the distance visitors traveled to attend. We support this finding using search query data, which clearly show a shift in information-seeking behavior concurrent with the changing mobility patterns.

    Conclusions

    While coarse-grained observations are not sufficient to validate our models, taken together, they highlight the potential unintended consequences of inconsistent epidemic control policies related to venue closure and stress the importance of balancing the societal needs of a population with the risk of an outbreak growing into a large epidemic.

     
  3. Background

    The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.

    Objective

    The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community.

    Methods

    We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020.

    Results

    Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations.

    Conclusions

    At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
  4. Abstract

    Knockout of the ORF8 protein has repeatedly spread through the global viral population during SARS-CoV-2 evolution. Here we use both regional and global pathogen sequencing to explore the selection pressures underlying its loss. In Washington State, we identified transmission clusters with ORF8 knockout throughout SARS-CoV-2 evolution, not just on novel, high fitness viral backbones. Indeed, ORF8 is truncated more frequently and knockouts circulate for longer than for any other gene. Using a global phylogeny, we find evidence of positive selection to explain this phenomenon: nonsense mutations resulting in shortened protein products occur more frequently and are associated with faster clade growth rates than synonymous mutations in ORF8. Loss of ORF8 is also associated with reduced clinical severity, highlighting the diverse clinical impacts of SARS-CoV-2 evolution.

     
  5. Abstract Motivation

    Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features.

    Results

    We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern.

    Availability and implementation

    TopHap is available at https://github.com/SayakaMiura/TopHap.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     