Title: Early detection of emerging viral variants through analysis of community structure of coordinated substitution networks
Abstract

The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data.
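The idea of reading candidate haplotypes off a coordinated substitution network can be illustrated with a small sketch. This is not the authors' framework. It assumes each genome is given as a set of substitution labels, links two substitutions when they co-occur at least `min_ratio` times more often than independence would predict, and uses connected components as a crude stand-in for dense-community detection. The function names, the `min_ratio` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def linked_pairs(genomes, min_ratio=2.0):
    """Return substitution pairs that co-occur more often than independence predicts.

    genomes: list of sets of substitution labels (hypothetical input format).
    min_ratio: assumed threshold on observed / expected co-occurrence.
    """
    n = len(genomes)
    single = defaultdict(int)  # genomes carrying each substitution
    joint = defaultdict(int)   # genomes carrying both substitutions of a pair
    for muts in genomes:
        for m in muts:
            single[m] += 1
        for a, b in combinations(sorted(muts), 2):
            joint[(a, b)] += 1
    edges = set()
    for (a, b), observed in joint.items():
        expected = single[a] * single[b] / n  # count expected under independence
        if observed >= min_ratio * expected:
            edges.add((a, b))
    return edges


def components(edges):
    """Group linked substitutions into connected components (simplified communities)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy data: substitutions A, B, C always travel together; D and E usually do.
toy_genomes = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"},
    {"D", "E"}, {"D", "E"}, {"D"}, {"E"},
    set(), set(), set(),
]
groups = components(linked_pairs(toy_genomes))
```

On the toy data, `groups` contains two candidate haplotypes, one made of A, B and C and one made of D and E. A realistic analysis would need a proper statistical test for linkage and a density-based community detection step, neither of which is attempted here.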

 
NSF-PAR ID:
10498572
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features.

    Results

    We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern.

    Availability and implementation

    TopHap is available at https://github.com/SayakaMiura/TopHap.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract

    Long-range ribonucleic acid (RNA)–RNA interactions (RRI) are prevalent in positive-strand RNA viruses, including Beta-coronaviruses, and these take part in regulatory roles, including the regulation of sub-genomic RNA production rates. Crosslinking of interacting RNAs and short read-based deep sequencing of resulting RNA–RNA hybrids have shown that these long-range structures exist in severe acute respiratory syndrome coronavirus (SARS-CoV)-2 on both genomic and sub-genomic levels and in dynamic topologies. Furthermore, co-evolution of coronaviruses with their hosts is navigated by genetic variations made possible by its large genome, high recombination frequency and a high mutation rate. SARS-CoV-2’s mutations are known to occur spontaneously during replication, and thousands of aggregate mutations have been reported since the emergence of the virus. Although many long-range RRIs have been experimentally identified using high-throughput methods for the wild-type SARS-CoV-2 strain, evolutionary trajectory of these RRIs across variants, impact of mutations on RRIs and interaction of SARS-CoV-2 RNAs with the host have been largely open questions in the field. In this review, we summarize recent computational tools and experimental methods that have been enabling the mapping of RRIs in viral genomes, with a specific focus on SARS-CoV-2. We also present available informatics resources to navigate the RRI maps and shed light on the impact of mutations on the RRI space in viral genomes. Investigating the evolution of long-range RNA interactions and that of virus–host interactions can contribute to the understanding of new and emerging variants as well as aid in developing improved RNA therapeutics critical for combating future outbreaks.

     
  3. Abstract

    Sensing of viral antigens has become a critical tool in combating infectious diseases. Current sensing techniques trade off sensitivity against detection time: 10–30 min of detection time at relatively low sensitivity, or 6–12 h at high (picomolar) sensitivity. In this research, uniquely nanoengineered interfaces are demonstrated on 3D electrodes that enable the detection of spike antigens of SARS‐CoV‐2 and their variants in seconds at femtomolar concentrations with excellent specificity, thus overcoming this tradeoff. The 3D electrodes, manufactured using a high‐resolution aerosol jet 3D nanoprinter, consist of a microelectrode array of sintered gold nanoparticles coated with graphene and antibodies specific to severe acute respiratory syndrome coronavirus‐2 (SARS‐CoV‐2) spike antigens. An impedance‐based sensing modality is employed to sense several pseudoviruses of SARS‐CoV‐2 variants of concern (VOCs). This device is sensitive to most of the pseudoviruses of SARS‐CoV‐2 VOCs. A high sensitivity of 100 fM, along with a low limit‐of‐detection of 9.2 fM within a test range of 0.1–1000 pM, and a detection time of 43 s are shown. This work illustrates that effective nano‐bioengineering of interfaces can be used to create an ultrafast and ultrasensitive healthcare diagnostic tool for combating emerging infections.

     
  4. Abstract

    The contours of endemic coronaviral disease in humans and other animals are shaped by the tendency of coronaviruses to generate new variants superimposed upon nonsterilizing immunity. Consequently, patterns of coronaviral reinfection in animals can inform the emerging endemic state of the SARS-CoV-2 pandemic. We generated controlled reinfection data after high and low risk natural exposure or heterologous vaccination to sialodacryoadenitis virus (SDAV) in rats. Using deterministic compartmental models, we utilized in vivo estimates from these experiments to model the combined effects of variable transmission rates, variable duration of immunity, successive waves of variants, and vaccination on patterns of viral transmission. Using rat experiment-derived estimates, an endemic state achieved by natural infection alone occurred after a median of 724 days with approximately 41.3% of the population susceptible to reinfection. After accounting for translationally altered parameters between rat-derived data and human SARS-CoV-2 transmission, and after introducing vaccination, we arrived at a median time to endemic stability of 1437 (IQR = 749.25) days with a median 15.4% of the population remaining susceptible. We extended the models to introduce successive variants with increasing transmissibility and included the effect of varying duration of immunity. As seen with endemic coronaviral infections in other animals, transmission states are altered by introduction of new variants, even with vaccination. However, vaccination combined with natural immunity maintains a lower prevalence of infection than natural infection alone and provides greater resilience against the effects of transmissible variants.

     
  5. Due to the emergence of new variants of the SARS-CoV-2 coronavirus, the question of how the viral genomes evolved, leading to the formation of highly infectious strains, becomes particularly important. Three major emergent strains, Alpha, Beta and Delta, characterized by a significant number of missense mutations, provide a natural test field. We accumulated and aligned 4.7 million SARS-CoV-2 genomes from the GISAID database and carried out a comprehensive set of analyses. This collection covers the period until the end of October 2021, i.e., the beginnings of the Omicron variant. First, we explored the combinatorial complexity of the emerging genomic variants and their timing, which indicates very strong, albeit hidden, selection forces. Our analyses show that the mutations that define variants of concern did not arise gradually but rather co-evolved rapidly, leading to the emergence of the full variant strain. To explore in more detail the evolutionary forces at work, we developed time trajectories of mutations at all 29,903 sites of the SARS-CoV-2 genome, week by week, and stratified them into trends related to (i) point substitutions, (ii) deletions and (iii) non-sequenceable regions. We focused on classifying the genetic forces active at different ranges of the mutational spectrum. The lowest-frequency mutation spectrum agrees with the Griffiths–Tavaré theory, under the Infinite Sites Model and neutrality. If we widen the frequency range, the site frequency spectra are much more consistent with the Tung–Durrett model, which assumes clone competition and selection. The coefficients of the fitting model indicate the possibility of selection acting to promote gradual growth slowdown, as observed in the history of the variants of concern. These results add up to a model of genomic evolution, which partly fits into the classical drift barrier ideas.
Certain observations, such as mutation “bands” persistent over the epidemic history, suggest a contribution of genetic forces other than mutation, drift and selection, including recombination or other genome transformations. In addition, we show that a “toy” mathematical model can qualitatively reproduce how new variants (clones) stem from rare advantageous driver mutations and then acquire neutral or disadvantageous passenger mutations, which gradually reduce their fitness so that they can then be outcompeted by new variants arising from other driver mutations.