Abstract The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data. 
                    
                            
                            Community structure and temporal dynamics of SARS-CoV-2 epistatic network allow for early detection of emerging variants with altered phenotypes
                        
                    
    
        
    
- Award ID(s):
- 2415564
- PAR ID:
- 10574684
- Publisher / Repository:
- Lecture Notes in Computer Science
- Date Published:
- Format(s):
- Medium: X
- Institution:
- University of Connecticut
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Abstract Motivation: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. Results: We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. Availability and implementation: TopHap is available at https://github.com/SayakaMiura/TopHap. Supplementary information: Supplementary data are available at Bioinformatics online.
- 
            Mutheneni, Srinivasa Rao (Ed.) The lack of routine viral genomic surveillance delayed the initial detection of SARS-CoV-2, allowing the virus to spread unfettered at the outset of the U.S. epidemic. Over subsequent months, poor surveillance enabled variants to emerge unnoticed. Against this backdrop, long-standing social and racial inequities have contributed to a greater burden of cases and deaths among minority groups. To begin to address these problems, we developed a new variant surveillance model geared toward building ‘next generation’ genome sequencing capacity at universities in or near rural areas and engaging the participation of their local communities. The resulting genomic surveillance network has generated more than 1,000 SARS-CoV-2 genomes to date, including the first confirmed case in northeast Louisiana of Omicron, and the first and sixth confirmed cases in Georgia of the emergent BA.2.75 and BQ.1.1 variants, respectively. In agreement with other studies, significantly higher viral gene copy numbers were observed in Delta variant samples compared to those from Omicron BA.1 variant infections, and lower copy numbers were seen in asymptomatic infections relative to symptomatic ones. Collectively, the results and outcomes from our collaborative work demonstrate that establishing genomic surveillance capacity at smaller academic institutions in rural areas and fostering relationships between academic teams and local health clinics represent a robust pathway to improve pandemic readiness.
- 
            BACKGROUND: Genomic surveillance allows identification of circulating SARS-CoV-2 variants. We provide an update on the evolution of SARS-CoV-2 in Rhode Island (RI). METHODS: All publicly available SARS-CoV-2 RI sequences were retrieved from https://www.gisaid.org. Genomic analyses were conducted to identify variants of concern (VOC), variants being monitored (VBM), or non-VOC/non-VBM, and investigate their evolution. RESULTS: Overall, 17,340 SARS-CoV-2 RI sequences were available between 2/2020–5/2022 across five (globally recognized) major waves, including 1,462 (8%) sequences from 36 non-VOC/non-VBM until 5/2021; 10,565 (61%) sequences from 8 VBM between 5/2021–12/2021, most commonly Delta; and 5,313 (31%) sequences from the VOC Omicron from 12/2021 onwards. Genomic analyses demonstrated 71 Delta and 44 Omicron sub-lineages, with occurrence of variant-defining mutations in other variants. CONCLUSION: Statewide SARS-CoV-2 genomic surveillance allows for continued characterization of circulating variants and monitoring of viral evolution, which inform the local health force and guide public health on mitigation efforts against COVID-19. KEYWORDS: COVID-19, SARS-CoV-2, variants, genomic sequencing, Rhode Island
- 
            Pettigrew, Melinda M. (Ed.) ABSTRACT Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and nearly complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the United States or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside California, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing. These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.
				
			 
					 
					
 
                                    