This content will become publicly available on April 21, 2026

Title: A data‐driven sliding‐window pairwise comparative approach for the estimation of transmission fitness of SARS‐CoV‐2 variants and construction of the evolution fitness landscape
Estimating the transmission fitness of SARS‐CoV‐2 variants and understanding their evolutionary fitness trends are important for epidemiological forecasting. Existing methods are often constrained by their parametric nature and do not satisfactorily align with the observations made during COVID‐19. Here, we introduce a sliding‐window, data‐driven pairwise comparison method, the differential population growth rate (DPGR), which uses viral strains as internal controls to mitigate sampling biases. DPGR is applicable in time windows in which the logarithmic ratio of two variant subpopulations is approximately linear. We apply DPGR to genomic surveillance data and focus on variants of concern (VOCs) in multiple countries and regions. We found that the log‐linear assumption of DPGR holds reliably within appropriate time windows in many areas. We show that DPGR estimates of VOCs align well with regional empirical observations in different countries. We show that DPGR estimates agree with another method for estimating pathogenic transmission. Furthermore, DPGR allowed us to construct viral relative fitness landscapes that capture the shifting trends of SARS‐CoV‐2 evolution, reflecting the relative changes of transmission traits for key genotypic changes represented by major variants. The straightforward log‐linear regression approach of DPGR may also facilitate its adoption. This study shows that DPGR is a promising new tool in our repertoire for addressing future pandemics.
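The sliding-window log-linear idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 7-day window, and the R² cutoff of 0.9 are all assumptions chosen for illustration, and the counts are synthetic. The sketch relies only on the fact that if two subpopulations each grow roughly exponentially, the slope of log(A/B) against time equals the difference of their growth rates.

```python
import math


def log_ratio(counts_a, counts_b):
    """Log of the count ratio A/B per time point; None where either count is zero."""
    return [
        math.log(a / b) if a > 0 and b > 0 else None
        for a, b in zip(counts_a, counts_b)
    ]


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A perfectly flat series has zero variance; treat it as a perfect fit.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def sliding_window_slopes(days, counts_a, counts_b, window=7, min_r2=0.9):
    """Fit log(A/B) against time in each window of `window` consecutive points.

    Returns (start_day, slope, r_squared) for windows where the log ratio is
    approximately linear (r_squared >= min_r2). Both `window` and `min_r2`
    are illustrative defaults, not values taken from the paper.
    """
    ratios = log_ratio(counts_a, counts_b)
    results = []
    for start in range(len(days) - window + 1):
        xs = days[start:start + window]
        ys = ratios[start:start + window]
        if any(y is None for y in ys):
            continue  # skip windows that contain a zero count
        slope, _, r2 = fit_line(xs, ys)
        if r2 >= min_r2:
            results.append((xs[0], slope, r2))
    return results


# Synthetic data: variant A grows at 0.1 per day relative to a constant variant B.
days = list(range(14))
variant_a = [10.0 * math.exp(0.1 * t) for t in days]
variant_b = [1000.0] * len(days)
estimates = sliding_window_slopes(days, variant_a, variant_b)
for start_day, slope, r2 in estimates:
    print(f"window starting day {start_day}: slope={slope:.3f}, R^2={r2:.3f}")
```

Because one variant serves as the reference for the other, a sampling effort that scales both counts by the same factor in a given week cancels out of the ratio, which is one way to read the abstract's "internal controls" remark.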
Award ID(s):
2200138 2525493 1761839 1852042 2149956
PAR ID:
10587106
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
WILEY
Date Published:
Journal Name:
Quantitative Biology
Volume:
13
Issue:
4
ISSN:
2095-4689
Subject(s) / Keyword(s):
viral pathogen, fitness
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The emergence of Variants of Concern (VOCs) of SARS-CoV-2 with increased transmissibility, immune evasion properties, and virulence poses a great challenge to public health. Despite unprecedented efforts to increase genomic surveillance, fundamental facts about the evolutionary origins of VOCs remain largely unknown. One major uncertainty is whether the VOCs evolved during transmission chains of many acute infections or during long-term infections within single individuals. We test the consistency of these two possible paths with the observed dynamics, focusing on the clustered emergence of the first three VOCs, Alpha, Beta, and Gamma, in late 2020, following a period of relative evolutionary stasis. We consider a range of possible fitness landscapes, in which the VOC phenotypes could be the result of single mutations, multiple mutations that each contribute additively to increasing viral fitness, or epistatic interactions among multiple mutations that do not individually increase viral fitness—a “fitness plateau”. Our results suggest that the timing and dynamics of the VOC emergence, together with the observed number of mutations in VOC lineages, are in best agreement with the VOC phenotype requiring multiple mutations and VOCs having evolved within single individuals with long-term infections. 
  2. The sequencing of human virus genomes from wastewater samples is an efficient method for tracking viral transmission and evolution at the community level. However, this requires the recovery of viral nucleic acids of high quality. We developed a reusable tangential-flow filtration system to concentrate and purify viruses from wastewater for genome sequencing. A pilot study was conducted with 94 wastewater samples from four local sewersheds, from which viral nucleic acids were extracted, and the whole genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was sequenced using the ARTIC V4.0 primers. Our method yielded a high probability (0.9) of recovering complete or near-complete SARS-CoV-2 genomes (>90% coverage at 10× depth) from wastewater when the COVID-19 incidence rate exceeded 33 cases per 100 000 people. The relative abundances of sequenced SARS-CoV-2 variants followed the trends observed from patient-derived samples. We also identified SARS-CoV-2 lineages in wastewater that were underrepresented or not present in the clinical whole-genome sequencing data. The developed tangential-flow filtration system can be easily adopted for the sequencing of other viruses in wastewater, particularly those at low concentrations.
  3. The glycosylation on the spike (S) protein of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, modulates the viral infection by altering conformational dynamics, receptor interaction and host immune responses. Several variants of concern (VOCs) of SARS-CoV-2 have evolved during the pandemic, and crucial mutations on the S protein of the virus have led to increased transmissibility and immune escape. In this study, we compare the site-specific glycosylation and overall glycomic profiles of the wild type Wuhan-Hu-1 strain (WT) S protein and five VOCs of SARS-CoV-2: Alpha, Beta, Gamma, Delta and Omicron. Interestingly, both N- and O-glycosylation sites on the S protein are highly conserved among the spike mutant variants, particularly at the sites on the receptor-binding domain (RBD). The conservation of glycosylation sites is noteworthy, as over 2 million SARS-CoV-2 S protein sequences have been reported with various amino acid mutations. Our detailed profiling of the glycosylation at each of the individual sites of the S protein across the variants revealed intriguing possible association of glycosylation pattern on the variants and their previously reported infectivity. While the sites are conserved, we observed changes in the N- and O-glycosylation profile across the variants. The newly emerged variants, which showed higher resistance to neutralizing antibodies and vaccines, displayed a decrease in the overall abundance of complex-type glycans with both fucosylation and sialylation and an increase in the oligomannose-type glycans across the sites. Among the variants, the glycosylation sites with significant changes in glycan profile were observed at both the N-terminal domain and RBD of S protein, with Omicron showing the highest deviation. The increase in oligomannose-type happens sequentially from Alpha through Delta. Interestingly, Omicron does not contain more oligomannose-type glycans compared to Delta but does contain more compared to the WT and other VOCs. O-glycosylation at the RBD showed lower occupancy in the VOCs in comparison to the WT. Our study on the sites and pattern of glycosylation on the SARS-CoV-2 S proteins across the VOCs may help to understand how the virus evolved to trick the host immune system. Our study also highlights how the SARS-CoV-2 virus has conserved both N- and O-glycosylation sites on the S protein of the most successful variants even after undergoing extensive mutations, suggesting a correlation between infectivity/transmissibility and glycosylation.
  4. The rapid spread of SARS-CoV-2 required immediate actions to control the transmission of the virus and minimize its impact on humanity. The high mutation rate of the viral genome contributes to the virus’ ability to quickly adapt to environmental changes, impacts transmissibility and antigenicity, and may facilitate immune escape. Therefore, it is of great interest for researchers working in vaccine development and drug design to consider the impact of mutations on virus-drug interactions. Here, we propose a multitarget drug discovery pipeline for identifying potential drug candidates which can efficiently inhibit the Receptor Binding Domain (RBD) of spike glycoproteins from different variants of SARS-CoV-2. Eight homology models of RBDs for selected variants were created and validated using reference crystal structures. We then investigated interactions between host receptor ACE2 and RBDs from nine variants of SARS-CoV-2. This led us to conclude that efficient multi-variant targeting drugs should be capable of blocking residues Q(R)493 and N487 in RBDs. Using methods of molecular docking, molecular mechanics, and molecular dynamics, we identified three lead compounds (hesperidin, narirutin, and neohesperidin) suitable for multitarget SARS-CoV-2 inhibition. These compounds are flavanone glycosides found in citrus fruits – an active ingredient of Traditional Chinese Medicines. The developed pipeline can be further used to (1) model mutants for which crystal structures are not yet available and (2) scan a more extensive library of compounds against other mutated viral proteins.
  5. Accurate prediction of the transmission fitness of emerging SARS-CoV-2 variants is vital for timely public health responses. In this study, we present a deep learning framework that predicts variant fitness from raw genomic sequences using a convolutional neural network (CNN) trained to regress Differential Population Growth Rate (DPGR) values. Our approach achieves high predictive accuracy, with an R-squared value of 0.92, on genomic sequences sampled from the USA and Europe. To interpret the model’s predictions, we apply SHapley Additive exPlanations (SHAP) to identify nucleotide-level contributions to predicted fitness. Our analysis highlights key mutations in ORF9 (nucleocapsid), ORF2 (spike), ORF5 (membrane), and ORF8 that either enhance or reduce predicted DPGR. Notably, we identify amino acid–altering mutations such as D3L, E484K, N501Y, and V97I as strong positive contributors to fitness, while synonymous or non-coding mutations had more subtle or regulatory effects. These findings validate the potential of sequence-based modeling and interpretable AI to support early detection and prioritization of high-risk variants.