Title: Data-Driven Whole-Genome Clustering to Detect Geospatial, Temporal, and Functional Trends in SARS-CoV-2 Evolution
Current methods for defining SARS-CoV-2 lineages ignore the vast majority of the SARS-CoV-2 genome. We develop and apply an exhaustive vector comparison method that directly compares all known SARS-CoV-2 genome sequences to produce novel lineage classifications. We utilize data-driven models that (i) accurately capture the complex interactions across the set of all known SARS-CoV-2 genomes, (ii) scale to leadership-class computing systems, and (iii) enable tracking how such strains evolve geospatially over time. We show that during the height of the original Omicron surge, countries across Europe, Asia, and the Americas had a spatially asynchronous distribution of Omicron sub-strains. Moreover, neighboring countries were often dominated by either different clusters of the same variant or different variants altogether throughout the pandemic. Analyses of this kind may suggest a different pattern of epidemiological risk than was understood from conventional data, as well as produce actionable insights and transform our ability to prepare for and respond to current and future biological threats.
Award ID(s):
2231624 2133763
PAR ID:
10504424
Publisher / Repository:
ACM
Journal Name:
Proceedings of the Platform for Advanced Scientific Computing Conference
ISBN:
9798400701900
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X
Location:
Davos Switzerland
Sponsoring Org:
National Science Foundation
More Like this
  1. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a high mutation rate and many variants have emerged in the last 2 years, including Alpha, Beta, Delta, Gamma and Omicron. Studies showed that the host-genome similarity (HGS) of SARS-CoV-2 is higher than that of SARS-CoV and that the HGS of open reading frames (ORFs) in the coronavirus genome is closely related to suppression of innate immunity. Many works have shown that ORF 6 and ORF 8 of SARS-CoV-2 play an important role in suppressing the IFN-β signaling pathway in vivo. However, the relation between HGS and the adaptation of SARS-CoV-2 variants is still not clear. This work investigates HGS of SARS-CoV-2 variants based on a dataset containing more than 40,000 viral genomes. The relation between HGS of viral ORFs and the suppression of antiviral response is studied. The results show that ORF 7b, ORF 6 and ORF 8 are the top 3 genes with the highest HGS. In the past 2 years, the HGS values of ORF 8 and ORF 7b of SARS-CoV-2 have increased greatly. A remarkable correlation is discovered between HGS and inhibition of the antiviral response of the immune system, which suggests that the similarity between the coronavirus and host genomes may be an indicator of the suppression of innate immunity. Among the five variants (Alpha, Beta, Delta, Gamma and Omicron), Delta has the highest HGS and Omicron has the lowest HGS. This finding implies that the high HGS in the Delta variant may indicate further suppression of host innate immunity. However, the relatively low HGS of Omicron is still a puzzle. By comparing the mutations in genomes of Alpha, Delta and Omicron variants, a commonly shared mutation ACT > ATT is identified in high-HGS strain populations. The high-HGS mutations among the three variants are quite different. This finding strongly suggests that mutations in high-HGS strains are different in different variants. Only a few common mutations survive, which may play an important role in improving the adaptability of SARS-CoV-2. However, the mechanism for how the mutations help SARS-CoV-2 escape immunity is still unclear. HGS analysis is a new method to study virus–host interaction and may provide a way to understand the rapid mutation and adaptation of SARS-CoV-2.
  2. We integrate evolutionary predictions based on the neutral theory of molecular evolution with protein dynamics to generate mechanistic insight into the molecular adaptations of the SARS-CoV-2 spike (S) protein. With this approach, we first identified candidate adaptive polymorphisms (CAPs) of the SARS-CoV-2 S protein and assessed the impact of these CAPs through dynamics analysis. Not only have we found that CAPs frequently overlap with well-known functional sites, but also, using several different dynamics-based metrics, we reveal the critical allosteric interplay between SARS-CoV-2 CAPs and the S protein binding sites with the human ACE2 (hACE2) protein. CAPs interact far differently with the hACE2 binding site residues in the open conformation of the S protein compared to the closed form. In particular, the CAP sites control the dynamics of binding residues in the open state, suggesting an allosteric control of hACE2 binding. We also explored the characteristic mutations of different SARS-CoV-2 strains to find dynamic hallmarks and potential effects of future mutations. Our analyses reveal that Delta strain-specific variants have non-additive (i.e., epistatic) interactions with CAP sites, whereas the less pathogenic Omicron strains have mostly additive mutations. Finally, our dynamics-based analysis suggests that the novel mutations observed in the Omicron strain epistatically interact with the CAP sites to help escape antibody binding.
  3. Mostafa, Heba H. (Ed.)
    ABSTRACT SARS-CoV-2 variants of concern (VOCs) continue to pose a public health threat which necessitates a real-time monitoring strategy to complement whole genome sequencing. Thus, we investigated the efficacy of competitive probe RT-qPCR assays for six mutation sites identified in SARS-CoV-2 VOCs and, after validating the assays with synthetic RNA, performed these assays on positive saliva samples. When compared with whole genome sequence results, the SΔ69-70 and ORF1aΔ3675-3677 assays demonstrated 93.60 and 68.00% accuracy, respectively. The SNP assays (K417T, E484K, E484Q, L452R) demonstrated 99.20, 96.40, 99.60, and 96.80% accuracies, respectively. Lastly, we screened 345 positive saliva samples from 7 to 22 December 2021 using Omicron-specific mutation assays and were able to quickly identify rapid spread of Omicron in Upstate South Carolina. Our workflow demonstrates a novel approach for low-cost, real-time population screening of VOCs. IMPORTANCE SARS-CoV-2 variants of concern and their many sublineages can be characterized by mutations present within their genetic sequences. These mutations can provide selective advantages such as increased transmissibility and antibody evasion, which influences public health recommendations such as mask mandates, quarantine requirements, and treatment regimens. Our RT-qPCR workflow allows for strain identification of SARS-CoV-2 positive saliva samples by targeting common mutation sites shared between variants of concern and detecting single nucleotides present at the targeted location. This differential diagnostic system can quickly and effectively identify a wide array of SARS-CoV-2 strains, which can provide more informed public health surveillance strategies in the future. 
  4. Lee, Benhur (Ed.)
    ABSTRACT Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected over 40 million people worldwide, with over 1 million deaths as of October 2020 and with multiple efforts in the development and testing of antiviral drugs and vaccines under way. In order to gain insights into SARS-CoV-2 evolution and drug targets, we investigated how and to what extent the SARS-CoV-2 genome sequence differs from those of other well-characterized human and animal coronavirus genomes, as well as how polymorphic SARS-CoV-2 genomes are generally. We ultimately sought to identify features in the SARS-CoV-2 genome that may contribute to its viral replication, host pathogenicity, and vulnerabilities. Our analyses suggest the presence of unique sequence signatures in the 3′ untranslated region (3′-UTR) of betacoronavirus lineage B, which phylogenetically encompasses SARS-CoV-2 and SARS-CoV as well as multiple groups of bat and animal coronaviruses. In addition, we identified genome-wide patterns of variation across different SARS-CoV-2 strains that likely reflect the effects of selection. Finally, we provide evidence for a possible host-microRNA-mediated interaction between the 3′-UTR and human microRNA hsa-miR-1307-3p based on the results of multiple computational target prediction analyses and an assessment of similar interactions involving the influenza A H1N1 virus. This interaction also suggests a possible survival mechanism, whereby a mutation in the SARS-CoV-2 3′-UTR leads to a weakened host immune response. The potential roles of host microRNAs in SARS-CoV-2 replication and infection and the exploitation of conserved features in the 3′-UTR as therapeutic targets warrant further investigation. IMPORTANCE The coronavirus disease 2019 (COVID-19) outbreak is having a dramatic global effect on public health and the economy. 
As of October 2020, SARS-CoV-2 has been detected in over 189 countries, has infected over 40 million people, and is responsible for more than 1 million deaths. The genome of SARS-CoV-2 is small but complex, and its functions and interactions with human host factors are being studied extensively. The significance of our study is that, using extensive SARS-CoV-2 genome analysis techniques, we identified potential interacting human host microRNA targets that share similarity with those of influenza A virus H1N1. Our study results will allow the development of virus-host interaction models that will enhance our understanding of SARS-CoV-2 pathogenesis and motivate the exploitation of both the interacting viral and host factors as therapeutic targets. 
  5. Abstract The identification of the Omicron (B.1.1.529.1 or BA.1) variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Botswana in November 2021 (ref. 1) immediately caused concern owing to the number of alterations in the spike glycoprotein that could lead to antibody evasion. We (ref. 2) and others (refs. 3–6) recently reported results confirming such a concern. Continuing surveillance of the evolution of Omicron has since revealed the rise in prevalence of two sublineages, BA.1 with an R346K alteration (BA.1+R346K, also known as BA.1.1) and B.1.1.529.2 (BA.2), with the latter containing 8 unique spike alterations and lacking 13 spike alterations found in BA.1. Here we extended our studies to include antigenic characterization of these new sublineages. Polyclonal sera from patients infected by wild-type SARS-CoV-2 or recipients of current mRNA vaccines showed a substantial loss in neutralizing activity against both BA.1+R346K and BA.2, with drops comparable to that already reported for BA.1 (refs. 2,3,5,6). These findings indicate that these three sublineages of Omicron are antigenically equidistant from the wild-type SARS-CoV-2 and thus similarly threaten the efficacies of current vaccines. BA.2 also exhibited marked resistance to 17 of 19 neutralizing monoclonal antibodies tested, including S309 (sotrovimab; ref. 7), which had retained appreciable activity against BA.1 and BA.1+R346K (refs. 2–4,6). This finding shows that no authorized monoclonal antibody therapy could adequately cover all sublineages of the Omicron variant, except for the recently authorized LY-CoV1404 (bebtelovimab).