Title: Rescuing low frequency variants within intra-host viral populations directly from Oxford Nanopore sequencing data
Abstract Infectious disease monitoring on Oxford Nanopore Technologies (ONT) platforms offers rapid turnaround times and low cost. Tracking low frequency intra-host variants provides important insights with respect to elucidating within-host viral population dynamics and transmission. However, given the higher error rate of ONT, accurate identification of intra-host variants with low allele frequencies remains an open challenge with no viable computational solutions available. In response to this need, we present Variabel, a novel approach and first method designed for rescuing low frequency intra-host variants from ONT data alone. We evaluate Variabel on both synthetic data (SARS-CoV-2) and patient derived datasets (Ebola virus, norovirus, SARS-CoV-2); our results show that Variabel can accurately identify low frequency variants below 0.5 allele frequency, outperforming existing state-of-the-art ONT variant callers for this task. Variabel is open-source and available for download at: www.gitlab.com/treangenlab/variabel
Award ID(s):
2126387
NSF-PAR ID:
10378505
Journal Name:
Nature Communications
Volume:
13
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Long-range ribonucleic acid (RNA)–RNA interactions (RRI) are prevalent in positive-strand RNA viruses, including Beta-coronaviruses, and these take part in regulatory roles, including the regulation of sub-genomic RNA production rates. Crosslinking of interacting RNAs and short read-based deep sequencing of resulting RNA–RNA hybrids have shown that these long-range structures exist in severe acute respiratory syndrome coronavirus (SARS-CoV)-2 on both genomic and sub-genomic levels and in dynamic topologies. Furthermore, co-evolution of coronaviruses with their hosts is navigated by genetic variations made possible by their large genomes, high recombination frequency and high mutation rate. SARS-CoV-2’s mutations are known to occur spontaneously during replication, and thousands of aggregate mutations have been reported since the emergence of the virus. Although many long-range RRIs have been experimentally identified using high-throughput methods for the wild-type SARS-CoV-2 strain, the evolutionary trajectory of these RRIs across variants, the impact of mutations on RRIs and the interaction of SARS-CoV-2 RNAs with the host have been largely open questions in the field. In this review, we summarize recent computational tools and experimental methods that have been enabling the mapping of RRIs in viral genomes, with a specific focus on SARS-CoV-2. We also present available informatics resources to navigate the RRI maps and shed light on the impact of mutations on the RRI space in viral genomes. Investigating the evolution of long-range RNA interactions and that of virus–host interactions can contribute to the understanding of new and emerging variants as well as aid in developing improved RNA therapeutics critical for combating future outbreaks.

  2. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a high mutation rate and many variants have emerged in the last 2 years, including Alpha, Beta, Delta, Gamma and Omicron. Studies showed that the host-genome similarity (HGS) of SARS-CoV-2 is higher than that of SARS-CoV and that the HGS of open reading frames (ORFs) in the coronavirus genome is closely related to suppression of innate immunity. Many works have shown that ORF 6 and ORF 8 of SARS-CoV-2 play an important role in suppressing the IFN-β signaling pathway in vivo. However, the relation between HGS and the adaptation of SARS-CoV-2 variants is still not clear. This work investigates HGS of SARS-CoV-2 variants based on a dataset containing more than 40,000 viral genomes. The relation between HGS of viral ORFs and the suppression of antivirus response is studied. The results show that ORF 7b, ORF 6 and ORF 8 are the top 3 genes with the highest HGS. In the past 2 years, the HGS values of ORF 8 and ORF 7b of SARS-CoV-2 have increased greatly. A remarkable correlation is discovered between HGS and inhibition of antivirus response of the immune system, which suggests that the similarity between coronavirus and host genome may be an indicator of the suppression of innate immunity. Among the five variants (Alpha, Beta, Delta, Gamma and Omicron), Delta has the highest HGS and Omicron has the lowest HGS. This finding implies that the high HGS in the Delta variant may indicate further suppression of host innate immunity. However, the relatively low HGS of Omicron is still a puzzle. By comparing the mutations in genomes of Alpha, Delta and Omicron variants, a commonly shared mutation ACT > ATT is identified in high-HGS strain populations. The high HGS mutations among the three variants are quite different. This finding strongly suggests that mutations in high HGS strains are different in different variants. Only a few common mutations survive, which may play an important role in improving the adaptability of SARS-CoV-2. However, the mechanism for how the mutations help SARS-CoV-2 escape immunity is still unclear. HGS analysis is a new method to study virus–host interaction and may provide a way to understand the rapid mutation and adaptation of SARS-CoV-2.
  3. Individuals infected with the SARS-CoV-2 virus present with a wide variety of symptoms ranging from asymptomatic to severe and even lethal outcomes. Past research has revealed a genetic haplotype on chromosome 3 that entered the human population via introgression from Neanderthals as the strongest genetic risk factor for the severe response to COVID-19. However, the specific variants along this introgressed haplotype that contribute to this risk and the biological mechanisms that are involved remain unclear. Here, we assess the variants present on the risk haplotype for their likelihood of driving the genetic predisposition to severe COVID-19 outcomes. We do this by first exploring their impact on the regulation of genes involved in COVID-19 infection using a variety of population genetics and functional genomics tools. We then perform a locus-specific massively parallel reporter assay to individually assess the regulatory potential of each allele on the haplotype in a multipotent immune-related cell line. We ultimately reduce the set of over 600 linked genetic variants to identify four introgressed alleles that are strong functional candidates for driving the association between this locus and severe COVID-19. Using reporter assays in the presence/absence of SARS-CoV-2, we find evidence that these variants respond to viral infection. These variants likely drive the locus’ impact on severity by modulating the regulation of two critical chemokine receptor genes: CCR1 and CCR5. These alleles are ideal targets for future functional investigations into the interaction between host genomics and COVID-19 outcomes.
  4. The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of COVID-19. The main receptor of SARS-CoV-2, angiotensin I converting enzyme 2 (ACE2), is now undergoing extensive scrutiny to understand the routes of transmission and sensitivity in different species. Here, we utilized a unique dataset of ACE2 sequences from 410 vertebrate species, including 252 mammals, to study the conservation of ACE2 and its potential to be used as a receptor by SARS-CoV-2. We designed a five-category binding score based on the conservation properties of 25 amino acids important for the binding between ACE2 and the SARS-CoV-2 spike protein. Only mammals fell into the medium to very high categories and only catarrhine primates into the very high category, suggesting that they are at high risk for SARS-CoV-2 infection. We employed a protein structural analysis to qualitatively assess whether amino acid changes at variable residues would be likely to disrupt ACE2/SARS-CoV-2 spike protein binding and found the number of predicted unfavorable changes significantly correlated with the binding score. Extending this analysis to human population data, we found only rare (frequency <0.001) variants in 10/25 binding sites. In addition, we found significant signals of selection and accelerated evolution in the ACE2 coding sequence across all mammals, and specific to the bat lineage. Our results, if confirmed by additional experimental data, may lead to the identification of intermediate host species for SARS-CoV-2, guide the selection of animal models of COVID-19, and assist the conservation of animals both in native habitats and in human care.
  5. Abstract

    Summary

    Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. A range of technologies and strategies for pathogen genome enrichment and sequencing are being used by laboratories worldwide, together with different, and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process for raw sequence to consensus genome determination, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform and integrated bioinformatic workflows that help to provide consistent high-quality analysis of SARS-CoV-2 sequencing data generated on either Illumina or Oxford Nanopore Technologies (ONT) platforms. Using an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, and lineage determination, and provides the ability to submit the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real-world data and validated the accuracy of variant and lineage analysis using several test datasets, and further performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on patterns and impact observed with Illumina versus ONT technologies on workflow congruence and differences.

    Availability and implementation

    https://edge-covid19.edgebioinformatics.org and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2

    Supplementary information

    Supplementary data are available at Bioinformatics online.
