Background: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tool, (ii) virus …
This content will become publicly available on December 1, 2022
VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses
Abstract. Background: Viruses are significant players in many biosphere and human ecosystems, but most signals remain “hidden” in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers and database representatives, and to insufficiently advanced identification tools. Results: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 uniquely performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also uniquely able to minimize errors associated with atypical cellular sequences including eukaryotic genomes and plasmids. Finally, as the virosphere exploration unravels novel viral sequences, VirSorter2’s modular design makes it inherently able to expand to new types of viruses via the design of new classifiers to maintain maximal sensitivity and specificity. Conclusion: With multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in …
- Publication Date:
- NSF-PAR ID: 10256502
- Journal Name: Microbiome
- Volume: 9
- Issue: 1
- ISSN: 2049-2618
- Sponsoring Org: National Science Foundation
More Like this
- Giant viruses are widespread in the biosphere and play important roles in biogeochemical cycling and host genome evolution. Also known as nucleo-cytoplasmic large DNA viruses (NCLDVs), these eukaryotic viruses harbor the largest and most complex viral genomes known. Studies have shown that NCLDVs are frequently abundant in metagenomic datasets, and that sequences derived from these viruses can also be found endogenized in diverse eukaryotic genomes. The accurate detection of sequences derived from NCLDVs is therefore of great importance, but this task is challenging owing to both the high level of sequence divergence between NCLDV families and the extraordinarily high diversity …
- Wayne, Marta (Ed.) Abstract: The Ichneumonoidea (Ichneumonidae and Braconidae) is an incredibly diverse superfamily of parasitoid wasps that includes species that produce virus-like entities in their reproductive tracts to promote successful parasitism of host insects. Research on these entities has traditionally focused upon two viral genera, Bracovirus (in Braconidae) and Ichnovirus (in Ichneumonidae). These viruses are produced using genes known collectively as endogenous viral elements (EVEs) that represent historical, now heritable viral integration events in wasp genomes. Here, new genome sequence assemblies for 11 species and 6 publicly available genomes from the Ichneumonoidea were screened with the goal of identifying novel EVEs and …
- Hatfull, Graham F. (Ed.) Abstract: Bacteria and bacteriophages (phages) have evolved potent defense and counterdefense mechanisms that allowed their survival and greatest abundance on Earth. CRISPR (clustered regularly interspaced short palindromic repeat)-Cas (CRISPR-associated) is a bacterial defense system that inactivates the invading phage genome by introducing double-strand breaks at targeted sequences. While the mechanisms of CRISPR defense have been extensively investigated, the counterdefense mechanisms employed by phages are poorly understood. Here, we report a novel counterdefense mechanism by which phage T4 restores the genomes broken by CRISPR cleavages. Catalyzed by the phage-encoded recombinase UvsX, this mechanism pairs very short stretches of sequence identity (minihomology …
- Metagenomics has enabled sequencing of viral communities from a myriad of different environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed the software tool virMine to identify viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates sequence read quality control, assembly, and annotation. Researchers can easily refine their search for a specific study system and/or feature(s) of interest. In contrast to other viral …