Title: vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes.


    In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) including several uncharacterized organisms and organisms with no public genome representatives.


    The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. VEBA’s contributions to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.

  2. ABSTRACT Genus assignment is fundamental in the characterization of microbes, yet there is currently no unambiguous way to demarcate genera solely using standard genomic relatedness indices. Here, we propose an approach to demarcate genera that relies on the combined use of the average nucleotide identity, genome alignment fraction, and the distinction between type- and non-type species. More than 3,500 genomes representing type strains of species from >850 genera of either bacterial or archaeal lineages were tested. Over 140 genera were analyzed in detail within the taxonomic context of order/family. Significant genomic differences between members of a genus and type species of other genera in the same order/family were conserved in 94% of the cases. Nearly 90% (92% if polyphyletic genera are excluded) of the type strains were classified in agreement with current taxonomy. The 448 type strains that need reclassification directly impact 33% of the genera analyzed in detail. The results provide a first line of evidence that the combination of genomic indices provides added resolution to effectively demarcate genera within the taxonomic framework that is currently based on the 16S rRNA gene. We also identify the emergence of natural breakpoints at the genome level that can further help in the circumscription of taxa, increasing the proportion of directly impacted genera to at least 43% and pointing at inaccuracies on the use of the 16S rRNA gene as a taxonomic marker, despite its precision. Altogether, these results suggest that genomic coherence is an emergent property of genera in Bacteria and Archaea. IMPORTANCE In recent decades, the taxonomy of Bacteria and Archaea, and therefore genus designation, has been largely based on the use of a single ribosomal gene, the 16S rRNA gene, as a taxonomic marker. We propose an approach to delineate genera that excludes the direct use of the 16S rRNA gene and focuses on a standard genome relatedness index, the average nucleotide identity. Our findings are of importance to the microbiology community because the emergent properties of Bacteria and Archaea that are identified in this study will help assign genera with higher taxonomic resolution.
  3. Ho, Simon (Ed.)
    Abstract Whole-genome comparisons based on average nucleotide identities (ANI) and the genome-to-genome distance calculator have risen to prominence in rapidly classifying prokaryotic taxa using whole-genome sequences. Some implementations have even been proposed as a new standard in species classification and have become a common technique for papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have had difficulty matching those produced by more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI and alignment fraction-based metric. For the test cases to which we applied the developed approach, we obtained results comparable with other methodologies up to at least the family level. The developed method uses nonparametric bootstrapping to gauge support for inferred groups. This method offers the opportunity to make use of whole-genome comparison data that are already being generated to quickly produce phylogenies including support for inferred groups. Additionally, the developed ANI methodology can assist the classification of higher taxonomic groups. [Average nucleotide identity (ANI); genome evolution; prokaryotic species delineation; taxonomy.]
  4. Rao, Krishna (Ed.)
    ABSTRACT Gardnerella is a frequent member of the urogenital microbiota. Given the association between Gardnerella vaginalis and bacterial vaginosis (BV), significant efforts have been focused on characterizing this species in the vaginal microbiota. However, Gardnerella also is a frequent member of the urinary microbiota. In an effort to characterize the bacterial species of the urinary microbiota, we present here 10 genomes of urinary Gardnerella isolates from women with and without lower urinary tract symptoms. These genomes complement those of 22 urinary Gardnerella strains previously isolated and sequenced by our team. We included these genomes in a comparative genome analysis of all publicly available Gardnerella genomes, which include 33 urinary isolates, 78 vaginal isolates, and 2 other isolates. While once this genus was thought to consist of a single species, recent comparative genome analyses have revealed 3 new species and an additional 9 groups within Gardnerella. Based upon our analysis, we suggest a new group for the species. We also find that distinction between these Gardnerella species/groups is possible only when considering the core or whole-genome sequence, as neither the sialidase nor vaginolysin genes are sufficient for distinguishing between species/groups despite their clinical importance. In contrast to the vaginal microbiota, we found that only five Gardnerella species/groups have been detected within the lower urinary tract. Although we found no association between a particular Gardnerella species/group(s) and urinary symptoms, further sequencing of urinary Gardnerella isolates is needed for both comprehensive taxonomic characterization and etiological classification of Gardnerella in the urinary tract. IMPORTANCE Prior research into the bacterium Gardnerella vaginalis has largely focused on its association with bacterial vaginosis (BV). However, G. vaginalis is also frequently found within the urinary microbiota of women with and without lower urinary tract symptoms as well as individuals with chronic kidney disease, interstitial cystitis, and BV. This prompted our investigation into Gardnerella from the urinary microbiota and all publicly available Gardnerella genomes from the urogenital tract. Our work suggests that while some Gardnerella species can survive in both the urinary tract and vagina, others likely cannot. This study provides the foundation for future studies of Gardnerella within the urinary tract and its possible contribution to lower urinary tract symptoms.
  5. Background Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tool, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). Results The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments and not full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Conclusion Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.