Background: With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes.
Results: In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets, from which a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) were recovered, including several uncharacterized organisms and organisms with no public genome representatives.
Conclusions: The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility for users to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.
Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing
Abstract The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA’s versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible software suite that bridges the gap between genomics and biotechnological solutions.
- PAR ID: 10516544
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Nucleic Acids Research
- Volume: 52
- Issue: 14
- ISSN: 0305-1048
- Format(s): Medium: X; Size: p. e63-e63
- Size(s): p. e63-e63
- Sponsoring Org: National Science Foundation
More Like this
Metagenomics has enabled sequencing of viral communities from a myriad of different environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed the software tool virMine to identify viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates sequence read quality control, assembly, and annotation. Researchers can easily refine their search for a specific study system and/or feature(s) of interest. In contrast to other viral genome detection tools that often rely on the recognition of viral signature sequences, virMine is not restricted by the insufficient representation of viral diversity in public data repositories. Rather, viral genomes are identified through an iterative approach, first omitting non-viral sequences. Thus, both relatives of previously characterized viruses and novel species can be detected, including both eukaryotic viruses and bacteriophages. Here we present virMine and its analysis of synthetic communities as well as metagenomic data sets from three distinctly different environments: the gut microbiota, the urinary microbiota, and freshwater viromes. Several new viral genomes were identified and annotated, thus contributing to our understanding of viral genetic diversity in these three environments.
The microbiomes of tropical corals are actively studied using 16S rRNA gene amplicons to understand microbial roles in coral health, metabolism, and disease resistance. However, due to the prokaryotic origins of mitochondria, primers targeting bacterial and archaeal 16S rRNA genes may also amplify homologous 12S mitochondrial rRNA genes from the host coral, associated microbial eukaryotes, and encrusting organisms. Standard microbial bioinformatics pipelines attempt to identify and remove these sequences by comparing them to reference taxonomies. However, commonly used tools have severely under-annotated mitochondrial sequences in 1440 coral microbiomes from the Global Coral Microbiome Project, preventing annotation of over 95% of reads in some samples. This issue persists when using Greengenes or SILVA prokaryotic reference taxonomies, and in other hosts, including 16S studies of vertebrates and of marine sponges. Worse, mitochondrial under-annotation varies between coral families and across coral compartments, biasing comparisons of α- and β-diversity. By supplementing existing reference taxonomies with over 3000 animal mitochondrial rRNA gene sequences, we resolved roughly 97% of unique unclassified sequences as mitochondrial. These additional sequences did not cause a false elevation in mitochondrial annotations in mock communities with known compositions. We recommend using these extended taxonomies for coral microbiome analysis and whenever eukaryotic contamination may be a concern.
Abstract Although viruses in their natural habitats add up to less than 10% of the biomass, they contribute more than 90% of the genome sequences [1]. These viral sequences or ‘viromes’ encode viruses that populate the Earth’s oceans [2, 3] and terrestrial environments [4, 5], where their infections impact life across diverse ecological niches and scales [6, 7], including humans [8–10]. Most viruses have yet to be isolated and cultured [11–13], and surprisingly few efforts have explored what analysis of available data might reveal about their nature. Here, we compiled and analyzed seven decades of one-step growth and other data for viruses from six major families, including their infections of archaeal, bacterial and eukaryotic hosts [14–191]. We found that the use of host cell biomass for virus production was highest for archaea at 10%, followed by bacteria at 1% and eukarya at 0.01%, highlighting the degree to which viruses of archaea and bacteria exploit their host cells. For individual host cells, the yield of virus progeny spanned a relatively narrow range (10–1000 infectious particles per cell) compared with the million-fold difference in size between the smallest and largest cells. Furthermore, healthy and infected host cells were remarkably similar in the time they needed to multiply themselves or their virus progeny. Specifically, the doubling time of healthy cells and the delay time for virus release from infected cells were not only correlated (r = 0.71, p < 10^−10, n = 101); they also spanned the same range from tens of minutes to about a week. These results have implications for better understanding the growth, spread and persistence of viruses in complex natural habitats that abound with diverse hosts, including humans and their associated microbes.
Abstract Lichen thalli are formed through the symbiotic association of a filamentous fungus and photosynthetic green alga and/or cyanobacterium. Recent studies have revealed lichens also host highly diverse communities of secondary fungal and bacterial symbionts, yet few studies have examined the viral component within these complex symbioses. Here, we describe viral biodiversity and functions in cyanolichens collected from across North America and Europe. As current machine-learning viral-detection tools are not trained on complex eukaryotic metagenomes, we first developed efficient methods to remove eukaryotic reads prior to viral detection and a custom pipeline to validate viral contigs predicted with three machine-learning methods. Our resulting high-quality viral data illustrate that every cyanolichen thallus contains diverse viruses that are distinct from viruses in other terrestrial ecosystems. In addition to cyanobacteria, predicted viral hosts include other lichen-associated bacterial lineages and algae, although a large fraction of viral contigs had no host prediction. Functional annotation of cyanolichen viral sequences predicts numerous viral-encoded auxiliary metabolic genes (AMGs) involved in amino acid, nucleotide, and carbohydrate metabolism, including AMGs for secondary metabolism (antibiotics and antimicrobials) and fatty acid biosynthesis. Overall, the diversity of cyanolichen AMGs suggests that viruses may alter microbial interactions within these complex symbiotic assemblages.