Abstract Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
iVirus 2.0: Cyberinfrastructure-supported tools and data to power DNA virus ecology
Abstract Microbes drive myriad ecosystem processes, but under strong influence from viruses. Because studying viruses in complex systems requires different tools than those for microbes, they remain underexplored. To combat this, we previously aggregated double-stranded DNA (dsDNA) virus analysis capabilities and resources into ‘iVirus’ on the CyVerse collaborative cyberinfrastructure. Here we substantially expand iVirus’s functionality and accessibility, to iVirus 2.0, as follows. First, core iVirus apps were integrated into the Department of Energy’s Systems Biology KnowledgeBase (KBase) to provide an additional analytical platform. Second, at CyVerse, 20 software tools (apps) were upgraded or added as new tools and capabilities. Third, nearly 20-fold more sequence reads were aggregated to capture new data and environments. Finally, documentation, as “live” protocols, was updated to maximize user interaction with and contribution to infrastructure development. Together, iVirus 2.0 serves as a uniquely central and accessible analytical platform for studying how viruses, particularly dsDNA viruses, impact diverse microbial ecosystems.
- Award ID(s): 1759874
- PAR ID: 10383857
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: ISME Communications
- Volume: 1
- Issue: 1
- ISSN: 2730-6151
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Climate change is disproportionately warming northern peatlands, which may release large carbon stores via increased microbial activity. While there are many unknowns about such microbial responses, virus roles are especially poorly characterized with studies to date largely restricted to “bycatch” from bulk metagenomes. Here, we used optimized viral particle purification techniques on 20 samples along a highly contextualized peatland permafrost thaw gradient, extracted and sequenced viral particle DNA using two library kits to capture single-stranded (ssDNA) and double-stranded (dsDNA) virus genomes (40 total viromes), and explored their diversity and potential ecosystem impacts. Both kits recovered similar dsDNA virus numbers, but only one also captured thousands of ssDNA viruses. Combining these data, we explored population-level ecology using genomic representation from 9,560 viral operational taxonomic units (vOTUs); nearly a 4-fold expansion from permafrost-associated soils, and 97% of which were novel when compared against large datasets from soils, oceans, and the human gut. In silico predictions identified putative hosts for 44% (4,149 dsDNA + 17 ssDNA) of the identified vOTUs spanning 2 eukaryotic, 12 archaeal, and 30 bacterial phyla. The recovered vOTUs encoded 1,684 putative auxiliary metabolic genes (AMGs) and other metabolic genes carried by ∼10% of detected vOTUs, of which 46% were related to carbon processing and 644 were novel. These AMGs grouped into five functional categories and 11 subcategories, and nearly half (47%) of the AMGs were involved in carbon utilization. Of these, 112 vOTUs encoded 123 glycoside hydrolases spanning 15 types involved in the degradation of polysaccharides (e.g., cellulose) to monosaccharides (e.g., galactose), or further monosaccharide degradation, which suggests virus involvement in myriad metabolisms including fermentation and central carbon metabolism.
These findings expand the scope of viral roles in microbial carbon processing and suggest viruses may be critical for understanding the fate of soil organic carbon in peatlands.
Abstract CRISPR-Cas12a can induce nonspecific trans-cleavage of dsDNA substrate, including long and stable λ DNA. However, the mechanism behind this is still largely undetermined. In this study, we observed that while trans-activated Cas12a did not cleave blunt-end dsDNA within a short reaction time, it could degrade dsDNA reporters with a short overhang. More interestingly, we discovered that the location of the overhang also affected the susceptibility of dsDNA substrate to trans-activated Cas12a. Cas12a trans-cleaved 3′ overhang dsDNA substrates at least 3 times faster than 5′ overhang substrates. We attributed this unique preference of overhang location to the directional trans-cleavage behavior of Cas12a, which may be governed by RuvC and Nuc domains. Utilizing this new finding, we designed a new hybrid DNA reporter as a nonoptical substrate for the CRISPR-Cas12a detection platform, which sensitively detected ssDNA targets at the sub-picomolar level. This study not only provided new insight into the trans-cleavage behavior of Cas12a but also demonstrated a sensitive CRISPR-Cas12a assay using a hybrid dsDNA reporter molecule.
Background: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). Results: The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments and not full genomes.
Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Conclusion: Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.
Abstract Background: Viruses are a significant player in many biosphere and human ecosystems, but most signals remain “hidden” in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers and database representatives and to insufficiently advanced identification tools. Results: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 uniquely performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also uniquely able to minimize errors associated with atypical cellular sequences including eukaryotic genomes and plasmids. Finally, as the virosphere exploration unravels novel viral sequences, VirSorter2’s modular design makes it inherently able to expand to new types of viruses via the design of new classifiers to maintain maximal sensitivity and specificity. Conclusion: With multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available (https://bitbucket.org/MAVERICLab/virsorter2), and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://de.cyverse.org/de).