Title: Viral DB for Madracis Project
Pooled viral genomes and genome fragments from coral and seawater metagenomes collected in Curaçao. Viral contigs were identified with geNomad and VIBRANT and binned with vRhyme. Contigs belonging to the same viral bin are N-linked, that is, joined into a single sequence with 1000 Ns between them. All viral genomes were dereplicated at MIUViG standards (95% ANI over 80% AF) using CheckV's rapid genome clustering based on pairwise ANI. Sequences that had no viral genes and a CheckV quality of “Not-determined” were removed from this database.
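The N-linking convention described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build this database: the function names `n_link_bins` and `split_n_linked`, the in-memory dictionary layout, and the toy sequences are all hypothetical. The only detail taken from the description is the linker length of 1000 Ns. The splitting helper additionally assumes that no contig starts or ends with N or contains a run of 1000 or more Ns.

```python
from typing import Dict, List

# Linker length stated in the dataset description (1000 Ns between contigs).
N_LINKER_LENGTH = 1000


def n_link_bins(bins: Dict[str, List[str]],
                linker_length: int = N_LINKER_LENGTH) -> Dict[str, str]:
    """Join the contigs of each bin into one sequence separated by runs of N.

    `bins` maps a (hypothetical) bin identifier to its contig sequences.
    A bin with a single contig is returned unchanged.
    """
    linker = "N" * linker_length
    return {bin_id: linker.join(contigs) for bin_id, contigs in bins.items()}


def split_n_linked(sequence: str,
                   linker_length: int = N_LINKER_LENGTH) -> List[str]:
    """Recover member contigs from an N-linked sequence.

    Assumes contigs neither begin nor end with N and never contain a run
    of N as long as the linker; otherwise the split is ambiguous.
    """
    return [part for part in sequence.split("N" * linker_length) if part]


if __name__ == "__main__":
    toy_bins = {"bin_1": ["ACGT", "GGCC"], "unbinned_1": ["TTAA"]}
    linked = n_link_bins(toy_bins)
    for bin_id, seq in linked.items():
        print(bin_id, len(seq), len(split_n_linked(seq)))
```

When working with the database, this means that the length of a binned entry overstates the amount of real sequence by 1000 bp per junction, so length-based statistics are better computed on the split contigs.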
Award ID(s):
2424579
PAR ID:
10646028
Author(s) / Creator(s):
Publisher / Repository:
figshare
Date Published:
Subject(s) / Keyword(s):
Sequence analysis Genomics and transcriptomics
Format(s):
Medium: X Size: 109887080 Bytes
Size(s):
109887080 Bytes
Sponsoring Org:
National Science Foundation
More Like this
  1. Echinoderms are a phylum of marine invertebrates that include model organisms, keystone species, and animals commercially harvested for seafood. Despite their scientific, ecological, and economic importance, there is little known about the diversity of RNA viruses that infect echinoderms compared to other invertebrates. We screened over 900 transcriptomes and viral metagenomes to characterize the RNA virome of 38 echinoderm species from all five classes (Crinoidea, Holothuroidea, Asteroidea, Ophiuroidea and Echinoidea). We identified 347 viral genome fragments that were classified to genera and families within nine viral orders: Picornavirales, Durnavirales, Martellivirales, Nodamuvirales, Reovirales, Amarillovirales, Ghabrivirales, Mononegavirales, and Hepelivirales. We compared the relative viral representation across three life stages (embryo, larvae, adult) and characterized the gene content of contigs which encoded complete or near-complete genomes. The proportion of viral reads in a given transcriptome was not found to significantly differ between life stages though the majority of viral contigs were discovered from transcriptomes of adult tissue. This study illuminates the biodiversity of RNA viruses from echinoderms, revealing the occurrence of viral groups in natural populations.
  2. Background: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). Results: The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments and not full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Conclusion: Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.
  3. Cooper, Vaughn S (Ed.)
    ABSTRACT: Despite the importance of intra-species variants of viruses for causing disease and/or disrupting ecosystem functioning, there is no universally applicable standard to define these. A (natural) gap in whole-genome average nucleotide identity (ANI) values around 95% is commonly used to define species, especially for bacteriophages, but whether a similar gap exists within species that can be used to define intra-species units has not been evaluated yet. Whole-genome comparisons among members of 1,016 bacteriophage (Caudoviricetes) species revealed a region of low frequency of ANI values around 99.2%–99.8%, showing threefold or fewer pairs than expected for an even distribution. This second gap is prevalent in viruses infecting various cultured or uncultured hosts from a variety of environments, although a few exceptions to this pattern were also observed (3.7% of total species) and are likely attributed to cultivation biases or other factors. Similar results were observed for a limited set of eukaryotic viruses that are adequately sampled, including SARS-CoV-2, whose ANI-based clusters matched well with the WHO-defined variants of concern, indicating that our findings from bacteriophages might be more broadly applicable and the ANI-based clusters may represent functionally and/or ecologically distinct units. These units appear to be predominantly driven by (high) ecological cohesiveness coupled to either frequent recombination for bacteriophages or selection and clonal evolution for other viruses such as SARS-CoV-2, indicating that fundamentally different underlying mechanisms could lead to similar diversity patterns. Accordingly, we propose the ANI gap approach outlined above for defining viral intra-species units, for which we propose the term genomovars. IMPORTANCE: Viral species are composed of an ensemble of intra-species variants whose individual dynamics may have major implications for human and animal health and/or ecosystem functioning. However, the lack of universally accepted standards to define these intra-species variants has led researchers to use different approaches for this task, creating inconsistent intra-species units across different viral families and confusion in communication. By comparing hundreds of mostly bacteriophage genomes, we show that there is a widely distributed natural gap in whole-genome average nucleotide identity values in most, but not all, of these species that can be used to define intra-species units. Therefore, these results advance the molecular toolbox for tracking viral intra-species units and should facilitate future epidemiological and environmental studies.
  4. Jouline, Igor B (Ed.)
    ABSTRACT: Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI 90%–95% are comparatively rare. However, it remains unclear if such “discontinuities” or gaps in ANI values can be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each and available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of evolutionary and gene-content relatedness of the grouped genomes, while strains should be consequently defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity. IMPORTANCE: Bacterial strains and clonal complexes are two cornerstone concepts for microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to more accurately and precisely define these and related concepts compared to current methods. These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings.
  5. (DOI: 10.1038/s41564-018-0225-4)
    Because of their agricultural value, there is a great body of research dedicated to understanding the microorganisms responsible for rumen carbon degradation. However, we lack a holistic view of the microbial food web responsible for carbon processing in this ecosystem. Here, we sampled rumen-fistulated moose, allowing access to rumen microbial communities actively degrading woody plant biomass in real time. We resolved 1,193 viral contigs and 77 unique, near-complete microbial metagenome-assembled genomes, many of which lacked previous metabolic insights. Plant-derived metabolites were measured with NMR and carbohydrate microarrays to quantify the carbon nutrient landscape. Network analyses directly linked measured metabolites to expressed proteins from these unique metagenome-assembled genomes, revealing a genome-resolved three tiered carbohydrate-fuelled trophic system. This provided a glimpse into microbial specialization into functional guilds defined by specific metabolites. To validate our proteomic inferences, the catalytic activity of a polysaccharide utilization locus from a highly connected metabolic hub genome was confirmed using heterologous gene expression. Viral detected proteins and linkages to microbial hosts demonstrated that phage are active controllers of rumen ecosystem function. Our findings elucidate the microbial and viral members, as well as their metabolic interdependencies, that support in situ carbon degradation in the rumen ecosystem. 