Title: Data Proliferation, Reconciliation, and Synthesis in Viral Ecology
Abstract: The fields of viral ecology and evolution are rapidly expanding, motivated in part by concerns around emerging zoonoses. One consequence is the proliferation of host–virus association data, which underpin viral macroecology and zoonotic risk prediction but remain fragmented across numerous data portals. In the present article, we propose that synthesis of host–virus data is a central challenge to characterize the global virome and develop foundational theory in viral ecology. To illustrate this, we build an open database of mammal host–virus associations that reconciles four published data sets. We show that this offers a substantially richer view of the known virome than any individual source data set but also that databases such as these risk becoming out of date as viral discovery accelerates. We argue for a shift in practice toward the development, incremental updating, and use of synthetic data sets in viral ecology, to improve replicability and facilitate work to predict the structure and dynamics of the global virome.
Award ID(s):
2021909
NSF-PAR ID:
10312500
Journal Name:
BioScience
Volume:
71
Issue:
11
ISSN:
0006-3568
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pickett, Brett E. ; Jurado, Kellie (Ed.)
    ABSTRACT Data that catalogue viral diversity on Earth have been fragmented across sources, disciplines, formats, and various degrees of open sharing, posing challenges for research on macroecology, evolution, and public health. Here, we solve this problem by establishing a dynamically maintained database of vertebrate-virus associations, called The Global Virome in One Network (VIRION). The VIRION database has been assembled through both reconciliation of static data sets and integration of dynamically updated databases. These data sources are all harmonized against one taxonomic backbone, including metadata on host and virus taxonomic validity and higher classification; additional metadata on sampling methodology and evidence strength are also available in a harmonized format. In total, the VIRION database is the largest open-source, open-access database of its kind, with roughly half a million unique records that include 9,521 resolved virus “species” (of which 1,661 are ICTV ratified), 3,692 resolved vertebrate host species, and 23,147 unique interactions between taxonomically valid organisms. Together, these data cover roughly a quarter of mammal diversity, a 10th of bird diversity, and ∼6% of the estimated total diversity of vertebrates, and a much larger proportion of their virome than any previous database. We show how these data can be used to test hypotheses about microbiology, ecology, and evolution and make suggestions for best practices that address the unique mix of evidence that coexists in these data.

    IMPORTANCE Animals and their viruses are connected by a sprawling, tangled network of species interactions. Data on the host-virus network are available from several sources, which use different naming conventions and often report metadata in different levels of detail. VIRION is a new database that combines several of these existing data sources, reconciles taxonomy to a single consistent backbone, and reports metadata in a format designed by and for virologists. Researchers can use VIRION to easily answer questions like “Can any fish viruses infect humans?” or “Which bats host coronaviruses?” or to build more advanced predictive models, making it an unprecedented step toward a full inventory of the global virome.
  2. Host-virus association data underpin research into the distribution and eco-evolutionary correlates of viral diversity and zoonotic risk across host species. However, current knowledge of the wildlife virome is inherently constrained by historical discovery effort, and there are concerns that the reliability of ecological inference from host-virus data may be undermined by taxonomic and geographical sampling biases. Here, we evaluate whether current estimates of host-level viral diversity in wild mammals are stable enough to be considered biologically meaningful, by analysing a comprehensive dataset of discovery dates of 6571 unique mammal host-virus associations between 1930 and 2018. We show that virus discovery rates in mammal hosts are either constant or accelerating, with little evidence of declines towards viral richness asymptotes, even in highly sampled hosts. Consequently, inference of relative viral richness across host species has been unstable over time, particularly in bats, where intensified surveillance since the early 2000s caused a rapid rearrangement of species' ranked viral richness. Our results illustrate that comparative inference of host-level virus diversity across mammals is highly sensitive to even short-term changes in sampling effort. We advise caution to avoid overinterpreting patterns in current data, since it is feasible that an analysis conducted today could draw quite different conclusions than one conducted only a decade ago. 
  3. Sea cucumbers (Holothuroidea; Echinodermata) are ecologically significant constituents of benthic marine habitats. We surveilled RNA viruses inhabiting eight species (representing four families) of holothurian collected from four geographically distinct locations by viral metagenomics, including a single specimen of Apostichopus californicus affected by a hitherto undocumented wasting disease. The RNA virome comprised genome fragments of both single-stranded positive sense and double stranded RNA viruses, including those assigned to the Picornavirales, Ghabrivirales, and Amarillovirales. We discovered an unconventional flavivirus genome fragment which was most similar to a shark virus. Ghabrivirales-like genome fragments were most similar to fungal totiviruses in both genome architecture and homology and had likely infected mycobiome constituents. Picornavirales, which are commonly retrieved in host-associated viral metagenomes, were similar to invertebrate transcriptome-derived picorna-like viruses. The greatest number of viral genome fragments was recovered from the wasting A. californicus library compared to the asymptomatic A. californicus library. However, reads from the asymptomatic library recruited to nearly all recovered wasting genome fragments, suggesting that they were present but not well represented in the grossly normal specimen. These results expand the known host range of flaviviruses and suggest that fungi and their viruses may play a role in holothurian ecology.
  4. Background

    Insects are an important reservoir of viral biodiversity, but the vast majority of viruses associated with insects have not been discovered. Recent studies have employed high-throughput RNA sequencing, which has led to rapid advances in our understanding of insect viral diversity. However, insect genomes frequently contain transcribed endogenous viral elements (EVEs) with significant homology to exogenous viruses, complicating the use of RNAseq for viral discovery.

    Methods

    In this study, we used a multi-pronged sequencing approach to study the virome of an important agricultural pest and prolific vector of plant pathogens, the potato aphid Macrosiphum euphorbiae. We first used rRNA-depleted RNAseq to characterize the microbes found in individual insects. We then used PCR screening to measure the frequency of two heritable viruses in a local aphid population. Lastly, we generated a quality draft genome assembly for M. euphorbiae using Illumina-corrected Nanopore sequencing to identify transcriptionally active EVEs in the host genome.

    Results

    We found reads from two insect-specific viruses (a Flavivirus and an Ambidensovirus) in our RNAseq data, as well as a parasitoid virus (Bracovirus), a plant pathogenic virus (Tombusvirus), and two phages (Acinetobacter and APSE). However, our genome assembly showed that part of the ‘virome’ of this insect can be attributed to EVEs in the host genome.

    Conclusion

    Our work shows that EVEs have led to the misidentification of aphid viruses from RNAseq data, and we argue that this is a widespread challenge for the study of viral diversity in insects.

     
  5. Background

    Viruses strongly influence microbial population dynamics and ecosystem functions. However, our ability to quantitatively evaluate those viral impacts is limited to the few cultivated viruses and double-stranded DNA (dsDNA) viral genomes captured in quantitative viral metagenomes (viromes). This leaves the ecology of non-dsDNA viruses nearly unknown, including single-stranded DNA (ssDNA) viruses that have been frequently observed in viromes, but not quantified due to amplification biases in sequencing library preparations (Multiple Displacement Amplification, Linker Amplification or Tagmentation).

    Methods

    Here we designed mock viral communities including both ssDNA and dsDNA viruses to evaluate the capability of a sequencing library preparation approach including an Adaptase step prior to Linker Amplification for quantitative amplification of both dsDNA and ssDNA templates. We then surveyed aquatic samples to provide first estimates of the abundance of ssDNA viruses.

    Results

    Mock community experiments confirmed the biased nature of existing library preparation methods for ssDNA templates (either largely enriched or selected against) and showed that the protocol using Adaptase plus Linker Amplification yielded viromes that were ±1.8-fold quantitative for ssDNA and dsDNA viruses. Application of this protocol to community virus DNA from three freshwater and three marine samples revealed that ssDNA viruses as a whole represent only a minor fraction (<5%) of DNA virus communities, though individual ssDNA genomes, both eukaryote-infecting Circular Rep-Encoding Single-Stranded DNA (CRESS-DNA) viruses and bacteriophages from the Microviridae family, can be among the most abundant viral genomes in a sample.

    Discussion

    Together these findings provide empirical data for a new virome library preparation protocol, and a first estimate of ssDNA virus abundance in aquatic systems.

     