Abstract Dinoflagellates from the family Symbiodiniaceae are phototrophic marine protists that engage in symbiosis with diverse hosts. Their large and distinct genomes are characterized by pervasive gene duplication and large-scale retroposition events. However, little is known about the role and scale of horizontal gene transfer (HGT) in the evolution of this algal family. In other dinoflagellates, high levels of HGTs have been observed, linked to major genomic transitions, such as the appearance of a viral-acquired nucleoprotein that originated via HGT from a large DNA algal virus. Previous work showed that Symbiodiniaceae from different hosts are actively infected by viral groups, such as giant DNA viruses and ssRNA viruses, that may play an important role in coral health. Latent viral infections may also occur, whereby viruses could persist in the cytoplasm or integrate into the host genome as a provirus. This hypothesis received experimental support; however, the cellular localization of putative latent viruses and their taxonomic affiliation are still unknown. In addition, despite the finding of viral sequences in some genomes of Symbiodiniaceae, viral origin, taxonomic breadth, and metabolic potential have not been explored. To address these questions, we searched for putative viral-derived proteins in thirteen Symbiodiniaceae genomes. We found fifty-nine candidate viral-derived HGTs that gave rise to twelve phylogenies across ten genomes. We also describe the taxonomic affiliation of these virus-related sequences, their structure, and their genomic context. These results lead us to propose a model to explain the origin and fate of Symbiodiniaceae viral acquisitions.
Automated classification of giant virus genomes using a random forest model built on trademark protein families
Abstract Viruses of the phylum Nucleocytoviricota, often referred to as “giant viruses,” are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments with a cross-validation accuracy of 99.6% at the order level and 97.3% at the family level. We found that no individual GVOGs or genome features significantly influenced the algorithm’s performance or the models’ predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the necessity of a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes with varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% at the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
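The general idea of classifying genomes from protein family presence/absence profiles with a random forest can be sketched as follows. This is a toy illustration only and not the TIGTOG implementation: the six protein families, the eight training profiles, the function names, and the parameters (51 trees, square-root feature sampling, a fixed seed) are all assumptions made for the example, and the forest is a minimal pure-Python stand-in for a full library implementation.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, rng, n_try):
    """Grow one tree on binary features, testing n_try random features per split."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: leaf holding the class label
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        present = [y for x, y in zip(rows, labels) if x[f]]
        absent = [y for x, y in zip(rows, labels) if not x[f]]
        if not present or not absent:
            continue  # this feature does not split the node
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return Counter(labels).most_common(1)[0][0]  # no usable split: majority leaf
    f = best[1]
    with_f = [(x, y) for x, y in zip(rows, labels) if x[f]]
    without_f = [(x, y) for x, y in zip(rows, labels) if not x[f]]
    return (f,
            build_tree([x for x, _ in with_f], [y for _, y in with_f], rng, n_try),
            build_tree([x for x, _ in without_f], [y for _, y in without_f], rng, n_try))

def predict_tree(node, profile):
    """Walk one tree until a leaf (a class label string) is reached."""
    while isinstance(node, tuple):
        f, if_present, if_absent = node
        node = if_present if profile[f] else if_absent
    return node

def train_forest(rows, labels, n_trees=51, seed=0):
    """Train each tree on a bootstrap resample of the training genomes."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest

def predict(forest, profile):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, profile) for t in forest).most_common(1)[0][0]

# Hypothetical presence/absence profiles over six made-up protein families.
profiles = [[1, 1, 1, 0, 0, 0]] * 4 + [[0, 0, 0, 1, 1, 1]] * 4
orders = ["Imitervirales"] * 4 + ["Algavirales"] * 4

forest = train_forest(profiles, orders)
incomplete_mag = [1, 1, 0, 0, 0, 0]  # one expected family missing
print(predict(forest, incomplete_mag))
```

Because each tree considers a different random subset of families, the vote does not hinge on any single marker being present, which is one way to picture why a profile-based classifier can tolerate incomplete genomes.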
- Award ID(s): 2141862
- PAR ID: 10494642
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: npj Viruses
- Volume: 2
- Issue: 1
- ISSN: 2948-1767
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Bordenstein, Seth (Ed.) ABSTRACT Viruses belonging to the Nucleocytoviricota phylum are globally distributed and include members with notably large genomes and complex functional repertoires. Recent studies have shown that these viruses are particularly diverse and abundant in marine systems, but the magnitude of actively replicating Nucleocytoviricota present in ocean habitats remains unclear. In this study, we compiled a curated database of 2,431 Nucleocytoviricota genomes and used it to examine the gene expression of these viruses in a 2.5-day metatranscriptomic time-series from surface waters of the California Current. We identified 145 viral genomes with high levels of gene expression, including 90 Imitervirales and 49 Algavirales viruses. In addition to recovering high expression of core genes involved in information processing that are commonly expressed during viral infection, we also identified transcripts of diverse viral metabolic genes from pathways such as glycolysis, the TCA cycle, and the pentose phosphate pathway, suggesting that virus-mediated reprogramming of central carbon metabolism is common in oceanic surface waters. Surprisingly, we also identified viral transcripts with homology to actin, myosin, and kinesin domains, suggesting that viruses may use these gene products to manipulate host cytoskeletal dynamics during infection. We performed phylogenetic analysis on the virus-encoded myosin and kinesin proteins, which demonstrated that most belong to deep-branching viral clades, but that others appear to have been acquired from eukaryotes more recently. Our results highlight a remarkable diversity of active Nucleocytoviricota in a coastal marine system and underscore the complex functional repertoires expressed by these viruses during infection.
IMPORTANCE The discovery of giant viruses has transformed our understanding of viral complexity. Although viruses have traditionally been viewed as filterable infectious agents that lack metabolism, giant viruses can reach sizes rivalling cellular lineages and possess genomes encoding central metabolic processes. Recent studies have shown that giant viruses are widespread in aquatic systems, but the activity of these viruses and the extent to which they reprogram host physiology in situ remains unclear. Here, we show that numerous giant viruses consistently express central metabolic enzymes in a coastal marine system, including components of glycolysis, the TCA cycle, and other pathways involved in nutrient homeostasis. Moreover, we found expression of several viral-encoded actin, myosin, and kinesin genes, indicating viral manipulation of the host cytoskeleton during infection. Our study reveals a high activity of giant viruses in a coastal marine system and indicates they are a diverse and underappreciated component of microbial diversity in the ocean.
-
The family Asfarviridae is a group of nucleo-cytoplasmic large DNA viruses (NCLDVs) of which African swine fever virus (ASFV) is a well-characterized member. Recently the discovery of several Asfarviridae members other than ASFV has suggested that this family represents a diverse and cosmopolitan group of viruses, but the genomics and distribution of this family have not been studied in detail. To this end we analyzed five complete genomes and 35 metagenome-assembled genomes (MAGs) of viruses from this family to shed light on their evolutionary relationships and environmental distribution. The Asfarvirus MAGs derive from diverse marine, freshwater, and terrestrial habitats, underscoring the broad environmental distribution of this family. We present phylogenetic analyses using conserved marker genes and whole-genome comparison of pairwise average amino acid identity (AAI) values, revealing a high level of genomic divergence across disparate Asfarviruses. Further, we found that Asfarviridae genomes encode genes with diverse predicted metabolic roles and detectable sequence homology to proteins in bacteria, archaea, and eukaryotes, highlighting the genomic chimerism that is a salient feature of NCLDVs. Our read mapping from Tara Oceans metagenomic data also revealed that three Asfarviridae MAGs were present in multiple marine samples, indicating that they are widespread in the ocean. In one of these MAGs we identified four marker genes with > 95% AAI to genes sequenced from a virus that infects the dinoflagellate Heterocapsa circularisquama (HcDNAV). This suggests a potential host for this MAG, which would thereby represent a reference genome of a dinoflagellate-infecting giant virus. Together, these results show that Asfarviridae are ubiquitous, show sequence divergence comparable to that of other NCLDV families, and include several members that are widespread in the ocean and potentially infect ecologically important protists.
-
Since the discovery of the first “giant virus,” particular attention has been paid toward isolating and culturing these large DNA viruses through Acanthamoeba spp. bait systems. While this method has allowed for the discovery of many novel viruses in the Nucleocytoviricota, environmental -omics-based analyses have shown that there is a wealth of diversity among this phylum, particularly in marine datasets. The prevalence of these viruses in metatranscriptomes points toward their ecological importance in nutrient turnover in our oceans and as such, in-depth study into non-amoebal Nucleocytoviricota should be considered a focal point in viral ecology. In this review, we report on Kratosvirus quantuckense (née Aureococcus anophagefferens Virus), an algae-infecting virus of the Imitervirales. Current systems for study in the Nucleocytoviricota differ significantly from this virus and its relatives, and a litany of trade-offs within physiology, coding potential, and ecology compared to these other viruses reveal the importance of K. quantuckense. Herein, we review the research that has been performed on this virus as well as its potential as a model system for algal-virus interactions.
-
Scarpino, Samuel V (Ed.) Viruses of microbes are ubiquitous biological entities that reprogram their hosts’ metabolisms during infection in order to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only “who is there?” we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of “who do they infect?” Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. This ML model was trained and tested on a high-value manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, ‘Virus Host Range network’ (VHRnet), is core to VHIP functionality. Each data point that underlies the VHIP training and testing represents a lab-tested virus-host pair in VHRnet, from which meaningful signals of viral adaptation to host were computed from genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than predicting a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP has an 87.8% accuracy rate at predicting interactions between virus-host pairs at the species level and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.