This content will become publicly available on December 19, 2024

Title: Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton

Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet, many tools for processing metagenomic data are neither designed to consider eukaryotes nor built to handle the increasing amount of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow specifically to process large amounts of metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8–2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand upon the phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach to infer the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs, which are driven by specific environmental factors and putative host-microbe associations. These data together add to a number of growing resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life.


Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We provide insight into which protistan communities are present in the ocean and their trophic roles. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers’ efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.

Author(s) / Creator(s):
; ; ; ; ; ;
Fraser, Claire M.
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The family Asfarviridae is a group of nucleo-cytoplasmic large DNA viruses (NCLDVs) of which African swine fever virus (ASFV) is well-characterized. Recently, the discovery of several Asfarviridae members other than ASFV has suggested that this family represents a diverse and cosmopolitan group of viruses, but the genomics and distribution of this family have not been studied in detail. To this end, we analyzed five complete genomes and 35 metagenome-assembled genomes (MAGs) of viruses from this family to shed light on their evolutionary relationships and environmental distribution. The Asfarvirus MAGs derive from diverse marine, freshwater, and terrestrial habitats, underscoring the broad environmental distribution of this family. We present phylogenetic analyses using conserved marker genes and whole-genome comparison of pairwise average amino acid identity (AAI) values, revealing a high level of genomic divergence across disparate Asfarviruses. Further, we found that Asfarviridae genomes encode genes with diverse predicted metabolic roles and detectable sequence homology to proteins in bacteria, archaea, and eukaryotes, highlighting the genomic chimerism that is a salient feature of NCLDVs. Our read mapping from Tara Oceans metagenomic data also revealed that three Asfarviridae MAGs were present in multiple marine samples, indicating that they are widespread in the ocean. In one of these MAGs, we identified four marker genes with >95% AAI to genes sequenced from a virus that infects the dinoflagellate Heterocapsa circularisquama (HcDNAV). This suggests a potential host for this MAG, which would thereby represent a reference genome of a dinoflagellate-infecting giant virus. Together, these results show that Asfarviridae are ubiquitous, exhibit sequence divergence similar to that of other NCLDV families, and include several members that are widespread in the ocean and potentially infect ecologically important protists.
  2. Abstract Background

    With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes.


    In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) including several uncharacterized organisms and organisms with no public genome representatives.


    The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.

  3. Moran, Mary Ann (Ed.)
    ABSTRACT The genomes of Asgard Archaea, a proposed novel archaeal superphylum, share an enriched repertoire of eukaryotic signature genes and thus promise to provide insights into early eukaryote evolution. However, the distribution, metabolisms, cellular structures, and ecology of the members within this superphylum are not well understood. Here we provide a meta-analysis of the environmental distribution of the Asgard archaea, based on available 16S rRNA gene sequences. Metagenome sequencing of samples from a salt-crusted lagoon on the Baja California Peninsula of Mexico allowed the assembly of a new Thorarchaeota and three Lokiarchaeota genomes. Comparative analyses of all known Lokiarchaeota and Thorarchaeota genomes revealed overlapping genome content, including central carbon metabolism. Members of both groups contained putative reductive dehalogenase genes, suggesting that these organisms might be able to metabolize halogenated organic compounds. Unlike the first report on Lokiarchaeota, we identified genes encoding glycerol-1-phosphate dehydrogenase in all Loki- and Thorarchaeota genomes, suggesting that these organisms are able to synthesize bona fide archaeal lipids with their characteristic glycerol stereochemistry. IMPORTANCE Microorganisms of the superphylum Asgard Archaea are considered to be the closest living prokaryotic relatives of eukaryotes (including plants and animals) and thus promise to give insights into the early evolution of more complex life forms. However, very little is known about their biology as none of the organisms has yet been cultivated in the laboratory. Here we report on the ecological distribution of Asgard Archaea and on four newly sequenced genomes of the Lokiarchaeota and Thorarchaeota lineages that give insight into possible metabolic features that might eventually help to identify these enigmatic groups of archaea in the environment and to culture them.
  4. Hird, Sarah M. (Ed.)
    The gut microbiome provides vital functions for mammalian hosts, yet research on its variability and function across adult life spans and multiple generations is limited in large mammalian carnivores. Here, we used 16S rRNA gene and metagenomic high-throughput sequencing to profile the bacterial taxonomic composition, genomic diversity, and metabolic function of fecal samples collected from 12 wild spotted hyenas (Crocuta crocuta) residing in the Masai Mara National Reserve, Kenya, over a 23-year period spanning three generations. The metagenomic data came from four of these hyenas and spanned two 2-year periods. With these data, we determined the extent to which host factors predicted variation in the gut microbiome and identified the core microbes present in the guts of hyenas. We also investigated novel genomic diversity in the mammalian gut by reporting the first metagenome-assembled genomes (MAGs) for hyenas. We found that gut microbiome taxonomic composition varied temporally, but despite this, a core set of 14 bacterial genera was identified. The strongest predictors of the microbiome were host identity and age, suggesting that hyenas possess individualized microbiomes and that these may change with age during adulthood. The gut microbiome functional profiles of the four adult hyenas were also individual specific and were associated with prey abundance, indicating that the functions of the gut microbiome vary with host diet. We recovered 149 high-quality MAGs from the hyenas’ guts; some MAGs were classified as taxa previously reported for other carnivores, but many were novel and lacked species-level matches to genomes in existing reference databases. IMPORTANCE There is a gap in knowledge regarding the genomic diversity and variation of the gut microbiome across a host’s life span and across multiple generations of hosts in wild mammals. Using two types of sequencing approaches, we found that although gut microbiomes were individualized and temporally variable among hyenas, they correlated similarly to large-scale changes in the ecological conditions experienced by their hosts. We also recovered 149 high-quality MAGs from the hyena gut, greatly expanding the microbial genome repertoire known for hyenas, carnivores, and wild mammals in general. Some MAGs came from genera abundant in the gastrointestinal tracts of canid species and other carnivores, but over 80% of MAGs were novel and from species not previously represented in genome databases. Collectively, our novel body of work illustrates the importance of surveying the gut microbiome of nonmodel wild hosts, using multiple sequencing methods and computational approaches and at distinct scales of analysis.
  5. McMahon, Katherine (Ed.)
    ABSTRACT Mobile genetic elements (MGEs) drive bacterial evolution, alter gene availability within microbial communities, and facilitate adaptation to ecological niches. In natural systems, bacteria simultaneously possess or encounter multiple MGEs, yet their combined influences on microbial communities are poorly understood. Here, we investigate interactions among MGEs in the marine bacterium Sulfitobacter pontiacus. Two related strains, CB-D and CB-A, each harbor a single prophage. These prophages share high sequence identity with one another and an integration site within the host genome, yet these strains exhibit differences in “spontaneous” prophage induction (SPI) and consequent fitness. To better understand mechanisms underlying variation in SPI between these lysogens, we closed their genomes, which revealed that in addition to harboring different prophage genotypes, CB-A lacks two of the four large, low-copy-number plasmids possessed by CB-D. To assess the relative roles of plasmid content versus prophage genotype on host physiology, a panel of derivative strains varying in MGE content was generated. Characterization of these derivatives revealed a robust link between plasmid content and SPI, regardless of prophage genotype. Strains possessing all four plasmids had undetectable phage in cell-free lysates, while strains lacking either one plasmid (pSpoCB-1) or a combination of two plasmids (pSpoCB-2 and pSpoCB-4) produced high (>10^5 PFU/mL) phage titers. Homologous plasmid sequences were identified in related bacteria, and plasmid and phage genes were found to be widespread in Tara Oceans metagenomic data sets. This suggests that plasmid-dependent stabilization of prophages may be commonplace throughout the oceans. IMPORTANCE The consequences of prophage induction on the physiology of microbial populations are varied and include enhanced biofilm formation, conferral of virulence, and increased opportunity for horizontal gene transfer. These traits lead to competitive advantages for lysogenized bacteria and influence bacterial lifestyles in a variety of niches. However, biological controls of “spontaneous” prophage induction, the initiation of phage replication and phage-mediated cell lysis without an overt stressor, are not well understood. In this study, we observed a novel interaction between plasmids and prophages in the marine bacterium Sulfitobacter pontiacus. We found that loss of one or more distinct plasmids, which we show carry genes ubiquitous in the world’s oceans, resulted in a marked increase in prophage induction within lysogenized strains. These results demonstrate cross talk between different mobile genetic elements and have implications for our understanding of the lysogenic-lytic switches of prophages found not only in marine environments, but throughout all ecosystems.