Title: Metagenome-Assembled Genomes of Bacteria Associated with Massospora cicadina Fungal Plugs from Infected Brood VIII Periodical Cicadas
ABSTRACT We report six metagenome-assembled genomes (MAGs) associated with Massospora cicadina strain MCPNR19 (ARSEF 14555), an obligate entomopathogenic fungus of periodical cicadas. The MAGs include representatives of Pantoea, Pseudomonas, Lactococcus, and one potential new Chryseobacterium species. Future research is needed to resolve the ecology of these MAGs and determine whether they represent symbionts or contaminants.
Award ID(s):
1441715 1429826 2215705
NSF-PAR ID:
10392802
Author(s) / Creator(s):
; ; ;
Editor(s):
Rokas, Antonis
Date Published:
Journal Name:
Microbiology Resource Announcements
Volume:
11
Issue:
10
ISSN:
2576-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fraser, Claire M. (Ed.)
    ABSTRACT

    Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet, many tools for processing metagenomic data are neither designed to consider eukaryotes nor are they built for an increasing amount of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow to specifically process large amounts of both metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8–2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand upon the phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach to infer the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs, which are driven by specific environmental factors and putative host-microbe associations. These data together add to a number of growing resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life.

    IMPORTANCE

    Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities, without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We show exciting insight into what protistan communities are present and their trophic roles in the ocean. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers’ efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.

     
  2. McBain, Andrew J. (Ed.)
    ABSTRACT

    The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning.

    IMPORTANCE

    Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets.
  3.
    Microorganisms can potentially colonise volcanic rocks using the chemical energy in reduced gases such as methane, hydrogen (H2) and carbon monoxide (CO). In this study, we analysed soil metagenomes from Chilean volcanic soils, representing three different successional stages with ages of 380, 269 and 63 years, respectively. A total of 19 metagenome-assembled genomes (MAGs) were retrieved from all stages with a higher number observed in the youngest soil (1640: 2 MAGs, 1751: 1 MAG, 1957: 16 MAGs). Genomic similarity indices showed that several MAGs had amino-acid identity (AAI) values >50% to the phyla Actinobacteria, Acidobacteria, Gemmatimonadetes, Proteobacteria and Chloroflexi. Three MAGs from the youngest site (1957) belonged to the class Ktedonobacteria (Chloroflexi). Complete cellular functions of all the MAGs were characterised, including carbon fixation, terpenoid backbone biosynthesis, formate oxidation and CO oxidation. All 19 environmental genomes contained at least one gene encoding a putative carbon monoxide dehydrogenase (CODH). Three MAGs had form I coxL operon (encoding the large subunit CO-dehydrogenase). One of these MAGs (MAG-1957-2.1, Ktedonobacterales) was highly abundant in the youngest soil. MAG-1957-2.1 also contained genes encoding a [NiFe]-hydrogenase and hyp genes encoding accessory enzymes and proteins. Little is known about the Ktedonobacterales through cultivated isolates, but some species can utilise H2 and CO for growth. Our results strongly suggest that the remote volcanic sites in Chile represent a natural habitat for Ktedonobacteria and they may use reduced gases for growth. 
  4. Members of the archaeal order Caldarchaeales (previously the phylum Aigarchaeota) are poorly sampled and are represented in public databases by relatively few genomes. Additional representative genomes will help resolve their placement among all known members of Archaea and provide insights into their roles in the environment. In this study, we analyzed 16S rRNA gene amplicons belonging to the Caldarchaeales that are available in public databases, which demonstrated that archaea of the order Caldarchaeales are diverse, widespread, and most abundant in geothermal habitats. We also constructed five metagenome-assembled genomes (MAGs) of Caldarchaeales from two geothermal features to investigate their metabolic potential and phylogenomic position in the domain Archaea. Two of the MAGs were assembled from microbial community DNA extracted from fumarolic lava rocks from Mauna Ulu, Hawai‘i, and three were assembled from DNA obtained from hot spring sinters from the El Tatio geothermal field in Chile. MAGs from Hawai‘i are high quality bins with completeness > 95% and contamination < 1%, and one likely belongs to a novel species in a new genus recently discovered at a submarine volcano off New Zealand. MAGs from Chile have lower completeness levels ranging from 27 to 70%. Gene content of the MAGs revealed that these members of Caldarchaeales are likely metabolically versatile and exhibit the potential for both chemoorganotrophic and chemolithotrophic lifestyles. The wide array of metabolic capabilities exhibited by these members of Caldarchaeales might help them thrive under diverse harsh environmental conditions. All the MAGs except one from Chile harbor putative prophage regions encoding several auxiliary metabolic genes (AMGs) that may confer a fitness advantage on their Caldarchaeales hosts by increasing their metabolic potential and make them better adapted to new environmental conditions. 
Phylogenomic analysis of the five MAGs and over 3,000 representative archaeal genomes showed the order Caldarchaeales forms a monophyletic group that is sister to the clade comprising the orders Geothermarchaeales (previously Candidatus Geothermarchaeota), Conexivisphaerales and Nitrososphaerales (formerly known as Thaumarchaeota), supporting the status of Caldarchaeales members as a clade distinct from the Thaumarchaeota. 
  5. Abstract

    Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs, by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. We found that incompleteness led to significant core gene (CG) loss. The CG loss remained when using different pan-genome analysis tools (Roary, BPGA, Anvi’o) and when using a mixture of MAGs and complete genomes. Contamination had little effect on core genome size (except for Roary, due to its gene clustering issue) but had a major influence on accessory genomes. Importantly, the CG loss was partially alleviated by lowering the CG threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The CG loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Our main findings were supported by a study of real MAG-isolate genome data. We conclude that lowering the CG threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs. Development of new pan-genome analysis tools specifically for MAGs is needed in future studies.

     