

Title: Critical assessment of pan-genomic analysis of metagenome-assembled genomes
Abstract

Pan-genome analyses of metagenome-assembled genomes (MAGs) may suffer from the known issues with MAGs: fragmentation, incompleteness and contamination. Here, we conducted a critical assessment of pan-genomics of MAGs by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs. We found that incompleteness led to significant core gene (CG) loss. The CG loss remained when using different pan-genome analysis tools (Roary, BPGA, Anvi’o) and when using a mixture of MAGs and complete genomes. Contamination had little effect on core genome size (except for Roary, due to its gene clustering issue) but had a major influence on accessory genomes. Importantly, the CG loss was partially alleviated by lowering the CG threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The CG loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees. Our main findings were supported by a study of real MAG-isolate genome data. We conclude that lowering the CG threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs. Development of new pan-genome analysis tools specifically for MAGs is needed in future studies.
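
The effect of the CG threshold described in the abstract can be illustrated with a minimal sketch. It is not the method of any of the tools named above. The function name `core_genes`, the toy genome names, and the gene labels are all hypothetical, and the 0.75 threshold is chosen only for illustration. A gene cluster counts as core when it is present in at least a given fraction of genomes, so an incomplete MAG that is missing a gene removes it from the core set under a strict 100% threshold but not under a relaxed one.

```python
from collections import Counter


def core_genes(genomes, threshold=1.0):
    """Return gene clusters present in at least `threshold` fraction of genomes.

    `genomes` maps a genome name to the set of gene clusters it contains
    (a hypothetical presence/absence representation used only for this sketch).
    """
    if not genomes:
        return set()
    # Count in how many genomes each gene cluster occurs.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    min_genomes = threshold * len(genomes)
    return {gene for gene, n in counts.items() if n >= min_genomes}


# Toy data: two complete genomes and two incomplete MAGs (all names made up).
genomes = {
    "complete_1": {"geneA", "geneB", "geneC", "geneD"},
    "complete_2": {"geneA", "geneB", "geneC"},
    "mag_1": {"geneA", "geneB", "geneD"},  # geneC missing from this MAG
    "mag_2": {"geneA", "geneC"},           # geneB missing from this MAG
}

strict = core_genes(genomes, threshold=1.0)    # only genes found in every genome
relaxed = core_genes(genomes, threshold=0.75)  # tolerates one missing genome out of four
print(sorted(strict), sorted(relaxed))
```

In this toy case the strict threshold keeps only one gene as core, while the relaxed threshold recovers the two genes that were each missing from a single MAG.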

 
Award ID(s):
1933521
NSF-PAR ID:
10372003
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Briefings in Bioinformatics
Volume:
23
Issue:
6
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. McBain, Andrew J. (Ed.)
    ABSTRACT The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning.
    IMPORTANCE Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets.
  2. Giovannoni, Stephen J. (Ed.)
    ABSTRACT Archaea belonging to the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) superphylum have been found in an expanding number of environments and perform a variety of biogeochemical roles, including contributing to carbon, sulfur, and nitrogen cycling. Generally characterized by ultrasmall cell sizes and reduced genomes, DPANN archaea may form mutualistic, commensal, or parasitic interactions with various archaeal and bacterial hosts, influencing the ecology and functioning of microbial communities. While DPANN archaea reportedly comprise a sizeable fraction of the archaeal community within marine oxygen-deficient zone (ODZ) water columns, little is known about their metabolic capabilities in these ecosystems. We report 33 novel metagenome-assembled genomes (MAGs) belonging to the DPANN phyla Nanoarchaeota, Pacearchaeota, Woesearchaeota, Undinarchaeota, Iainarchaeota, and SpSt-1190 from pelagic ODZs in the Eastern Tropical North Pacific and the Arabian Sea. We find these archaea to be permanent, stable residents of all three major ODZs only within anoxic depths, comprising up to 1% of the total microbial community and up to 25%–50% of archaea as estimated from read mapping to MAGs. ODZ DPANN appear to be capable of diverse metabolic functions, including fermentation, organic carbon scavenging, and the cycling of sulfur, hydrogen, and methane. Within a majority of ODZ DPANN, we identify a gene homologous to nitrous oxide reductase. Modeling analyses indicate the feasibility of a nitrous oxide reduction metabolism for host-attached symbionts, and the small genome sizes and reduced metabolic capabilities of most DPANN MAGs suggest host-associated lifestyles within ODZs.

    IMPORTANCE

    Archaea from the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) superphylum have diverse metabolic capabilities and participate in multiple biogeochemical cycles. While metagenomics and enrichments have revealed that many DPANN are characterized by ultrasmall genomes, few biosynthetic genes, and episymbiotic lifestyles, much remains unknown about their biology. We report 33 new DPANN metagenome-assembled genomes originating from the three global marine oxygen-deficient zones (ODZs), the first from these regions. We survey DPANN abundance and distribution within the ODZ water column, investigate their biosynthetic capabilities, and report potential roles in the cycling of organic carbon, methane, and nitrogen. We test the hypothesis that nitrous oxide reductases found within several ODZ DPANN genomes may enable ultrasmall episymbionts to serve as nitrous oxide consumers when attached to a host nitrous oxide producer. Our results indicate DPANN archaea as ubiquitous residents within the anoxic core of ODZs with the potential to produce or consume key compounds.

     
  3. Abstract

    Synechococcus are the most abundant cyanobacteria in high latitude regions and are responsible for an estimated 17% of annual marine net primary productivity. Despite their biogeochemical importance, Synechococcus populations have been unevenly sampled across the ocean, with most studies focused on low-latitude strains. In particular, the near absence of Synechococcus genomes from high-latitude, High Nutrient Low Chlorophyll (HNLC) regions leaves a gap in our knowledge of picocyanobacterial adaptations to iron limitation and their influence on carbon, nitrogen, and iron cycles. We examined Synechococcus populations from the subarctic North Pacific, a well-characterized HNLC region, with quantitative metagenomics. Assembly with short and long reads produced two near complete Synechococcus metagenome-assembled genomes (MAGs). Quantitative metagenome-derived abundances of these populations matched well with flow cytometry counts, and the Synechococcus MAGs were estimated to comprise >99% of the Synechococcus at Station P. Whereas the Station P Synechococcus MAGs contained multiple genes for adaptation to iron limitation, both genomes lacked genes for uptake and assimilation of nitrate and nitrite, suggesting a dependence on ammonium, urea, and other forms of recycled nitrogen leading to reduced iron requirements. A global analysis of Synechococcus nitrate reductase abundance in the TARA Oceans dataset found nitrate assimilation genes are also lower in other HNLC regions. We propose that nitrate and nitrite assimilation gene loss in Synechococcus may represent an adaptation to severe iron limitation in high-latitude regions where ammonium availability is higher. Our findings have implications for models that quantify the contribution of cyanobacteria to primary production and subsequent carbon export.

     
  4. Abstract Background Tropical members of the sponge genus Ircinia possess highly complex microbiomes that perform a broad spectrum of chemical processes that influence host fitness. Despite the pervasive role of microbiomes in Ircinia biology, it is still unknown how they remain in stable association across tropical species. To address this question, we performed a comparative analysis of the microbiomes of 11 Ircinia species using whole-metagenomic shotgun sequencing data to investigate three aspects of bacterial symbiont genomes—the redundancy in metabolic pathways across taxa, the evolution of genes involved in pathogenesis, and the nature of selection acting on genes relevant to secondary metabolism. Results A total of 424 new, high-quality bacterial metagenome-assembled genomes (MAGs) were produced for 10 Caribbean Ircinia species, which were evaluated alongside 113 publicly available MAGs sourced from the Pacific species Ircinia ramosa. Evidence of redundancy was discovered in that the core genes of several primary metabolic pathways could be found in the genomes of multiple bacterial taxa. Across hosts, the metagenomes were depleted in genes relevant to pathogenicity and enriched in eukaryotic-like proteins (ELPs) that likely mimic the hosts’ molecular patterning. Finally, clusters of steroid biosynthesis genes (CSGs), which appear to be under purifying selection and undergo horizontal gene transfer, were found to be a defining feature of Ircinia metagenomes. Conclusions These results illustrate patterns of genome evolution within highly complex microbiomes that illuminate how associations with hosts are maintained. The metabolic redundancy within the microbiomes could help buffer the hosts from changes in the ambient chemical and physical regimes and from fluctuations in the population sizes of the individual microbial strains that make up the microbiome. Additionally, the enrichment of ELPs and depletion of LPS and cellular motility genes provide a model for how alternative strategies to virulence can evolve in microbiomes undergoing mixed-mode transmission that do not ultimately result in higher levels of damage (i.e., pathogenicity) to the host. Our last set of results provides evidence that sterol biosynthesis in Ircinia-associated bacteria is widespread and that these molecules are important for the survival of bacteria in highly complex Ircinia microbiomes.
  5. ABSTRACT Ammonia availability due to chloramination can promote the growth of nitrifying organisms, which can deplete chloramine residuals and result in operational problems for drinking water utilities. In this study, we used a metagenomic approach to determine the identity and functional potential of microorganisms involved in nitrogen biotransformation within chloraminated drinking water reservoirs. Spatial changes in the nitrogen species included an increase in nitrate concentrations accompanied by a decrease in ammonium concentrations with increasing distance from the site of chloramination. This nitrifying activity was likely driven by canonical ammonia-oxidizing bacteria (i.e., Nitrosomonas) and nitrite-oxidizing bacteria (i.e., Nitrospira) as well as by complete-ammonia-oxidizing (i.e., comammox) Nitrospira-like bacteria. Functional annotation was used to evaluate genes associated with nitrogen metabolism, and the community gene catalogue contained mostly genes involved in nitrification, nitrate and nitrite reduction, and nitric oxide reduction. Furthermore, we assembled 47 high-quality metagenome-assembled genomes (MAGs) representing a highly diverse assemblage of bacteria. Of these, five MAGs showed high coverage across all samples, which included two Nitrosomonas, Nitrospira, Sphingomonas, and Rhizobiales-like MAGs. Systematic genome-level analyses of these MAGs in relation to nitrogen metabolism suggest that under ammonia-limited conditions, nitrate may be also reduced back to ammonia for assimilation. Alternatively, nitrate may be reduced to nitric oxide and may potentially play a role in regulating biofilm formation. Overall, this study provides insight into the microbial communities and their nitrogen metabolism and, together with the water chemistry data, improves our understanding of nitrogen biotransformation in chloraminated drinking water distribution systems.
IMPORTANCE Chloramines are often used as a secondary disinfectant when free chlorine residuals are difficult to maintain. However, chloramination is often associated with the undesirable effect of nitrification, which results in operational problems for many drinking water utilities. The introduction of ammonia during chloramination provides a potential source of nitrogen either through the addition of excess ammonia or through chloramine decay. This promotes the growth of nitrifying microorganisms and provides a nitrogen source (i.e., nitrate) for the growth for other organisms. While the roles of canonical ammonia-oxidizing and nitrite-oxidizing bacteria in chloraminated drinking water systems have been extensively investigated, those studies have largely adopted a targeted gene-centered approach. Further, little is known about the potential long-term cooccurrence of complete-ammonia-oxidizing (i.e., comammox) bacteria and the potential metabolic synergies of nitrifying organisms with their heterotrophic counterparts that are capable of denitrification and nitrogen assimilation. This study leveraged data obtained for genome-resolved metagenomics over a time series to show that while nitrifying bacteria are dominant and likely to play a major role in nitrification, their cooccurrence with heterotrophic organisms suggests that nitric oxide production and nitrate reduction to ammonia may also occur in chloraminated drinking water systems. 