Metagenomics studies microbial genomes in an ecosystem such as the human gastrointestinal tract. The success of most metagenomic studies depends on identifying novel microbial species and quantifying how their distributions vary among samples sequenced with next-generation sequencing technology. To achieve these goals, we propose a simple yet powerful metagenomic binning method, MetaBMF. The method does not require prior knowledge of reference genomes and produces highly accurate results, even at the strain level. Thus, it can be broadly used to identify disease-related microbial organisms that are not well studied.
Mathematically, we count the number of mapped reads on each assembled genomic fragment across different samples to form our input matrix, and we propose a scalable stratified angle regression algorithm to factorize this count matrix into the product of a binary matrix and a nonnegative matrix. The binary matrix can be used to separate microbial species, and the nonnegative matrix quantifies the species distributions in different samples. In simulation and empirical studies, we demonstrate that MetaBMF has high binning accuracy. It can bin DNA fragments accurately not only at the species level but also at the strain level. As shown in our example, we can accurately …
The software is available at https://github.com/didi10384/MetaBMF.
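The factorization described in the abstract can be illustrated with a toy, noise-free example. The sketch below is not taken from the MetaBMF repository and does not implement the stratified angle regression algorithm; it assumes every fragment belongs to exactly one species and swaps in a simple greedy cosine-similarity grouping followed by ordinary least squares. The names `group_by_angle` and `cos_threshold`, and all numbers, are hypothetical.

```python
import numpy as np

# Toy model: Y (fragments x samples) = B (fragments x species, binary)
#                                      @ A (species x samples, nonnegative).
# All values below are made up for illustration.
B = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
A = np.array([[10.0, 0.0, 5.0],
              [2.0, 8.0, 1.0]])
Y = B @ A  # noise-free read-count matrix


def group_by_angle(Y, cos_threshold=0.99):
    """Greedily group rows of Y that point in nearly the same direction.

    Assumption for this sketch: each fragment comes from a single species,
    so fragments of one species have proportional coverage profiles.
    """
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    unit = Y / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    labels = -np.ones(Y.shape[0], dtype=int)
    representatives = []
    for i, row in enumerate(unit):
        for k, rep in enumerate(representatives):
            if row @ rep >= cos_threshold:
                labels[i] = k
                break
        else:
            # No existing group is close enough: start a new one.
            representatives.append(row)
            labels[i] = len(representatives) - 1
    return labels


labels = group_by_angle(Y)

# Rebuild a binary membership matrix from the labels, then estimate the
# abundance matrix by least squares and clip tiny negatives to zero.
B_hat = np.eye(labels.max() + 1, dtype=int)[labels]
A_hat, *_ = np.linalg.lstsq(B_hat, Y, rcond=None)
A_hat = np.clip(A_hat, 0.0, None)
```

In this idealized setting the grouping recovers the two species and `A_hat` matches `A`; real count data are noisy and fragments may be shared between strains, which is the situation the published algorithm is designed for.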
- Publisher: Oxford University Press
- Sponsoring Org: National Science Foundation
More Like this
Gralnick, Jeffrey A. (Ed.)
ABSTRACT: Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning …
Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired when too few samples are available to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using only composition information. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes and reduce the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample.
We develop HiFine, a novel binning pipeline that refines the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine applies a fragmentation strategy to the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of the initial bins, and then merges fragmented bins and recruits unbinned contigs. We demonstrate …
Availability and implementation
HiFine is available at https://github.com/dyxstat/HiFine.
Supplementary data are available at Bioinformatics online.
Iterative subtractive binning of freshwater chronoseries metagenomes identifies over 400 novel species and their ecologic preferences
Recent advances in sequencing technology and bioinformatic pipelines have allowed unprecedented access to the genomes of yet‐uncultivated microorganisms from diverse environments. However, the catalogue of freshwater genomes remains limited, and most genome recovery attempts in freshwater ecosystems have only targeted specific taxa. Here, we present a genome recovery pipeline incorporating iterative subtractive binning, and apply it to a time series of 100 metagenomic datasets from seven connected lakes and estuaries along the Chattahoochee River (Southeastern USA). Our set of metagenome‐assembled genomes (MAGs) represents >400 yet‐unnamed genomospecies, substantially increasing the number of high‐quality MAGs from freshwater lakes. We propose names for two novel species: ‘Candidatus Elulimicrobium humile’ (‘Ca. Elulimicrobiota’, ‘Patescibacteria’) and ‘Candidatus Aquidulcis frankliniae’ (‘Chloroflexi’). Collectively, our MAGs represented about half of the total microbial community at any sampling point. To evaluate the prevalence of these genomospecies in the chronoseries, we introduce methodologies to estimate relative abundance and habitat preference that control for uneven genome quality and sample representation. We demonstrate high degrees of habitat‐specialization and endemicity for most genomospecies in the Chattahoochee lakes. Wider ecological ranges characterized smaller genomes with higher coding densities, indicating an overall advantage of smaller, more compact genomes for cosmopolitan distributions.
Exploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite advances in automated binning algorithms, their ability to recover MAGs with accuracy and biological relevance is so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process, however, is expertise-demanding and labor-intensive, and it deserves to be supported by software infrastructure.
We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time.
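One of the comparison metrics named above, the adjusted Rand index, can be sketched in a few lines. This is an illustrative standalone implementation of the standard formula, not BinaRena's own code; the function name `adjusted_rand_index` and the example binning plans are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(plan_a, plan_b):
    """Adjusted Rand index between two binning plans.

    Each plan is a sequence giving the bin label of every contig, in the
    same contig order. Label names themselves do not matter.
    """
    if len(plan_a) != len(plan_b):
        raise ValueError("both plans must label the same contigs")
    n = len(plan_a)

    # Pair counts within the contingency table and within each plan.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(plan_a, plan_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(plan_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(plan_b).values())
    total_pairs = comb(n, 2)

    if total_pairs == 0:
        return 1.0  # zero or one contig: the plans trivially agree
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both plans put everything in one bin
    return (pairs_joint - expected) / (maximum - expected)


# Same grouping with swapped bin names: perfect agreement.
same = adjusted_rand_index(["bin1", "bin1", "bin2", "bin2"],
                           ["x", "x", "y", "y"])
# Groupings that split every pair differently: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two plans group contigs identically, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance.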
To demonstrate BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic …
BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.
Decoding diversity in a coral reef fish species complex with restricted range using metagenomic sequencing of gut contents
Identification of the processes that generate and maintain species diversity within the same region can provide insight into biogeographic patterns at broader spatiotemporal scales. Hawkfishes in the genus Paracirrhites are a unique taxon to explore with respect to niche differentiation, exhibiting diagnostic differences in coloration, and an apparent center of distribution outside of the Indo–Malay–Philippine (IMP) biodiversity hotspot for coral reef fishes. Our aim is to use next‐generation sequencing methods to leverage samples of a taxon at their center of maximum diversity to explore phylogenetic relationships and a possible mechanism of coexistence.
Location
Flint Island, Southern Line Islands, Republic of Kiribati.
A comprehensive review of museum records, the primary literature, and unpublished field survey records was undertaken to determine ranges for four “arc‐eye” hawkfish species in the Paracirrhites species complex and a potential hybrid. Fish from four Paracirrhites species were collected from Flint Island in the Southern Line Islands, Republic of Kiribati. Hindgut contents were sequenced, and subsequent metagenomic analyses were used to assess the phylogenetic relatedness of the host fish, the microbiome community structure, and prey remains for each species.
Results
Phylogenetic analyses conducted with recovered mitochondrial genomes revealed clustering of P. bicolor with P. arcatus and P. xanthus with P. nisus, which were unexpected on the basis of previous morphological work in …
Main Conclusions
Our findings point toward previously unidentified relationships in this cryptic species complex at its proposed center of distribution. The three species endemic to the Polynesian province (P. nisus, P. xanthus, and P. bicolor) cluster separately from the more broadly distributed P. arcatus on the basis of relative abundance of metazoan sequences in the gut (presumed prey remains). Discordance between gut microbial communities and phylogeny of the host fish further reinforces the hypothesis of niche separation.