Abstract Motivation: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired when too few samples are available to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using composition information alone. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes and reduce the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample. Results: We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a fragmentation strategy for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of the initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. Availability and implementation: HiFine is available at https://github.com/dyxstat/HiFine. Supplementary information: Supplementary data are available at Bioinformatics online.
                    
                            
                            Comparison of metagenomic and traditional methods for diagnosis of E. coli enteric infections
                        
                    
    
ABSTRACT Diarrheagenic Escherichia coli, collectively known as DEC, is a leading cause of diarrhea, particularly in children in low- and middle-income countries. Diagnosing infections caused by different DEC pathotypes traditionally relies on the cultivation and identification of virulence genes, a resource-intensive and error-prone process. Here, we compared culture-based DEC identification with shotgun metagenomic sequencing of whole stool using 35 randomly drawn samples from a cohort of diarrhea-afflicted patients. Metagenomic sequencing detected the cultured isolates in 97% of samples, revealing, overall, reliable detection by this approach. Genome binning yielded high-quality E. coli metagenome-assembled genomes (MAGs) for 13 samples, and we observed that the MAG did not carry the diagnostic DEC virulence genes of the corresponding isolate in 60% of these samples. Specifically, two distinct scenarios were observed: diffusely adherent E. coli (DAEC) isolates without corresponding DAEC MAGs appeared to be relatively rare members of the microbiome, which was further corroborated by quantitative PCR (qPCR), and thus unlikely to represent the etiological agent in 3 of the 13 samples (~23%). In contrast, ETEC virulence genes were located on plasmids and largely escaped binning in associated MAGs despite being prevalent in the sample (5/13 samples or ~38%), revealing limitations of the metagenomic approach. These results provide important insights for diagnosing DEC infections and demonstrate how metagenomic methods can complement isolation efforts and PCR for pathogen identification and population abundance.
IMPORTANCE Diagnosing enteric infections based on traditional methods involving isolation and PCR can be erroneous due to isolation and other biases, e.g., the most abundant pathogen may not be recovered on isolation media. By employing shotgun metagenomics together with traditional methods on the same stool samples, we show that mixed infections caused by multiple pathogens are much more frequent than traditional methods indicate in the case of acute diarrhea. Further, in at least 8.5% of the total samples examined, the metagenomic approach reliably identified a different pathogen than the traditional approach. Therefore, our results provide a methodology to complement existing methods for enteric infection diagnostics with cutting-edge, culture-independent metagenomic techniques, and highlight the strengths and limitations of each approach.
        
    
    
- PAR ID: 10526688
- Editor(s): Parkhill, Julian
- Publisher / Repository: mBio journal by the American Society for Microbiology
- Date Published:
- Journal Name: mBio
- Volume: 15
- Issue: 4
- ISSN: 2150-7511
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            McBain, Andrew J. (Ed.)
            ABSTRACT The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning.
            IMPORTANCE Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets.
- 
            Access to safe and nutritious food is critical for maintaining life and supporting good health. Eating food that is contaminated with pathogens leads to serious diseases ranging from diarrhea to cancer. Many foodborne infections can cause long-term impairment or even death. Hence, early detection of foodborne pathogens such as pathogenic Escherichia coli strains is essential for public safety. Conventional methods for detecting these bacteria are based on culturing on selective media followed by standard biochemical identification. Despite their accuracy, these methods are time-consuming. PCR-based detection of pathogens relies on sophisticated equipment and specialized technicians, which are difficult to find in areas with limited resources. In contrast, CRISPR technology is more specific and sensitive for identifying pathogenic bacteria because it employs programmable CRISPR-Cas systems that target particular DNA sequences, minimizing non-specific binding and cross-reactivity. In this project, a robust detection method based on CRISPR-Cas12a sensing was developed that is rapid, sensitive, and specific for the detection of pathogenic E. coli isolates collected from fecal samples of adult goats from 17 farms in Tennessee. The detection reaction contained amplified PCR products for the pathogenic regions, a reporter probe, the Cas12a enzyme, and crRNA specific to three pathogenic genes: stx1, stx2, and hlyA. The CRISPR reaction with the pathogenic bacteria emitted fluorescence when excited under UV light. To evaluate the detection sensitivity and specificity of this assay, its results were compared with those of a PCR-based detection assay. Both methods produced similar results for the same samples. This technique is very precise, highly sensitive, quick, cost effective, and easy to use, and can easily overcome the limitations of the present detection methods. This project can result in a versatile detection method that is easily adaptable for rapid response in the detection and surveillance of diseases that pose large-scale biosecurity threats to human health, and plant and animal production.
- 
            Abstract Background: Exploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process however is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure. Results: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset. Conclusions: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.
- 
            Abstract Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to simultaneously study multiple genomes in natural microbial communities. We develop HiCBin, a novel open-source pipeline, to resolve high-quality MAGs utilizing Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm and includes the spurious contact detection into binning pipelines for the first time. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform the existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.