Title: Kairos infers in situ horizontal gene transfer in longitudinally sampled microbiomes through microdiversity-aware sequence analysis
Abstract Horizontal gene transfer (HGT) occurring within microbiomes is linked to complex environmental and ecological dynamics that are challenging to replicate in controlled settings. Consequently, most extant studies of microbiome HGT are either simplistic experimental settings with tenuous relevance to real microbiomes or correlative studies that assume that HGT potential is a function of the relative abundance of mobile genetic elements (MGEs), the vehicles of HGT. Here we introduce Kairos as a bioinformatic tool deployed in nextflow for detecting HGT events “in situ,” i.e., within a microbiome, through analysis of time-series metagenomic sequencing data. The in situ framework proposed here leverages available metagenomic data from a longitudinally sampled microbiome to assess whether the chronological occurrence of potential donors, recipients, and putatively transferred regions could plausibly have arisen due to HGT over a range of defined time periods. The centerpiece of the Kairos workflow is a novel competitive read alignment method that enables discernment of even very similar genomic sequences, such as those produced by MGE-associated recombination. A key advantage of Kairos is its reliance on assemblies rather than metagenome assembled genomes (MAGs), which avoids systematic exclusion of accessory genes associated with the binning process. In an example test-case of real world data, use of assemblies directly produced a 264-fold increase in the number of antibiotic resistance genes included in the analysis of HGT compared to analysis of MAGs with MetaCHIP. Further, in silico evaluation of contig taxonomy was performed to assess the accuracy of classification for both chromosomally- and MGE-derived sequences, indicating a high degree of accuracy even for conjugative plasmids up to the level of class or order.
Thus, Kairos enables the analysis of very recent HGT events, making it suitable for studying rapid prokaryotic adaptation in environmental systems without disturbing the ornate ecological dynamics associated with microbiomes. Current versions of the Kairos workflow are available here: https://github.com/clb21565/kairos.
Award ID(s):
2004751
PAR ID:
10553424
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The widespread misuse of antibiotics has escalated antibiotic resistance into a critical global public health concern. Beyond antibiotics, metals function as antibacterial agents. Metal resistance genes (MRGs) enable bacteria to tolerate metal-based antibacterials and may also foster antibiotic resistance within bacterial communities through co-selection. Thus, predicting bacterial MRGs is vital for elucidating their involvement in antibiotic resistance and metal tolerance mechanisms. The “best hit” approach is mainly utilized to identify and annotate MRGs. This method is sensitive to cutoff values and produces a high false negative rate. Other than the best hit approach, only a few antimicrobial resistance (AMR) detection tools exist for predicting MRGs. However, these tools lack comprehensive annotation for MRGs conferring resistance to multiple metals. To address such limitations, we introduce DeepMRG, a deep learning-based multi-label classifier, to predict bacterial MRGs. Because a bacterial MRG can confer resistance to multiple metals, DeepMRG is designed as a multi-label classifier capable of predicting multiple metal labels associated with an MRG. It leverages bit score-based similarity distribution of sequences with experimentally verified MRGs. To ensure unbiased model evaluation, we employed a clustering method to partition our dataset into six subsets, five for cross-validation and one for testing, with non-homologous sequences, mitigating the impact of sequence homology. DeepMRG consistently achieved high overall F1-scores and significantly reduced false negative rates across a wide range of datasets. It can be used to predict bacterial MRGs in metagenomic or isolate assemblies. The web server of DeepMRG can be accessed at https://deepmrg.cs.vt.edu/deepmrg and the source code is available at https://github.com/muhit-emon/DeepMRG under the MIT license.
  2. Abstract Background: The aerial surface of plants, known as the phyllosphere, hosts a complex and dynamic microbiome that plays essential roles in plant health and environmental processes. While research has focused on root-associated microbiomes, the phyllosphere remains comparatively understudied, especially in forest ecosystems. Despite the global ecological dominance and importance of conifers, no previous study has applied shotgun metagenomics to their phyllosphere microbiomes. Results: This study uses metagenomic sequencing to explore the microbial phyllosphere communities of subalpine Western conifer needle surfaces from 67 trees at six sites spanning the Rocky Mountains, including 31 limber pine, 18 Douglas fir, and 18 Engelmann spruce. Sites span ~1,075 km and nearly 10° latitude, from Glacier National Park to Rocky Mountain Biological Laboratory, capturing broad environmental variation. Metagenomes were generated for each of the 67 samples, for which we produced individual assemblies, along with three large coassemblies specific to each conifer host. From these datasets, we reconstructed 447 metagenome-assembled genomes (MAGs), 417 of which are non-redundant at the species level. Beyond increasing the total number of extracted MAGs from 153 to 294, the three coassemblies yielded three large MAGs, representing partial sequences of host genomes. Phylogenomics of all microbial MAGs revealed communities predominantly composed of bacteria (n = 327) and fungi (n = 117). We show that both microbial community composition and metabolic potential differ significantly across host tree species and geographic sites, with site exerting a stronger influence than host. Conclusions: This dataset offers new insights into the microbial communities inhabiting the conifer needle surface, laying the foundation for future research on needle microbiomes across temporal and spatial scales.
Variation in functional capabilities, such as volatile organic compound (VOC) degradation and polysaccharide metabolism, closely tracks shifts in taxonomic composition, indicating that host-specific chemistry, local environmental factors, and regional microbial source pools jointly shape ecological roles. Moreover, the observed patterns of mobile genetic elements and horizontal gene transfer suggest that gene exchange predominantly occurs within microbial lineages, with occasional broader transfers dispersing key functional genes (e.g., those involved in polysaccharide metabolism), which may facilitate microbiome adaptation. 
  3. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
  4. Abstract The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables reconstructing high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow fecal, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC.
  5. Abstract Background: Exploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process, however, is expertise-demanding and labor-intensive, and it deserves to be supported by software infrastructure. Results: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset.
Conclusions: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.