With growing calls for increased surveillance of antibiotic resistance as an escalating global health threat, improved bioinformatic tools are needed for tracking antibiotic resistance genes (ARGs) across One Health domains. Most studies to date profile ARGs using sequence homology, but such approaches provide limited information about the broader context or function of the ARG in bacterial genomes. Here we introduce a new pipeline for identifying ARGs in genomic data that applies machine learning to Protein-Protein Interaction Networks (PPINs) to improve ARG prediction while also providing information about genomic context, such as gene mobility. A random forest model was trained to differentiate between ARGs and nonARGs and was validated using the PPINs of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter cloacae), which represent urgent threats to human health because they tend to be multi-antibiotic resistant. The pipeline robustly discriminated ARGs from nonARGs, achieving an average area under the precision-recall curve of 88%. We further found that the neighbors of ARGs, i.e., genes connected to an ARG by a single edge, were disproportionately associated with mobile genetic elements, consistent with the understanding that ARGs tend to be more mobile than randomly sampled genes in the PPINs. This pipeline showcases the utility of PPINs for discerning distinctive characteristics of ARGs within a broader genomic context and for differentiating ARGs from nonARGs through network-based attributes and interaction patterns. The code for running the pipeline is publicly available at https://github.com/NazifaMoumi/PPI-ARG-ESKAPE
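The two network analyses above can be sketched in a few lines. This is a minimal illustration, not the repository's code: it assumes the PPIN is available as a plain edge list of gene identifiers and that MGE association is a precomputed set of gene IDs (both hypothetical inputs). `first_neighbors` collects genes exactly one edge away from any ARG, `mge_fraction` and `random_baseline` compare their MGE association with random gene samples, and `average_precision` is one common step-wise estimate of the area under the precision-recall curve. The random forest and its network features are not reproduced here.

```python
import random
from collections import defaultdict


def first_neighbors(edges, args):
    """Genes connected to at least one ARG by a single edge (ARGs themselves excluded)."""
    adjacency = defaultdict(set)
    for a, b in edges:  # PPIN edges are undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    neighbors = set()
    for gene in args:
        neighbors |= adjacency.get(gene, set())
    return neighbors - set(args)


def mge_fraction(genes, mge_genes):
    """Fraction of a gene collection annotated as MGE-associated."""
    genes = list(genes)
    if not genes:
        return 0.0
    return sum(g in mge_genes for g in genes) / len(genes)


def random_baseline(all_genes, k, mge_genes, trials=1000, seed=0):
    """Mean MGE fraction over `trials` random samples of k genes (the null comparison)."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    return sum(mge_fraction(rng.sample(pool, k), mge_genes) for _ in range(trials)) / trials


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores).

    labels: 1 for ARG, 0 for nonARG; scores: predicted probability of being an ARG.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision at each point where recall increases
    return total / n_pos if n_pos else 0.0
```

In use, one would compare `mge_fraction(first_neighbors(edges, args), mge_genes)` against `random_baseline(all_genes, k, mge_genes)` with `k` equal to the number of neighbors; a markedly higher neighbor fraction corresponds to the enrichment described in the abstract.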
Abstract Background: While there is increasing recognition of numerous environmental contributions to the spread of antibiotic resistance, quantifying the relative contributions of various sources remains a fundamental challenge. Similarly, there is a need to differentiate acute human health risks associated with exposure to a given environment from the broader ecological risk of evolution and spread of antibiotic resistance genes (ARGs) across microbial taxa. Recent studies have proposed various methods of harnessing the rich information in metagenomic data to achieve these aims. Here, we introduce MetaCompare 2.0, which improves upon the original MetaCompare pipeline by differentiating indicators of human health resistome risk (i.e., potential for human pathogens to acquire ARGs) from ecological resistome risk (i.e., overall mobility of ARGs across a given microbiome).
Results: To demonstrate the sensitivity of the MetaCompare 2.0 pipeline, we analyzed publicly available metagenomes representing a broad array of environments, including wastewater, surface water, soil, sediment, and human gut. We also assessed the effect of sequence assembly methods on the risk scores. We further evaluated the robustness of the pipeline to sequencing depth, contig count, and metagenomic library coverage bias by comparing a range of subsamples drawn from a set of deeply sequenced wastewater metagenomes. Across environments, MetaCompare 2.0 consistently produced lower risk scores for environments with little human influence and higher risk scores for human-contaminated environments affected by pollution or other stressors. The rank order of risk scores was not measurably affected by the choice of assembler. The MetaCompare 2.0 risk scores were remarkably consistent across varying sequencing depth, contig count, and coverage.
Conclusion: MetaCompare 2.0 successfully ranked a wide array of environments according to both human health and ecological resistome risks, with both scores strongly impacted by anthropogenic stress. We packaged the improved pipeline into a publicly available web service that provides an easy-to-use interface for computing resistome risk scores and visualizing results. The web service is available at http://metacompare.cs.vt.edu/
Abstract Horizontal gene transfer (HGT) within microbiomes is linked to complex environmental and ecological dynamics that are challenging to replicate in controlled settings. Consequently, most extant studies of microbiome HGT either rely on simplified experimental systems of tenuous relevance to real microbiomes or are correlative studies that assume that HGT potential is a function of the relative abundance of mobile genetic elements (MGEs), the vehicles of HGT. Here we introduce Kairos, a bioinformatic tool deployed in Nextflow for detecting HGT events “in situ,” i.e., within a microbiome, through analysis of time-series metagenomic sequencing data. The in situ framework proposed here leverages available metagenomic data from a longitudinally sampled microbiome to assess whether the chronological occurrence of potential donors, recipients, and putatively transferred regions could plausibly have arisen due to HGT over a range of defined time periods. The centerpiece of the Kairos workflow is a novel competitive read alignment method that can discern even very similar genomic sequences, such as those produced by MGE-associated recombination. A key advantage of Kairos is its reliance on assemblies rather than metagenome-assembled genomes (MAGs), which avoids the systematic exclusion of accessory genes associated with the binning process. In an example test case using real-world data, using assemblies directly produced a 264-fold increase in the number of antibiotic resistance genes included in the HGT analysis compared with analysis of MAGs using MetaCHIP. Further, an in silico evaluation of contig taxonomic classification for both chromosomally derived and MGE-derived sequences indicated a high degree of accuracy, even for conjugative plasmids, up to the level of class or order. Thus, Kairos enables the analysis of very recent HGT events, making it suitable for studying rapid prokaryotic adaptation in environmental systems without disturbing the intricate ecological dynamics of microbiomes. Current versions of the Kairos workflow are available at https://github.com/clb21565/kairos.
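The two ideas at the core of this framework, competitive read assignment and a chronological plausibility check, can be sketched as follows. This is a hypothetical illustration rather than Kairos's implementation: alignment scores are assumed to be precomputed per read and per reference (for example, a donor-context and a recipient-context version of a putatively transferred region), and the reference names and `min_reads` threshold are illustrative. Each read is credited only to the reference it aligns to best, and reads that tie are discarded, so near-identical sequences are separated only by reads that carry distinguishing bases.

```python
from collections import Counter


def competitive_assign(read_scores):
    """Assign each read to its unique best-scoring reference.

    read_scores: {read_id: {reference_id: alignment_score}}
    Reads whose best score is shared by two or more references stay unassigned.
    """
    counts = Counter()
    for hits in read_scores.values():
        if not hits:
            continue
        best = max(hits.values())
        winners = [ref for ref, score in hits.items() if score == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts


def first_detection(timepoint_counts, reference, min_reads=1):
    """Index of the first time point with >= min_reads reads for a reference, else None."""
    for t, counts in enumerate(timepoint_counts):
        if counts.get(reference, 0) >= min_reads:
            return t
    return None


def plausible_transfer(timepoint_counts, donor_ref, recipient_ref, min_reads=1):
    """True if the donor context is observed before the recipient context first appears.

    timepoint_counts: per-time-point read counts, ordered chronologically.
    """
    t_donor = first_detection(timepoint_counts, donor_ref, min_reads)
    t_recipient = first_detection(timepoint_counts, recipient_ref, min_reads)
    return t_donor is not None and t_recipient is not None and t_donor < t_recipient
```

For instance, if reads at the first time point favor the donor context and reads at the second favor the recipient context, `plausible_transfer([t0, t1], "donor", "recipient")` is `True`; the reverse ordering would rule the event out under this simple criterion.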
Nojiri, Hideaki (Ed.). Abstract Bacterial mobile genetic elements (MGEs) encode functional modules that perform both core and accessory functions for the element, the latter of which are often only transiently associated with the element. The presence of these accessory genes, which are often close homologs of primarily immobile genes, incurs high rates of false positives in existing MGE databases and, therefore, limits the usability of these databases for MGE annotation. To overcome this limitation, we analyzed 10,776,849 protein sequences derived from eight MGE databases to compile a comprehensive set of 6,140 manually curated protein families that are linked to the “life cycle” (integration/excision, replication/recombination/repair, transfer, stability/transfer/defense, and phage-specific processes) of plasmids, phages, integrative, transposable, and conjugative elements. We overlay experimental information where available to create a tiered annotation scheme that distinguishes high-quality annotations from annotations inferred exclusively through bioinformatic evidence. We additionally provide an MGE-class label for each entry (e.g., plasmid or integrative element) and assign each entry a major and a minor category. The resulting database, mobileOG-db (for mobile orthologous groups), comprises over 700,000 deduplicated sequences encompassing five major mobileOG categories and more than 50 minor categories, providing a structured language and interpretable basis for an array of MGE-centered analyses. mobileOG-db can be accessed at mobileogdb.flsi.cloud.vt.edu/, where users can select, refine, and analyze custom subsets of the dynamic mobilome. IMPORTANCE: The analysis of bacterial mobile genetic elements (MGEs) in genomic data is a critical step toward profiling the root causes of antibiotic resistance, phenotypic or metabolic diversity, and the evolution of bacterial genera. Existing methods for MGE annotation demand considerable biological and computational expertise to use properly.
To bridge this gap, we systematically analyzed 10,776,849 proteins derived from eight databases of MGEs to identify 6,140 MGE protein families that can serve as candidate hallmarks, i.e., proteins that can be used as “signatures” of MGEs to aid annotation. The resulting resource, mobileOG-db, provides a multilevel classification scheme that encompasses plasmid, phage, integrative, and transposable element protein families categorized into five major mobileOG categories and more than 50 minor categories. mobileOG-db thus provides a rich resource for simple and intuitive element annotation that can be integrated seamlessly into existing MGE detection pipelines and colocalization analyses.
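As a sketch of how hallmark annotations like these might feed a colocalization analysis: assume protein hits to mobileOG-db entries have already been obtained with a homology search tool and filtered, and that each entry's major category is available in a lookup table. The code below tallies major categories per contig and flags ARGs lying within a fixed window of an MGE hallmark gene. The `Gene` record, the identifiers, and the 5,000 bp window are hypothetical choices, not mobileOG-db's own file format or recommended settings.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# The five major mobileOG categories named in the abstract.
MAJOR_CATEGORIES = {
    "integration/excision",
    "replication/recombination/repair",
    "transfer",
    "stability/transfer/defense",
    "phage",
}


@dataclass(frozen=True)
class Gene:
    contig: str
    start: int
    end: int
    label: str  # hypothetical: an ARG name or a mobileOG entry identifier


def category_tally(hallmarks, major_of):
    """Count major mobileOG categories per contig.

    hallmarks: Gene records whose label is a mobileOG entry ID.
    major_of: {entry_id: major_category}, assumed to come from database metadata.
    """
    tally = defaultdict(Counter)
    for gene in hallmarks:
        category = major_of[gene.label]
        if category not in MAJOR_CATEGORIES:
            raise ValueError(f"unknown major category: {category}")
        tally[gene.contig][category] += 1
    return tally


def colocalized_args(args, hallmarks, window=5000):
    """ARGs with an MGE hallmark gene on the same contig within `window` bp."""
    by_contig = defaultdict(list)
    for h in hallmarks:
        by_contig[h.contig].append(h)
    hits = []
    for arg in args:
        for h in by_contig.get(arg.contig, []):
            # Distance between the two intervals; 0 if they overlap.
            gap = max(h.start - arg.end, arg.start - h.end, 0)
            if gap <= window:
                hits.append(arg)
                break
    return hits
```

Keeping the per-contig category tally separate from the colocalization step mirrors the multilevel scheme described above: the same hits can be summarized by major category, or used only as positional markers of mobility.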
Abstract In the fight to limit the global spread of antibiotic resistance, the assembly of environmental metagenomes has the potential to provide rich contextual information (e.g., taxonomic hosts, carriage on mobile genetic elements) about antibiotic resistance genes (ARGs) in the environment. However, computational challenges associated with assembly can impact the accuracy of downstream analyses. This work critically evaluates the impact of assembly using short reads, nanopore MinION long reads, and a combination of the two (hybrid) on ARG contextualization for ten environmental metagenomes using seven prominent assemblers (IDBA-UD, MEGAHIT, Canu, Flye, OPERA-MS, metaSPAdes, and hybridSPAdes). While short-read and hybrid assemblies produced similar patterns of ARG contextualization, raw or assembled long nanopore reads produced distinct patterns. Based on an in silico spike-in experiment using real and simulated reads, we show that species at low to intermediate coverage are more likely to be incorporated into chimeric contigs across all assemblers and sequencing technologies, while more abundant species produce assemblies with a greater frequency of inversions and insertions/deletions (indels). Overall, our analyses support hybrid assembly as a valuable technique for boosting the reliability and accuracy of assembly-based analyses of ARGs and neighboring genes at environmentally relevant coverages, provided that sufficient short-read sequencing depth is achieved.
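In a spike-in experiment where the source genome of every read is known, chimeric contigs can be flagged by checking whether a contig's aligned bases come from more than one source genome. The sketch below assumes contig-to-reference alignments have already been computed with an aligner of choice and summarized as `(source_genome, aligned_bases)` pairs; the 10% minimum share used to count a source as a genuine contributor is a hypothetical threshold, not the one used in this work. Per-species chimera fractions can then be paired with each species' known coverage to compare low-, intermediate-, and high-coverage species.

```python
from collections import defaultdict


def _major_sources(alignments, min_share):
    """Source genomes contributing >= min_share of a contig's aligned bases."""
    per_source = defaultdict(int)
    for source, bases in alignments:
        per_source[source] += bases
    total = sum(per_source.values())
    if total == 0:
        return []
    return sorted(s for s, b in per_source.items() if b / total >= min_share)


def is_chimeric(alignments, min_share=0.10):
    """True if two or more source genomes each contribute >= min_share of aligned bases.

    alignments: list of (source_genome, aligned_bases) pairs for one contig.
    """
    return len(_major_sources(alignments, min_share)) >= 2


def chimera_fraction_by_species(contigs, min_share=0.10):
    """Per source genome: share of the contigs it contributes to that are chimeric.

    contigs: {contig_id: alignments}
    """
    involved = defaultdict(lambda: [0, 0])  # source -> [chimeric contigs, all contigs]
    for alignments in contigs.values():
        sources = _major_sources(alignments, min_share)
        chimeric = len(sources) >= 2
        for source in sources:
            involved[source][0] += chimeric
            involved[source][1] += 1
    return {s: c / n for s, (c, n) in involved.items()}
```

Counting per contributing species, rather than only by a contig's dominant source, matters here because a low-coverage species pulled into a chimera is usually the minor partner of the contig.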