skip to main content


This content will become publicly available on February 8, 2025

Title: Resistance genes are distinct in protein-protein interaction networks according to drug class and gene mobility
Abstract

With growing calls for increased surveillance of antibiotic resistance as an escalating global health threat, improved bioinformatic tools are needed for tracking antibiotic resistance genes (ARGs) across One Health domains. Most studies to date profile ARGs using sequence homology, but such approaches provide limited information about the broader context or function of the ARG in bacterial genomes. Here we introduce a new pipeline for identifying ARGs in genomic data that employs machine learning analysis of Protein-Protein Interaction Networks (PPINs) as a means to improve predictions of ARGs while also providing vital information about the context, such as gene mobility. A random forest model was trained to effectively differentiate between ARGs and nonARGs and was validated using the PPINs of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, andEnterobacter cloacae), which represent urgent threats to human health because they tend to be multi-antibiotic resistant. The pipeline exhibited robustness in discriminating ARGs from nonARGs, achieving an average area under the precision-recall curve of 88%. We further identified that the neighbors of ARGs, i.e., genes connected to ARGs by only one edge, were disproportionately associated with mobile genetic elements, which is consistent with the understanding that ARGs tend to be mobile compared to randomly sampled genes in the PPINs. This pipeline showcases the utility of PPINs in discerning distinctive characteristics of ARGs within a broader genomic context and in differentiating ARGs from nonARGs through network-based attributes and interaction patterns. The code for running the pipeline is publicly available athttps://github.com/NazifaMoumi/PPI-ARG-ESKAPE

 
more » « less
Award ID(s):
2004751
PAR ID:
10553669
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user’s configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support. ARGem further includes an expandable set of analysis tools that include statistical and network analysis and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline is demonstrated with analysis of aquatic metagenomes. The pipeline is freely available athttps://github.com/xlxlxlx/ARGem.

     
    more » « less
  2. Antibiotic resistance is a continually rising threat to global health. A primary driver of the evolution of new strains of resistant pathogens is the horizontal gene transfer (HGT) of antibiotic resistance genes (ARGs). However, identifying and quantifying ARGs subject to HGT remains a significant challenge. Here, we introduce HT-ARGfinder (horizontally transferred ARG finder), a pipeline that detects and enumerates horizontally transferred ARGs in metagenomic data while also estimating the directionality of transfer. To demonstrate the pipeline, we applied it to an array of publicly-available wastewater metagenomes, including hospital sewage. We compare the horizontally transferred ARGs detected across various sample types and estimate their directionality of transfer among donors and recipients. This study introduces a comprehensive tool to track mobile ARGs in wastewater and other aquatic environments. 
    more » « less
  3. Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly,a posterioriverification of mutations, or specification of species). In this work we present thek-mer, i.e., strings of fixed lengthk, ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) anad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based,k-mer search setup to process data efficiently, linkingk-mers to ARGVs,k-mers to point mutations, and ARGVs tok-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance onad hocfalse positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs but ecological characteristics do not explain well ARGV variance. KARGVA is publicly available athttps://github.com/DataIntellSystLab/KARGVAunder MIT license.

     
    more » « less
  4. Li, Jinyan (Ed.)

    Artificial intelligence-powered protein structure prediction methods have led to a paradigm-shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available athttps://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.

     
    more » « less
  5. Abstract Background

    While there is increasing recognition of numerous environmental contributions to the spread of antibiotic resistance, quantifying the relative contributions of various sources remains a fundamental challenge. Similarly, there is a need to differentiate acute human health risks corresponding to exposure to a given environment, versus broader ecological risk of evolution and spread of antibiotic resistance genes (ARGs) across microbial taxa. Recent studies have proposed various methods of harnessing the rich information housed by metagenomic data for achieving such aims. Here, we introduce MetaCompare 2.0, which improves upon the original MetaCompare pipeline by differentiating indicators of human health resistome risk (i.e., potential for human pathogens to acquire ARGs) from ecological resistome risk (i.e., overall mobility of ARGs across a given microbiome).

    Results

    To demonstrate the sensitivity of the MetaCompare 2.0 pipeline, we analyzed publicly available metagenomes representing a broad array of environments, including wastewater, surface water, soil, sediment, and human gut. We also assessed the effect of sequence assembly methods on the risk scores. We further evaluated the robustness of the pipeline to sequencing depth, contig count, and metagenomic library coverage bias through comparative analysis of a range of subsamples extracted from a set of deeply sequenced wastewater metagenomes. The analysis utilizing samples from different environments demonstrated that MetaCompare 2.0 consistently produces lower risk scores for environments with little human influence and higher risk scores for human contaminated environments affected by pollution or other stressors. We found that the ranks of risk scores were not measurably affected by different assemblers employed. The Meta-Compare 2.0 risk scores were remarkably consistent despite varying sequencing depth, contig count, and coverage.

    Conclusion

    MetaCompare 2.0 successfully ranked a wide array of environments according to both human health and ecological resistome risks, with both scores being strongly impacted by anthropogenic stress. We packaged the improved pipeline into a publicly-available web service that provides an easy-to-use interface for computing resistome risk scores and visualizing results. The web service is available athttp://metacompare.cs.vt.edu/

     
    more » « less