skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The K-mer antibiotic resistance gene variant analyzer (KARGVA)
Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly,a posterioriverification of mutations, or specification of species). In this work we present thek-mer, i.e., strings of fixed lengthk, ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) anad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based,k-mer search setup to process data efficiently, linkingk-mers to ARGVs,k-mers to point mutations, and ARGVs tok-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance onad hocfalse positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs but ecological characteristics do not explain well ARGV variance. KARGVA is publicly available athttps://github.com/DataIntellSystLab/KARGVAunder MIT license.  more » « less
Award ID(s):
2013998
PAR ID:
10473208
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Frontiers (publisher); Web of Science, PubMed (repositories)
Date Published:
Journal Name:
Frontiers in Microbiology
Volume:
14
ISSN:
1664-302X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    High-throughput sequencing is widely used for strain detection and characterization of antibiotic resistance in microbial metagenomic samples. Current analytical tools use curated antibiotic resistance gene (ARG) databases to classify individual sequencing reads or assembled contigs. However, identifying ARGs from raw read data can be time consuming (especially if assembly or alignment is required) and challenging, due to genome rearrangements and mutations. Here, we present the k-mer-based antibiotic gene resistance analyzer (KARGA), a multi-platform Java toolkit for identifying ARGs from metagenomic short read data. KARGA does not perform alignment; it uses an efficient double-lookup strategy, statistical filtering on false positives, and provides individual read classification as well as covering of the database resistome. On simulated data, KARGA’s antibiotic resistance class recall is 99.89% for error/mutation rates within 10%, and of 83.37% for error/mutation rates between 10% and 25%, while it is 99.92% on ARGs with rearrangements. On empirical data, KARGA provides higher hit score (≥1.5-fold) than AMRPlusPlus, DeepARG, and MetaMARC. KARGA has also faster runtimes than all other tools (2x faster than AMRPlusPlus, 7x than DeepARG, and over 100x than MetaMARC). KARGA is available under the MIT license at https://github.com/DataIntellSystLab/KARGA. 
    more » « less
  2. Antibiotic resistance (AR) presents a global health challenge, necessitating an improved understanding of the ecology, evolution, and dissemination of antibiotic resistance genes (ARGs). Several tools, databases, and algorithms are now available to facilitate the identification of ARGs in metagenomic sequencing data; however, direct annotation of short-read data provides limited contextual information. Knowledge of whether an ARG is carried in the chromosome or on a specific mobile genetic element (MGE) is critical to understanding mobility, persistence, and potential for co-selection. Here we developed ARGContextProfiler, a pipeline designed to extract and visualize ARG genomic contexts. By leveraging the assembly graph for genomic neighborhood extraction and validating contexts through read mapping, ARGContextProfiler minimizes chimeric errors that are a common artifact of assembly outputs. Testing on real, synthetic, and semi-synthetic data, including long-read sequencing data from environmental samples, demonstrated that ARGContextProfiler offers superior accuracy, precision, and sensitivity compared to conventional assembly-based methods. ARGContextProfiler thus provides a powerful tool for uncovering the genomic context of ARGs in metagenomic sequencing data, which can be of value to both fundamental and applied research aimed at understanding and stemming the spread of AR. The source code of ARGContextProfiler is publicly available atGitHub. 
    more » « less
  3. Abstract With growing calls for increased surveillance of antibiotic resistance as an escalating global health threat, improved bioinformatic tools are needed for tracking antibiotic resistance genes (ARGs) across One Health domains. Most studies to date profile ARGs using sequence homology, but such approaches provide limited information about the broader context or function of the ARG in bacterial genomes. Here we introduce a new pipeline for identifying ARGs in genomic data that employs machine learning analysis of Protein-Protein Interaction Networks (PPINs) as a means to improve predictions of ARGs while also providing vital information about the context, such as gene mobility. A random forest model was trained to effectively differentiate between ARGs and nonARGs and was validated using the PPINs of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, andEnterobacter cloacae), which represent urgent threats to human health because they tend to be multi-antibiotic resistant. The pipeline exhibited robustness in discriminating ARGs from nonARGs, achieving an average area under the precision-recall curve of 88%. We further identified that the neighbors of ARGs, i.e., genes connected to ARGs by only one edge, were disproportionately associated with mobile genetic elements, which is consistent with the understanding that ARGs tend to be mobile compared to randomly sampled genes in the PPINs. This pipeline showcases the utility of PPINs in discerning distinctive characteristics of ARGs within a broader genomic context and in differentiating ARGs from nonARGs through network-based attributes and interaction patterns. The code for running the pipeline is publicly available athttps://github.com/NazifaMoumi/PPI-ARG-ESKAPE 
    more » « less
  4. Nanopore technology enables portable, real-time sequencing of microbial populations from clinical and ecological samples. An emerging healthcare application for Nanopore includes point-of-care, timely identification of antibiotic resistance genes (ARGs) to help developing targeted treatments of bacterial infections, and monitoring resistant outbreaks in the environment. While several computational tools exist for classifying ARGs from sequencing data, to date (2022) none have been developed for mobile devices. We present here KARGAMobile, a mobile app for portable, real-time, easily interpretable analysis of ARGs from Nanopore sequencing. KARGAMobile is the porting of an existing ARG identification tool named KARGA; it retains the same algorithmic structure, but it is optimized for mobile devices. Specifically, KARGAMobile employs a compressed ARG reference database and different internal data structures to save RAM usage. The KARGAMobile app features a friendly graphical user interface that guides through file browsing, loading, parameter setup, and process execution. More importantly, the output files are post-processed to create visual, printable and shareable reports, aiding users to interpret the ARG findings. The difference in classification performance between KARGAMobile and KARGA is minimal (96.2% vs . 96.9% f-measure on semi-synthetic datasets of 1 million reads with known resistance ground truth). Using real Nanopore experiments, KARGAMobile processes on average 1 GB data every 23–48 min (targeted sequencing - metagenomics), with peak RAM usage below 500MB, independently from input file sizes, and an average temperature of 49°C after 1 h of continuous data processing. KARGAMobile is written in Java and is available at https://github.com/Ruiz-HCI-Lab/KargaMobile under the MIT license. 
    more » « less
  5. Abstract A colored de Bruijn graph (also called a set of k-mer sets), is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. There have been numerous indexing data structures proposed that allow to store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools and all datasets, with no other tool able to consistently achieve less than 44% space overhead. The software is available athttp://github.com/medvedevgroup/ESSColor. 
    more » « less