Title: Towards routine employment of computational tools for antimicrobial resistance determination via high-throughput sequencing

Antimicrobial resistance (AMR) is a growing threat to public health and farming at large. In clinical and veterinary practice, timely characterization of the antibiotic susceptibility profile of bacterial infections is a crucial step in optimizing treatment. High-throughput sequencing is a promising option for clinical point-of-care and ecological surveillance, opening the opportunity to develop genotyping-based AMR determination as a possibly faster alternative to phenotypic testing. In the present work, we compare the performance of state-of-the-art methods for detection of AMR using high-throughput sequencing data from clinical settings. We consider five computational approaches based on alignment (AMRPlusPlus), deep learning (DeepARG), k-mer genomic signatures (KARGA, ResFinder) or hidden Markov models (Meta-MARC). We use an extensive collection of 585 isolates with available resistance profiles determined by phenotypic tests across nine antibiotic classes. We show that the prediction landscape of AMR classifiers is highly heterogeneous, with balanced accuracy varying from 0.40 to 0.92. Although some algorithms (ResFinder, KARGA and AMRPlusPlus) exhibit better overall balanced accuracy than others, the high per-AMR-class variance and related findings suggest that: (1) all algorithms might be subject to sampling bias, both in the data repositories used for training and in experimental/clinical settings; and (2) a portion of clinical samples might contain uncharacterized AMR genes to which the algorithms, mostly trained on known AMR genes, fail to generalize. These results lead us to formulate practical advice for software configuration and application, and to give suggestions for future study designs to further develop AMR prediction tools from proof-of-concept to bedside.
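
The abstract reports performance as balanced accuracy, the mean of sensitivity and specificity, which is less distorted than plain accuracy when resistant and susceptible isolates are unevenly represented. The following is a minimal sketch of that metric for a single antibiotic class; the function name `balanced_accuracy` and the isolate labels are illustrative assumptions, not data or code from the study.

```python
def balanced_accuracy(phenotype, predicted):
    """Mean of sensitivity and specificity for binary labels (1 = resistant, 0 = susceptible)."""
    pairs = list(zip(phenotype, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, called resistant
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both resistant and susceptible isolates are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels for ten isolates and one antibiotic class (made up for illustration).
phenotype = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(balanced_accuracy(phenotype, predicted), 3))
```

A classifier that calls every isolate susceptible scores 0.5 on this metric regardless of class proportions, which gives a reference point for reading the reported 0.40 to 0.92 range.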

Publisher / Repository:
Oxford University Press
Journal Name:
Briefings in Bioinformatics
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and the presence of homology/homoplasy with other non-AMR genes in sequenced samples.


    We present AMR-meta, a database-free and alignment-free approach, based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes in reads from metagenomic shotgun sequencing and outputs predictions about whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, further testing their ensemble via a voting system. In cross-validation, AMR-meta has a median F-score of 0.7 (interquartile range, 0.2–0.9). On semi-synthetic metagenomic data (the external test), AMR-meta yields on average a 1.3-fold increase in hit rate over existing methods. In terms of run-time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and the observed variance of all tools in classification outputs call for further development on standardization of benchmarking data and protocols.


    AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols, to fairly compare AMR prediction tools.
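
The abstract mentions testing an ensemble of the tools "via a voting system" without giving details. As one possible reading, the sketch below applies a strict majority vote over per-read class calls; the function `majority_vote`, the `min_votes` parameter, and the example calls are assumptions for illustration, not the scheme used in the paper.

```python
from collections import Counter


def majority_vote(calls, min_votes=None):
    """Return the antibiotic classes called by at least `min_votes` tools.

    `calls` maps a tool name to the set of classes it assigned to one read.
    By default a strict majority of the tools is required (assumed rule).
    """
    if min_votes is None:
        min_votes = len(calls) // 2 + 1
    votes = Counter(cls for classes in calls.values() for cls in set(classes))
    return {cls for cls, n in votes.items() if n >= min_votes}


# Hypothetical calls for a single read (not real tool output).
calls = {
    "AMR-meta": {"betalactams", "tetracyclines"},
    "AMRPlusPlus": {"betalactams"},
    "DeepARG": {"betalactams", "aminoglycosides"},
    "Meta-MARC": {"tetracyclines"},
}
print(majority_vote(calls))               # needs 3 of 4 votes
print(majority_vote(calls, min_votes=2))  # looser threshold
```

Because the tools use different AMR ontologies, as the abstract notes, class names would have to be mapped to a shared vocabulary before any such vote is meaningful.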

  2. Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high-throughput sequencing data, with a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer (i.e., strings of fixed length k) ARGV analyzer, KARGVA, an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated with that of ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available under the MIT license.
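
To make the hash-based k-mer lookup idea concrete, the sketch below builds two of the mappings described above (k-mers to references, and references to k-mers) with Python dictionaries and assigns a read to the reference sharing most of its k-mers. It is a toy illustration under stated assumptions: the names `build_index` and `classify_read`, the nucleotide toy sequences, and the fixed `min_fraction` cutoff (a crude stand-in for the statistical filter) are hypothetical and do not reproduce KARGVA's implementation.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Build k-mer -> reference names and reference name -> k-mers hash maps."""
    kmer_to_refs = defaultdict(set)
    ref_to_kmers = {}
    for name, seq in references.items():
        ref_kmers = set(kmers(seq, k))
        ref_to_kmers[name] = ref_kmers
        for km in ref_kmers:
            kmer_to_refs[km].add(name)
    return kmer_to_refs, ref_to_kmers


def classify_read(read, kmer_to_refs, k, min_fraction=0.5):
    """Return the best-matching reference, or None if too few k-mers hit (assumed cutoff)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    hits = Counter()
    for km in read_kmers:
        for name in kmer_to_refs.get(km, ()):
            hits[name] += 1
    if not hits:
        return None
    best, count = hits.most_common(1)[0]
    return best if count / len(read_kmers) >= min_fraction else None


# Toy reference sequences (made up; real ARGV databases hold curated gene variants).
references = {"variantA": "ACGTACGGTCA", "variantB": "TTGACCGTAAG"}
kmer_to_refs, ref_to_kmers = build_index(references, k=4)
print(classify_read("GTACGGTC", kmer_to_refs, k=4))
print(classify_read("AAAAAAAA", kmer_to_refs, k=4))
```

Hash lookups keep the per-read cost proportional to the read length rather than to the database size, which is the usual motivation for k-mer indexing over alignment.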

  3. Antimicrobial resistance (AMR) can develop in deep-pit swine manure storage when bacteria are selectively pressured by unmetabolized antibiotics. Subsequent manure application on row crops is then a source of AMR into soil and downstream runoff water. Therefore, understanding the patterns of diverse antibiotic resistance genes (ARGs) in manure among different farms is important both for interpreting the results of the detection of these genes from previous studies and for the use of these genes as bioindicators of manure-borne antibiotic resistance in the environment. Previous studies of manure-associated ARGs are based on limited samples of manures. To better understand the distribution of ARGs between manures, we characterized manures from 48 geographically independent swine farms across Iowa. The objectives of this study were to characterize the distribution of ARGs among these manures and to evaluate what factors in manure management may influence the presence of ARGs in manures. Our analysis included quantification of two commonly found ARGs in swine manure, ermB and tetM. Additionally, we characterized a broader suite of 31 ARGs, which allowed for simultaneous assays of the presence or absence of multiple genes. We found the company integrator had a significant effect on both ermB (P = 0.0007) and tetM (P = 0.0425) gene concentrations. Our broad analysis on ARG profiles found that the tet(36) gene was broadly present in swine manures, followed by the detection of tetT, tetM, erm(35), ermF, ermB, str, aadD, and intl3 in samples from 14 farms. Finally, we provide a comparison of methods to detect ARGs in manures, specifically comparing conventional and high-throughput qPCR, and discuss their role in ARG environmental monitoring efforts. Results of this study provide insight into commonalities of ARG presence in manure holding pits and provide supporting evidence that company integrator decisions may impact ARG concentrations.
  4. Marshall, Christopher W. (Ed.)
    ABSTRACT Identification of genes encoding β-lactamases (BLs) from short-read sequences remains challenging due to the high frequency of shared amino acid functional domains and motifs in proteins encoded by BL genes and related non-BL gene sequences. Divergent BL homologs can be frequently missed during similarity searches, which has important practical consequences for monitoring antibiotic resistance. To address this limitation, we built ROCker models that targeted broad classes (e.g., class A, B, C, and D) and individual families (e.g., TEM) of BLs and challenged them with mock 150-bp- and 250-bp-read data sets of known composition. ROCker identifies most-discriminant bit score thresholds in sliding windows along the sequence of the target protein sequence and hence can account for nondiscriminative domains shared by unrelated proteins. BL ROCker models showed a 0% false-positive rate (FPR), a 0% to 4% false-negative rate (FNR), and an up-to-50-fold-higher F1 score [2 × precision × recall/(precision + recall)] compared to alternative methods, such as similarity searches using BLASTx with various e-value thresholds and BL hidden Markov models, or tools like DeepARG, ShortBRED, and AMRFinder. The ROCker models and the underlying protein sequence reference data sets and phylogenetic trees for read placement are freely available through . Application of these BL ROCker models to metagenomics, metatranscriptomics, and high-throughput PCR gene amplicon data should facilitate the reliable detection and quantification of BL variants encoded by environmental or clinical isolates and microbiomes and more accurate assessment of the associated public health risk, compared to the current practice. IMPORTANCE Resistance genes encoding β-lactamases (BLs) confer resistance to the widely prescribed antibiotic class β-lactams. 
Therefore, it is important to assess the prevalence of BL genes in clinical or environmental samples for monitoring the spreading of these genes into pathogens and estimating public health risk. However, detecting BLs in short-read sequence data is technically challenging. Our ROCker model-based bioinformatics approach showcases the reliable detection and typing of BLs in complex data sets and thus contributes toward solving an important problem in antibiotic resistance surveillance. The ROCker models developed substantially expand the toolbox for monitoring antibiotic resistance in clinical or environmental settings. 
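
The F1 score quoted above, 2 × precision × recall/(precision + recall), can be computed directly from read-level counts. The sketch below is a minimal illustration; the function name `f1_score` and the counts are hypothetical and are not results from the ROCker evaluation.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 100 true target reads, 4 of them missed, no false positives.
print(round(f1_score(tp=96, fp=0, fn=4), 4))
```

Since F1 ignores true negatives, it suits read-classification benchmarks where non-target reads vastly outnumber target reads.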
  5. The overuse of man-made antibiotics has facilitated the global propagation of antibiotic resistance genes in animals, across natural and anthropogenically disturbed environments. Although antibiotic treatment is the most well-studied route by which resistance genes can develop and spread within host-associated microbiota, resistomes can also be acquired or enriched via more indirect routes, such as transmission between hosts or contact with antibiotic-contaminated matter within the environment. Relatively little is known about the impacts of anthropogenic disturbance on reservoirs of resistance genes in wildlife and their environments. We therefore tested for (a) antibiotic resistance genes in primate hosts experiencing different severities and types of anthropogenic disturbance (i.e., non-wildlife animal presence, human presence, direct human contact, and antibiotic treatment), and (b) covariation between host-associated and environmental resistomes. We used shotgun metagenomic sequencing of ring-tailed lemur (Lemur catta) gut resistomes and associated soil resistomes sampled from up to 10 sites: seven in the wilderness of Madagascar and three in captivity in Madagascar or the United States. We found that, compared to wild lemurs, captive lemurs harbored greater abundances of resistance genes, but not necessarily more diverse resistomes. Abundances of resistance genes were positively correlated with our assessments of anthropogenic disturbance, a pattern that was robust across all ten lemur populations. The composition of lemur resistomes was site-specific and the types of resistance genes reflected antibiotic usage in the country of origin, such as vancomycin use in Madagascar. We found support for multiple routes of ARG enrichment (e.g., via human contact, antibiotic treatment, and environmental acquisition) that differed across lemur populations, but could result in similar degrees of enrichment. Soil resistomes varied across natural habitats in Madagascar and, at sites with greater anthropogenic disturbance, lemur and soil resistomes covaried. As one of the broadest single-species investigations of wildlife resistomes to date, our study shows that the transmission and enrichment of antibiotic resistance genes vary across environments, thereby adding to the mounting evidence that the resistance crisis extends outside of traditional clinical settings.