Title: Coriolis: enabling metagenomic classification on lightweight mobile devices
Abstract

Motivation

The introduction of portable DNA sequencers such as the Oxford Nanopore Technologies MinION has enabled real-time, in-the-field DNA sequencing. However, in-the-field sequencing is actionable only when coupled with in-the-field DNA classification. This poses new challenges for metagenomic software, since mobile deployments are typically in remote locations with limited network connectivity and without access to capable computing devices.

Results

We propose new strategies to enable in-the-field metagenomic classification on mobile devices. We first introduce a programming model for expressing metagenomic classifiers that decomposes the classification process into well-defined and manageable abstractions. The model simplifies resource management in mobile setups and enables rapid prototyping of classification algorithms. Next, we introduce the compact string B-tree, a practical data structure for indexing text in external storage, and we demonstrate its viability as a strategy to deploy massive DNA databases on memory-constrained devices. Finally, we combine both solutions into Coriolis, a metagenomic classifier designed specifically to operate on lightweight mobile devices. Through experiments with actual MinION metagenomic reads and a portable supercomputer-on-a-chip, we show that, compared with state-of-the-art solutions, Coriolis offers higher throughput and lower resource consumption without sacrificing classification quality.

Availability and implementation

Source code and test data are available from http://score-group.org/?id=smarten.

 
Award ID(s): 1910193
NSF-PAR ID: 10427255
Author(s) / Creator(s):
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Bioinformatics
Volume: 39
Issue: Supplement_1
ISSN: 1367-4803
Page Range / eLocation ID: p. i66-i75
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. With the emergence of portable DNA sequencers, such as the Oxford Nanopore Technologies MinION, metagenomic DNA sequencing can be performed in real time and directly in the field. However, because metagenomic DNA analysis is computationally and memory intensive, and the current methods are designed for batch processing, the current metagenomic tools are not well suited for mobile devices. In this paper, we propose a new memory-efficient method to identify Operational Taxonomic Units (OTUs) in metagenomic DNA streams. Our method is based on finding connected components in overlap graphs constructed over a real-time stream of long DNA reads as produced by the MinION platform. We propose an efficient algorithm to maintain connected components when an overlap graph is streamed, and show how redundant information can be removed from the stream by transitive closures. Through experiments on simulated and real-world metagenomic data, we demonstrate that the resulting solution is able to recover OTUs with high precision while remaining suitable for mobile computing devices.
  2. Abstract Background

    Following the miniaturization of integrated circuitry and other computer hardware over the past several decades, DNA sequencing is on a similar path. Leading this trend is the Oxford Nanopore sequencing platform, which currently offers the hand-held MinION instrument, with even smaller instruments on the horizon. This technology has been used in several important applications, including the analysis of genomes of major pathogens in remote stations around the world. However, despite the simplicity of the sequencer, an equally simple and portable analysis platform is not yet available.

    Results

    iGenomics is the first comprehensive mobile genome analysis application, with capabilities to align reads, call variants, and visualize the results entirely on an iOS device. Implemented in Objective-C using the FM-index, banded dynamic programming, and other high-performance bioinformatics techniques, iGenomics is optimized to run in a mobile environment. We benchmark iGenomics using a variety of real and simulated Nanopore sequencing datasets of viral and bacterial genomes and show that iGenomics has performance comparable to the popular BWA-MEM/SAMtools/IGV suite, without necessitating a laptop or server cluster.

    Conclusions

    iGenomics is available open source (https://github.com/stuckinaboot/iGenomics) and for free on Apple's App Store (https://apple.co/2HCplzr).
  3. Abstract Background

    Total DNA (intracellular, iDNA, and extracellular, eDNA) from ancient permafrost records the mixed genetic repository of the past and present microbial populations through geological time. Given the exceptional preservation of eDNA under perennially frozen conditions, typical metagenomic sequencing of total DNA precludes the discrimination between fossil and living microorganisms in ancient cryogenic environments. DNA repair protocols were combined with high-throughput sequencing (HTS) of separate iDNA and eDNA fractions to reconstruct metagenome-assembled genomes (MAGs) from ancient microbial DNA entrapped in Siberian coastal permafrost.

    Results

    Despite the severe DNA damage in ancient permafrost, the coupling of DNA repair and HTS resulted in a total of 52 MAGs from sediments across a chronosequence (26–120 kyr). These MAGs were compared with those derived from the same samples but without utilizing DNA repair protocols. The MAGs from the youngest stratum showed minimal DNA damage and thus likely originated from viable, active microbial species. Many MAGs from the older and deeper sediment appear related to past aerobic microbial populations that had died upon freezing. MAGs from anaerobic lineages, including Asgardarchaea, however, exhibited minimal DNA damage and likely represent extant living microorganisms that have become adapted to the cryogenic and anoxic environments. The integration of aspartic acid racemization modeling and metaproteomics further constrained the metabolic status of the living microbial populations. Collectively, combining DNA repair protocols with HTS unveiled the adaptive strategies of microbes to long-term survivability in ancient permafrost.

    Conclusions

    Our results indicated that coupling DNA repair protocols with simultaneous sequencing of iDNA and eDNA fractions enabled the assembly of MAGs from past and living microorganisms in ancient permafrost. The genomic reconstruction of the past and extant microbial populations expanded our understanding of the microbial successions and biogeochemical alterations from the past paleoenvironment to the present-day frozen state. Furthermore, we provided genomic insights into long-term survival mechanisms of microorganisms under cryogenic conditions through geological time. The combined strategies in this study can be extrapolated to examine other ancient non-permafrost environments and constrain the search for past and extant extraterrestrial life in permafrost and ice deposits on Mars.

     
  4. Abstract Background

    With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes.

    Results

    In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) including several uncharacterized organisms and organisms with no public genome representatives.

    Conclusions

    The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.

     
  5. Nanopore technology enables portable, real-time sequencing of microbial populations from clinical and ecological samples. An emerging healthcare application for Nanopore includes point-of-care, timely identification of antibiotic resistance genes (ARGs) to help develop targeted treatments of bacterial infections and to monitor resistant outbreaks in the environment. While several computational tools exist for classifying ARGs from sequencing data, to date (2022) none have been developed for mobile devices. We present here KARGAMobile, a mobile app for portable, real-time, easily interpretable analysis of ARGs from Nanopore sequencing. KARGAMobile is a port of an existing ARG identification tool named KARGA; it retains the same algorithmic structure, but it is optimized for mobile devices. Specifically, KARGAMobile employs a compressed ARG reference database and different internal data structures to reduce RAM usage. The KARGAMobile app features a friendly graphical user interface that guides users through file browsing, loading, parameter setup, and process execution. More importantly, the output files are post-processed to create visual, printable and shareable reports, helping users interpret the ARG findings. The difference in classification performance between KARGAMobile and KARGA is minimal (96.2% vs. 96.9% f-measure on semi-synthetic datasets of 1 million reads with known resistance ground truth). Using real Nanopore experiments, KARGAMobile processes on average 1 GB of data every 23–48 min (targeted sequencing - metagenomics), with peak RAM usage below 500 MB, independent of input file size, and an average temperature of 49°C after 1 h of continuous data processing. KARGAMobile is written in Java and is available at https://github.com/Ruiz-HCI-Lab/KargaMobile under the MIT license.