skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: iGenomics: Comprehensive DNA sequence analysis on your Smartphone
Abstract BackgroundFollowing the miniaturization of integrated circuitry and other computer hardware over the past several decades, DNA sequencing is on a similar path. Leading this trend is the Oxford Nanopore sequencing platform, which currently offers the hand-held MinION instrument and even smaller instruments on the horizon. This technology has been used in several important applications, including the analysis of genomes of major pathogens in remote stations around the world. However, despite the simplicity of the sequencer, an equally simple and portable analysis platform is not yet available. ResultsiGenomics is the first comprehensive mobile genome analysis application, with capabilities to align reads, call variants, and visualize the results entirely on an iOS device. Implemented in Objective-C using the FM-index, banded dynamic programming, and other high-performance bioinformatics techniques, iGenomics is optimized to run in a mobile environment. We benchmark iGenomics using a variety of real and simulated Nanopore sequencing datasets of viral and bacterial genomes and show that iGenomics has performance comparable to the popular BWA-MEM/SAMtools/IGV suite, without necessitating a laptop or server cluster. ConclusionsiGenomics is available open source (https://github.com/stuckinaboot/iGenomics) and for free on Apple's App Store (https://apple.co/2HCplzr).  more » « less
Award ID(s):
1350041
PAR ID:
10555388
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
GigaScience
Volume:
9
Issue:
12
ISSN:
2047-217X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Alkan, Can (Ed.)
    Abstract SummaryWith the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching. We have developed wgatools, a cross-platform, ultrafast toolkit that supports a range of whole-genome alignment formats, offering practical tools for conversion, processing, evaluation, and visualization of alignments, thereby facilitating population-level genome analysis and advancing functional and evolutionary genomics. Availability and implementationwgatools supports diverse formats and can process, filter, and statistically evaluate alignments, perform alignment-based variant calling, and visualize alignments both locally and genome-wide. Built with Rust for efficiency and safe memory usage, it ensures fast performance and can handle large datasets consisting of hundreds of genomes. wgatools is published as free software under the MIT open-source license, and its source code is freely available at https://github.com/wjwei-handsome/wgatools and https://zenodo.org/records/14882797. 
    more » « less
  2. Abstract BackgroundIn light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences, which yield accurate identification of biological entities in a community but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies). FindingsUsing a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce a new software (MiniBar) to demultiplex dual indexed Nanopore reads. We find excellent phylogenetic and taxonomic resolution offered by long rDNA sequences across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analysis of community diversity through metabarcoding and find that they recover highly skewed diversity estimates. ConclusionsSequencing dual indexed, long rDNA amplicons on the MinION platform is a straightforward, cost-effective, portable, and universal approach for eukaryote DNA barcoding. Although bulk community analyses using long-amplicon approaches may introduce biases, the long rDNA amplicons approach signifies a powerful tool for enabling the accurate recovery of taxonomic and phylogenetic diversity across biological communities. 
    more » « less
  3. Nanopore technology enables portable, real-time sequencing of microbial populations from clinical and ecological samples. An emerging healthcare application for Nanopore includes point-of-care, timely identification of antibiotic resistance genes (ARGs) to help developing targeted treatments of bacterial infections, and monitoring resistant outbreaks in the environment. While several computational tools exist for classifying ARGs from sequencing data, to date (2022) none have been developed for mobile devices. We present here KARGAMobile, a mobile app for portable, real-time, easily interpretable analysis of ARGs from Nanopore sequencing. KARGAMobile is the porting of an existing ARG identification tool named KARGA; it retains the same algorithmic structure, but it is optimized for mobile devices. Specifically, KARGAMobile employs a compressed ARG reference database and different internal data structures to save RAM usage. The KARGAMobile app features a friendly graphical user interface that guides through file browsing, loading, parameter setup, and process execution. More importantly, the output files are post-processed to create visual, printable and shareable reports, aiding users to interpret the ARG findings. The difference in classification performance between KARGAMobile and KARGA is minimal (96.2% vs . 96.9% f-measure on semi-synthetic datasets of 1 million reads with known resistance ground truth). Using real Nanopore experiments, KARGAMobile processes on average 1 GB data every 23–48 min (targeted sequencing - metagenomics), with peak RAM usage below 500MB, independently from input file sizes, and an average temperature of 49°C after 1 h of continuous data processing. KARGAMobile is written in Java and is available at https://github.com/Ruiz-HCI-Lab/KargaMobile under the MIT license. 
    more » « less
  4. Alkan, Can (Ed.)
    Abstract MotivationDetection of structural variants (SVs) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads play an important role in identifying novel SVs. Long-read sequencers, such as nanopore sequencing, can address this problem by providing very long reads but with high error rates, making accurate alignment challenging. Many errors induced by nanopore sequencing have a bias because of the physics of the sequencing process and proper utilization of these error characteristics can play an important role in designing a robust aligner for SV detection problems. In this article, we design and evaluate HQAlign, an aligner for SV detection using nanopore sequenced reads. The key ideas of HQAlign include (i) using base-called nanopore reads along with the nanopore physics to improve alignments for SVs, (ii) incorporating SV-specific changes to the alignment pipeline, and (iii) adapting these into existing state-of-the-art long-read aligner pipeline, minimap2 (v2.24), for efficient alignments. ResultsWe show that HQAlign captures about 4%–6% complementary SVs across different datasets, which are missed by minimap2 alignments while having a standalone performance at par with minimap2 for real nanopore reads data. For the common SV calls between HQAlign and minimap2, HQAlign improves the start and the end breakpoint accuracy by about 10%–50% for SVs across different datasets. Moreover, HQAlign improves the alignment rate to 89.35% from minimap2 85.64% for nanopore reads alignment to recent telomere-to-telomere CHM13 assembly, and it improves to 86.65% from 83.48% for nanopore reads alignment to GRCh37 human genome. Availability and implementationhttps://github.com/joshidhaivat/HQAlign.git. 
    more » « less
  5. Abstract The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base which is 3–6$$\times$$ × lower than general purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (> 4$$\times$$ × faster decompression with 20 threads). NanoSpring is available on GitHub athttps://github.com/qm2/NanoSpring. 
    more » « less