Sigmoni: classification of nanopore signal with a compressed pangenome index

Shivakumar, Vikram_S; Ahmed, Omar_Y (ORCID:0000000299338508); Kovaka, Sam; Zakeri, Mohsen; Langmead, Ben (ORCID:0000000324371976)

doi:10.1093/bioinformatics/btae213

Abstract

Summary: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10–100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.

Availability and implementation: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.
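As an illustration of the two ideas named in the abstract (quantizing signal into picoamp ranges, then computing matching statistics against a reference), here is a minimal sketch. It is not Sigmoni's implementation: the function names, the 60 to 120 pA range, and the six equal-width bins are assumptions chosen for the example, and the matching statistics are computed by naive substring search rather than with an r-index.

```python
from bisect import bisect_right


def quantize(signal_pa, lo=60.0, hi=120.0, n_bins=6):
    """Map each picoamp value to a bin index in [0, n_bins - 1].

    The range and bin count are illustrative assumptions, not values
    taken from Sigmoni. Values outside [lo, hi) fall into the end bins.
    """
    width = (hi - lo) / n_bins
    edges = [lo + width * i for i in range(1, n_bins)]  # interior bin edges
    return [bisect_right(edges, x) for x in signal_pa]


def to_symbols(bins):
    """Encode bin indices as letters so substring search can be used."""
    return "".join(chr(ord("a") + b) for b in bins)


def matching_statistics(query, reference):
    """Naive matching statistics.

    For each position i, return the length of the longest prefix of
    query[i:] that occurs somewhere in reference. A compressed index
    would answer this far more efficiently; this loop is only for clarity.
    """
    ms = []
    for i in range(len(query)):
        length = 0
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


# Hypothetical picoamp readings for a reference and a read.
reference = to_symbols(quantize([62, 75, 88, 95, 104, 117, 71, 83]))
read = to_symbols(quantize([76, 89, 96, 61, 118]))
print(matching_statistics(read, reference))
```

Long matching statistics suggest that a stretch of the read's quantized signal occurs in the reference, which is the kind of evidence a classifier could aggregate per read.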

Award ID(s): 2029552

PAR ID: 10518290

Author(s) / Creator(s): Shivakumar, Vikram_S; Ahmed, Omar_Y; Kovaka, Sam; Zakeri, Mohsen; Langmead, Ben

Publisher / Repository: Oxford University Press

Date Published: 2024-06-28

Journal Name: Bioinformatics

Volume: 40

Issue: Supplement_1

ISSN: 1367-4803

Format(s): Medium: X

Size(s): p. i287-i296

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1093/bioinformatics/btae213
