<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Journal Article</dc:product_type><dc:title>Sigmoni: classification of nanopore signal with a compressed pangenome index</dc:title><dc:creator>Shivakumar, Vikram S; Ahmed, Omar Y; Kovaka, Sam; Zakeri, Mohsen; Langmead, Ben</dc:creator><dc:corporate_author/><dc:editor/><dc:description>Summary: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbps. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10–100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.
Availability and implementation: Sigmoni is implemented in Python and is available as open source at https://github.com/vshiv18/sigmoni</dc:description><dc:publisher>Oxford University Press</dc:publisher><dc:date>2024-06-28</dc:date><dc:nsf_par_id>10609224</dc:nsf_par_id><dc:journal_name>Bioinformatics</dc:journal_name><dc:journal_volume>40</dc:journal_volume><dc:journal_issue>Supplement_1</dc:journal_issue><dc:page_range_or_elocation>i287 to i296</dc:page_range_or_elocation><dc:issn>1367-4803</dc:issn><dc:isbn/><dc:doi>https://doi.org/10.1093/bioinformatics/btae213</dc:doi><dcq:identifierAwardId>2029552</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>