Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

Khan, Jamshed (ORCID: 0000-0002-5129-9749); Kokot, Marek (ORCID: 0000-0002-6420-1587); Deorowicz, Sebastian (ORCID: 0000-0002-9496-733X); Patro, Rob (ORCID: 0000-0001-8463-1675)

doi:10.1186/s13059-022-02743-6

Abstract

The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this construction often forms a computational bottleneck. We present Cuttlefish 2, which significantly advances the state of the art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58 Tbp, from 4.5 days to 17–23 h; and it constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h and uses considerably more memory.

Award ID(s): 2029424, 1763680

NSF-PAR ID: 10370769

Author(s) / Creator(s): Khan, Jamshed; Kokot, Marek; Deorowicz, Sebastian; Patro, Rob

Publisher / Repository: Springer Science + Business Media

Date Published: 2022-09-08

Journal Name: Genome Biology

Volume: 23

Issue: 1

ISSN: 1474-760X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1186/s13059-022-02743-6
