Title: Efficiently Merging r-indexes
Large sequencing projects, such as GenomeTrakr and MetaSub, are updated frequently (sometimes daily, in the case of GenomeTrakr) with new data. Therefore, it is imperative that any data structure indexing such data supports efficient updates. Toward this goal, Bannai et al. (TCS, 2020) proposed a data structure named dynamic r-index which is suitable for large genome collections and supports incremental construction; however, it is still not powerful enough to support substantial updates. Here, we develop a novel algorithm for updating the r-index, which we refer to as RIMERGE. Fundamental to our algorithm is the combination of the basics of the dynamic r-index with a known algorithm for merging Burrows-Wheeler Transforms (BWTs). As a result, RIMERGE is capable of performing batch updates in a manner that exploits parallelism while keeping the memory overhead small. We compare our method to the dynamic r-index of Bannai et al. using two different datasets, and show that RIMERGE is between 1.88 and 5.34 times faster on reasonably large inputs.
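The abstract names BWT merging as the core primitive behind RIMERGE but does not spell it out. As a minimal sketch of that primitive (assuming a Holt/McMillan-style interleave refinement on tiny Python strings, not the paper's genome-scale, parallel implementation), the following merges the BWTs of two '$'-terminated strings into the BWT of the two-string collection:

```python
from collections import defaultdict

def bwt(text):
    """BWT of text + unique sentinel '$' (assumed smallest character)."""
    s = text + "$"
    order = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix sort; demo only
    return "".join(s[i - 1] for i in order)             # s[-1] wraps to the sentinel

def merge_bwts(b1, b2):
    """Merge two single-string BWTs into the BWT of the collection {T1, T2}.
    An interleave vector Z records which source each merged position comes
    from; after k refinement rounds, Z orders the merged suffixes by their
    first k characters. Ties (equal suffixes) keep string 1 first."""
    bs = (b1, b2)
    Z = [0] * len(b1) + [1] * len(b2)     # round 0: sorted by zero characters
    while True:
        buckets = defaultdict(list)
        ptr = [0, 0]
        for src in Z:                      # read the merged BWT in current order
            c = bs[src][ptr[src]]
            ptr[src] += 1
            buckets[c].append(src)         # stable distribution by preceding char
        new_Z = [s for c in sorted(buckets) for s in buckets[c]]
        if new_Z == Z:                     # fixed point: suffixes fully sorted
            break
        Z = new_Z
    ptr, out = [0, 0], []
    for src in Z:                          # emit the interleaved characters
        out.append(bs[src][ptr[src]])
        ptr[src] += 1
    return "".join(out)
```

Each round sorts the interleave by one more character of context, so the loop terminates after at most max-text-length rounds; what this sketch does not capture is RIMERGE's contribution of batching updates, exploiting parallelism, and keeping memory overhead small.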
Award ID(s):
2029552
NSF-PAR ID:
10276036
Date Published:
Journal Name:
2021 Data Compression Conference (DCC)
Page Range / eLocation ID:
203 to 212
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a dynamic setting, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [19]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of (1+ϵ)r² and an update time of O(poly(r, log n)), where r denotes the maximum rank of the input across all the updates. We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of (1+ϵ) that is independent of r, and a similar update time of O(poly(r, log n)). It is the first (1+ϵ)-approximation algorithm even for the special case of weighted simple graphs. To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world datasets. Our algorithm significantly outperforms the state of the art [19] both in terms of accuracy and efficiency.
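For concreteness, the objective this abstract maintains can be stated in a few lines. The sketch below (function names ours, exponential brute force for illustration only, not the paper's dynamic algorithm) takes the density of a node set S to be the total weight of hyperedges fully contained in S divided by |S|:

```python
from itertools import combinations

def density(S, edges):
    """Density of node set S: total weight of hyperedges fully inside S, over |S|.
    edges is a list of (frozenset_of_nodes, weight) pairs."""
    S = set(S)
    return sum(w for e, w in edges if e <= S) / len(S)

def densest_subhypergraph(nodes, edges):
    """Exact densest subhypergraph by exhaustive search over all node subsets.
    Exponential time; only meant to pin down the objective on tiny inputs."""
    best_d, best_S = 0.0, set()
    for r in range(1, len(nodes) + 1):
        for S in combinations(nodes, r):
            d = density(S, edges)
            if d > best_d:
                best_d, best_S = d, set(S)
    return best_d, best_S
```

A dynamic algorithm must maintain (an approximation of) this argmax while `edges` changes under insertions and deletions, which is what makes the (1+ϵ) ratio with O(poly(r, log n)) update time notable.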
  2. Tauman Kalai, Yael (Ed.)
    Over the last two decades, a significant line of work in theoretical algorithms has made progress in solving linear systems of the form 𝐋𝐱 = 𝐛, where 𝐋 is the Laplacian matrix of a weighted graph with weights w(i,j) > 0 on the edges. The solution 𝐱 of the linear system can be interpreted as the potentials of an electrical flow in which the resistance on edge (i,j) is 1/w(i,j). Kelner, Orecchia, Sidford, and Zhu [Kelner et al., 2013] give a combinatorial, near-linear time algorithm that maintains the Kirchhoff Current Law, and gradually enforces the Kirchhoff Potential Law by updating flows around cycles (cycle toggling). In this paper, we consider a dual version of the algorithm that maintains the Kirchhoff Potential Law, and gradually enforces the Kirchhoff Current Law by cut toggling: each iteration updates all potentials on one side of a fundamental cut of a spanning tree by the same amount. We prove that this dual algorithm also runs in a near-linear number of iterations. We show, however, that if we abstract cut toggling as a natural data structure problem, this problem can be reduced to the online vector-matrix-vector problem (OMv), which has been conjectured to be difficult for dynamic algorithms [Henzinger et al., 2015]. The conjecture implies that the data structure does not have an O(n^{1-ε}) time algorithm for any ε > 0, and thus a straightforward implementation of the cut-toggling algorithm requires essentially linear time per iteration. To circumvent the lower bound, we batch update steps, and perform them simultaneously instead of sequentially. An appropriate choice of batching leads to an Õ(m^{1.5}) time cut-toggling algorithm for solving Laplacian systems. Furthermore, we show that if we sparsify the graph and call our algorithm recursively on the Laplacian system implied by batching and sparsifying, we can reduce the running time to O(m^{1 + ε}) for any ε > 0. 
Thus, the dual cut-toggling algorithm can achieve (almost) the same running time as its primal cycle-toggling counterpart. 
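The electrical interpretation above can be made concrete with a toy dense solver (plain Gaussian elimination; emphatically not the near-linear cycle- or cut-toggling algorithms the abstract discusses): build the Laplacian of a weighted graph, ground one vertex at potential 0, and solve the reduced system for the remaining potentials.

```python
def laplacian(n, edges):
    """Graph Laplacian: L[i][i] sums incident weights, L[i][j] = -w(i,j)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def solve_potentials(n, edges, b):
    """Solve L x = b for a connected graph with sum(b) == 0 (a valid current
    injection). Vertex n-1 is grounded at potential 0, making the reduced
    (n-1)x(n-1) system nonsingular; solved by Gaussian elimination."""
    L = laplacian(n, edges)
    m = n - 1
    A = [L[i][:m] + [b[i]] for i in range(m)]   # augmented reduced system
    for col in range(m):                         # forward elimination, partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n                                # grounded vertex stays at 0
    for r in range(m - 1, -1, -1):               # back substitution
        s = A[r][m] - sum(A[r][k] * x[k] for k in range(r + 1, m))
        x[r] = s / A[r][r]
    return x
```

For the path 0–1–2 with unit weights and one unit of current injected at vertex 0 and extracted at vertex 2, the potentials come out to [2, 1, 0], matching the series-resistance picture (resistance 1/w per edge); the point of the toggling algorithms is to reach such solutions without ever forming or factoring 𝐋.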
  3. The phenology of critical biological events in aquatic ecosystems is rapidly shifting due to climate change. Growing variability in phenological cues can increase the likelihood of trophic mismatches, causing recruitment failures in commercially, culturally, and recreationally important fisheries. We tested for changes in spawning phenology of regionally important walleye (Sander vitreus) populations in 194 Midwest US lakes in Minnesota, Michigan, and Wisconsin spanning 1939-2019 to investigate factors influencing walleye phenological responses to climate change and associated climate variability, including ice-off timing, lake physical characteristics, and population stocking history. Data from Wisconsin and Michigan lakes (185 and 5 out of 194 total lakes, respectively) were collected by the Wisconsin Department of Natural Resources (WDNR) and the Great Lakes Indian Fish and Wildlife Commission (GLIFWC) through standardized spring walleye mark-recapture surveys and spring tribal harvest season records. Standardized spring mark-recapture population estimates are performed shortly after ice-off: following a marking event, a recapture sampling event is conducted using nighttime electrofishing (typically AC for WDNR, pulsed-DC for GLIFWC) of the entire shoreline, including islands, for small lakes, and of index stations for large lakes (Hansen et al. 2015), timed to coincide with peak walleye spawning activity (G. Hatzenbeler, WDNR, personal communication; M. Luehring, GLIFWC, personal communication; Beard et al. 1997). Data for four additional Minnesota lakes were collected by the Minnesota Department of Natural Resources (MNDNR) beginning in 1939 during annual collections of walleye eggs and broodstock (Schneider et al. 2010), where date of peak egg take was used to index peak spawning activity. 
For lakes where the spawning location did not match the lake for which ice-off data were collected, the spawning location either flowed into a lake where ice-off data were available (Pike River) or was within 50 km of one (Pine River), and those ice-off data were used. Following the affirmation of off-reservation Ojibwe tribal fishing rights in the Ceded Territories of Wisconsin and the Upper Peninsula of Michigan in 1987, tribal spearfishers have targeted walleye during spring spawning (Mrnak et al. 2018). Nightly harvests are recorded as part of a compulsory creel survey (US Department of the Interior 1991). Using these records, we calculated the date of peak spawning activity in a given lake-year as the day of maximum tribal harvest. Although we were unable to account for varying effort in these data, a preliminary analysis comparing spawning dates estimated using tribal harvest to those determined from standardized agency surveys in the same lake and year showed that they were highly correlated (Pearson's correlation: r = 0.91, P < 0.001). For lakes that had walleye spawning data from both agency surveys and tribal harvest, we used the data source with the greater number of observation years. Ice-off phenology data were collected from two sources: either observed, from the Global Lake and River Ice Phenology database (Benson et al. 2000), or modeled, from a USGS region-wide machine-learning model which used North American Land Data Assimilation System (NLDAS) meteorological inputs combined with lake characteristics (lake position, clarity, size, depth, hypsography, etc.) to predict daily water column temperatures from 1979 to 2022, from which ice-off dates could be derived (https://www.sciencebase.gov/catalog/item/6206d3c2d34ec05caca53071; see Corson-Dosch et al. 2023 for details). Modeled ice-off dates for our study lakes (see Read et al. 2021 for modeling details) reflected ice phenology well when compared to observed data (i.e., highly significant correlation between observed and modeled ice-off dates when both were available; r = 0.71, p < 0.001). Lake surface area (ha), latitude, and maximum depth (m) were acquired from agency databases and lake reports. Lake class was based on a WDNR lakes classification system (Rypel et al. 2019) that categorized lakes based on temperature, water clarity, depth, and fish community. Walleye stocking history was defined using the walleye stocking classification system developed by the Wisconsin Technical Working Group (see also Sass et al. 2021), which categorized lakes based on relative contributions of naturally-produced and stocked fish to adult recruitment, relying heavily on historic records of age-0 and age-1 catch rates and stocking histories. Wisconsin lakes were divided into three groups: natural recruitment (NR), a combination of stocking and natural recruitment (C-ST), and stocked only (ST). Walleye natural recruitment was indexed as age-0 walleye CPE (number of age-0 walleye captured per km of shoreline electrofished) from WDNR and GLIFWC fall electrofishing surveys (see Hansen et al. 2015 for details). We excluded lake-years where stocking of age-0 fish occurred before age-0 surveys, to include only measurements of naturally-reproduced fish. 
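The two quantitative steps in these methods (taking peak spawning as the day of maximum nightly harvest, and checking agreement between the two phenology estimates with a Pearson correlation) are simple to sketch; the harvest numbers below are invented for illustration.

```python
def peak_spawn_day(nightly_harvest):
    """Peak spawning date for a lake-year: the day with maximum harvest count.
    nightly_harvest maps day-of-year -> number of walleye harvested."""
    return max(nightly_harvest, key=nightly_harvest.get)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```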
  4. While a reinvigoration of ocean circulation and CO2 outgassing is the leading explanation for atmospheric CO2 rise since the Last Glacial Maximum (LGM), there is also evidence of marine geologic carbon release over the last 20,000 years. Much of this evidence points to regions of the mid-depth Pacific Ocean, where multiple radiocarbon (14C) records show anomalously low 14C/C values, potentially caused by the addition of 14C-free geologic carbon [1,2]. To better constrain this geologic carbon release hypothesis, we aim to place an upper bound on the amount of carbon that may have been added, in addition to the geochemical pathway of that carbon. To do so, we numerically invert a carbon cycle model based on observational atmospheric CO2 and 14C records. Given these observational constraints, we use data assimilation techniques and an optimization algorithm to calculate the rate of carbon addition and its alkalinity-to-carbon ratio (RA/C) over the last 20,000 years. Using the modeled planetary radiocarbon budget calculated in Hain et al. [3], we find observations allow for only ~300 Pg of carbon to be added, as a majority of the deglacial atmospheric 14C decline is already explained by magnetic field strength changes and ocean circulation changes [3]. However, when we adjust the initial state of the model by increasing 14C by 75‰ to match the observational 14C records, we find that observations allow for ~3500 Pg of carbon addition with an average RA/C of ~1.4. These results allow for the possibility of a large release of 14C-free geologic carbon, which could produce local and regional 14C anomalies such as those recorded in the Pacific [1,2]. As this geologic carbon was added with an RA/C of ~1.4, these results also imply that 14C evidence for significant geologic carbon release since the LGM may not be taken as contributing to deglacial CO2 rise, unless there is evidence for significant local acidification and corrosion of seafloor sediments. 
If the geologic carbon cycle is indeed more dynamic than previously thought, we may also need to rethink the approach to estimate the land/ocean carbon repartitioning from the deglacial stable carbon isotope budget. [1] Rafter et al. (2019), GRL 46(23), 13950–13960. [2] Ronge et al. (2016), Nature Communications 7(1), 11487. [3] Hain et al. (2014), EPSL 394, 198–208. 
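The actual inversion above uses a full carbon-cycle model with data assimilation; as a minimal illustration of the underlying idea (fitting a source rate to an observed record by least squares, with all numbers hypothetical and no real carbon-cycle physics), one can solve for the constant addition rate that best explains a linear trend:

```python
def fit_constant_rate(times, obs, baseline):
    """Closed-form least squares for obs ≈ baseline + rate * t.
    Toy stand-in for inverting a model against an observational record:
    times in years, obs and baseline in the same units (e.g., ppm CO2)."""
    num = sum(t * (o - baseline) for t, o in zip(times, obs))
    den = sum(t * t for t in times)
    return num / den
```

The real problem is far more constrained (two tracers, CO2 and 14C, plus an alkalinity-to-carbon ratio as a second unknown), which is why the abstract's answer changes so much when the model's initial 14C state is shifted.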
  5. The Global Biodiversity Information Facility (GBIF 2022a) has indexed more than 2 billion occurrence records from 70,147 datasets. These datasets often include "hidden" biotic interaction data because biodiversity communities use the Darwin Core standard (DwC, Wieczorek et al. 2012) in different ways to document biotic interactions. In this study, we extracted biotic interactions from GBIF data using an approach similar to that employed in the Global Biotic Interactions (GloBI; Poelen et al. 2014) and summarized the results. Here we aim to present an estimation of the interaction data available in GBIF, showing that biotic interaction claims can be automatically found and extracted from GBIF. Our results suggest that much can be gained by an increased focus on development of tools that help to index and curate biotic interaction data in existing datasets. Combined with data standardization and best practices for sharing biotic interactions, such as the initiative on plant-pollinators interaction (Salim 2022), this approach can rapidly contribute to and meet open data principles (Wilkinson 2016). We used Preston (Elliott et al. 2020), open-source software that versions biodiversity datasets, to copy all GBIF-indexed datasets. The biodiversity data graph version (Poelen 2020) of the GBIF-indexed datasets used during this study contains 58,504 datasets in Darwin Core Archive (DwC-A) format, totaling 574,715,196 records. After retrieval and verification, the datasets were processed using Elton. Elton extracts biotic interaction data and supports 20+ existing file formats, including various types of data elements in DwC records. Elton also helps align interaction claims (e.g., host of, parasite of, associated with) to the Relations Ontology (RO, Mungall 2022), making it easier to discover datasets across a heterogeneous collection of datasets. 
Using a specific mapping from interaction claims found in the DwC records to terms in RO, Elton found 30,167,984 potential records (with non-empty values for the scanned DwC terms) and 15,248,478 records with recognized interaction types. Taxonomic name validation was performed using Nomer, which maps input names to names found in a variety of taxonomic catalogs. We considered an interaction record valid only where the interaction type could be mapped to a term in RO and where Nomer found a valid name for source and target taxa. Based on the workflow described in Fig. 1, we found 7,947,822 interaction records (52% of the potential interactions). Most of them were generic interactions (interacts_with, 87.5%), but the remaining 12.5% (993,477 records) included host-parasite and plant-animal interactions. The majority of the interaction records found involved plants (78%), animals (14%) and fungi (6%). In conclusion, there are many biotic interactions embedded in existing datasets registered in large biodiversity data indexers and aggregators like iDigBio, GBIF, and BioCASE. We exposed these biotic interaction claims using the combined functionality of biodiversity data tools Elton (for interaction data extraction), Preston (for reliable dataset tracking) and Nomer (for taxonomic name alignment). Nonetheless, the development of new vocabularies, standards and best practice guides would facilitate aggregation of interaction data, including the diversification of the GBIF data model (GBIF 2022b) for sharing biodiversity data beyond occurrence data. That is the aim of the TDWG Interest Group on Biological Interactions Data (TDWG 2022). 
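The validity filter described here (map an interaction claim to an RO term; keep the record only if both taxon names validate) can be sketched as follows. The record fields, the mapping, and the name check are illustrative stand-ins for what Elton and Nomer actually do, and the RO identifiers in the example are for illustration only.

```python
def extract_interactions(records, ro_map, is_valid_name):
    """Keep a record only if its interaction claim maps to an RO term and
    both source and target taxon names validate; annotate with the RO term."""
    kept = []
    for rec in records:
        ro_term = ro_map.get(rec.get("interactionType", "").strip().lower())
        if (ro_term
                and is_valid_name(rec.get("sourceName"))
                and is_valid_name(rec.get("targetName"))):
            kept.append({**rec, "roTerm": ro_term})
    return kept
```

In the real pipeline the mapping covers many verbatim phrasings across 20+ file formats, and name validation is a lookup against multiple taxonomic catalogs rather than a simple predicate; this is only the shape of the filter that yields the 52% figure above.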