Title: Fast, Parallel, and Cache-Friendly Suffix Array Construction
String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these indexes are known, and the existing algorithms can be highly non-trivial to implement and parallelize. In this paper we present CaPS-SA, a simple and scalable parallel algorithm for constructing these string indexes, inspired by samplesort. Due to its design, CaPS-SA has excellent memory locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies. We show that despite its simple design, CaPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CaPS-SA can easily be extended to exploit this structure to obtain further speedups.
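Purely to fix ideas, the following is a minimal Python sketch of a samplesort-style suffix-sorting pipeline of the kind the abstract describes: sample a few suffixes, pick splitters, partition all suffixes into buckets, and sort the buckets independently. The optional `context` parameter imitates the bounded-context idea by comparing suffixes on at most a bounded number of characters. The function names and parameters (`suffix_array`, `context`, `n_buckets`) are illustrative assumptions; this is not the CaPS-SA implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import bisect


def suffix_array(text, context=None, n_buckets=4):
    """Samplesort-style suffix sorting sketch: pick splitters from a sample of
    suffixes, partition all suffixes into buckets, sort buckets independently,
    and concatenate. With `context` set, suffixes are compared on at most that
    many characters (a bounded-context order; ties are left in index order)."""
    n = len(text)

    def key(i):
        return text[i:] if context is None else text[i:i + context]

    # 1. Sample some suffixes and choose bucket splitters from the sample.
    stride = max(1, n // (4 * n_buckets))
    sample = sorted(key(i) for i in range(0, n, stride))
    step = max(1, len(sample) // n_buckets)
    splitters = sample[step::step][:n_buckets - 1]

    # 2. Assign every suffix to the bucket determined by the splitters.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for i in range(n):
        buckets[bisect.bisect_right(splitters, key(i))].append(i)

    # 3. Sort each bucket independently; in a real multicore build each bucket
    #    would be sorted on its own core (a thread pool stands in for that).
    with ThreadPoolExecutor() as pool:
        sorted_buckets = list(pool.map(lambda b: sorted(b, key=key), buckets))

    # 4. Buckets are already in global order, so concatenation gives the SA.
    return [i for bucket in sorted_buckets for i in bucket]
```

For example, suffix_array("banana") returns [5, 3, 1, 0, 4, 2], the suffix array of "banana".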
Award ID(s):
1763680
PAR ID:
10485536
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Belazzougui, Djamal; Ouangraoua, Aïda
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Journal Name:
23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Subject(s) / Keyword(s):
Suffix Array; Longest Common Prefix; Data Structures; Indexing; Parallel Algorithms; Theory of computation → Sorting and searching
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The suffix array is a crucial data structure for efficient string analysis. Over the course of twenty-six years, sequential suffix array construction algorithms have achieved O(n) time complexity and in-place sorting. In this paper, we present the Tunnel algorithm, the first large-scale parallel suffix array construction algorithm with a time complexity of O(n/p) based on the parallel random access machine (PRAM) model. The Tunnel algorithm is built on three key ideas: dividing the problem of size O(n) into p sub-problems of reduced size O(n/p) by replacing long suffixes with shorter prefixes of size at most a constant D; introducing a Tunnel mechanism to efficiently induce the order of a set of suffixes with long common prefixes; and developing a strategy to transform a partially ordered suffix set into a total order by iteratively applying the Tunnel inducing method. We provide a detailed description of the algorithm, along with a thorough analysis of its time and space complexity, to demonstrate its correctness and efficiency. The proposed Tunnel algorithm exhibits scalable performance, making it suitable for large-scale string analytics on large-scale parallel systems.
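The Tunnel mechanism itself is beyond the scope of a short sketch, but the general theme of refining a partial order on suffixes into a total order by repeated rounds can be illustrated with classical prefix doubling (Manber-Myers style), shown below. Each round sorts suffixes by a pair of ranks covering twice as many characters as the previous round. This is a standard textbook technique, not the Tunnel algorithm.

```python
def suffix_array_doubling(text):
    """Prefix-doubling sketch: each round orders suffixes by their first 2k
    characters, encoded as a pair of ranks from the previous round, until all
    ranks are distinct and the order is total."""
    n = len(text)
    if n == 0:
        return []
    sa = list(range(n))
    rank = [ord(c) for c in text]          # round 0: order by first character
    k = 1
    while True:
        def key(i):
            # First k characters via rank[i], next k via rank[i + k].
            return (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:          # all ranks distinct: total order reached
            break
        k *= 2
    return sa
```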
  2. It is known that a context-free grammar (CFG) that produces a single string can be derived from the compact directed acyclic word graph (CDAWG) for the same string. In this work, we show that the CFG derived from a CDAWG is deeply connected to the maximal repeat content of the string it produces and thus has O(m) rules, where m is the number of maximal repeats in the string. We then provide a generic algorithm based on this insight for constructing the CFG from the LCP-intervals of a string in O(n) time, where n is the length of the string. This includes a novel data structure to support stabbing queries on LCP-intervals in O(1+k) time after O(n) preprocessing time, where k is the number of intervals stabbed. These results connect the CFG to properties of the string it produces and relate it to other string data structures, allowing it to be studied independently of the CDAWG and providing opportunities for innovation in grammar-based compression algorithms.
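As a point of reference for the LCP-interval machinery mentioned above, here is a small sketch of the standard bottom-up stack traversal that enumerates all LCP-intervals of an LCP array in linear time. It only illustrates what an LCP-interval is; the paper's O(1+k) stabbing-query structure is not reproduced here.

```python
def lcp_intervals(lcp):
    """Enumerate LCP-intervals as (lcp_value, left, right) triples, where
    lcp[i] is the longest common prefix length of the suffixes at suffix
    array positions i-1 and i (lcp[0] is 0). Bottom-up stack traversal,
    O(n) overall."""
    n = len(lcp)
    if n == 0:
        return []
    intervals = []
    stack = [(0, 0)]                       # (lcp value, left boundary)
    for i in list(range(1, n)) + [n]:      # the final pass flushes the stack
        cur = lcp[i] if i < n else 0
        lb = i - 1
        while cur < stack[-1][0]:
            val, left = stack.pop()
            intervals.append((val, left, i - 1))
            lb = left
        if cur > stack[-1][0]:
            stack.append((cur, lb))
    intervals.append((0, 0, n - 1))        # the root interval
    return intervals
```

For the string banana (SA = [5, 3, 1, 0, 4, 2], LCP = [0, 1, 3, 0, 0, 2]), this yields the intervals (3, 1, 2), (1, 0, 2), (2, 4, 5), and (0, 0, 5).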
  3. Mikolaj Bojanczyk; Emanuela Merelli; David P. Woodruff (Ed.)
    Two equal-length strings are a parameterized match (p-match) iff there exists a one-to-one function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC '93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(n log n) bits of space. It can report any entry in the (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+|P|-1] and P are a p-match. The PST can count the number of occurrences of P in T in O(|P| log σ) time and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is: can we obtain a compressed version of the PST that takes space close to the text's size of n log σ bits and still supports all three functionalities mentioned earlier? In SODA '17, Ganguly et al. answered this question partially by presenting an O(n log σ)-bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(n log n) bits (when log σ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant.
    - Space O(n log σ) bits and query time O(log^ε_σ n);
    - Space O(n log σ log log_σ n) bits and query time O(log log_σ n); and
    - Space O(n log σ log^ε_σ n) bits and query time O(1).
    The first trade-off is an improvement over Ganguly et al.'s result, whereas our third trade-off matches the optimal time performance of Baker's PST while squeezing the space by a factor of roughly log_σ n. We highlight that our trade-offs match the space-and-time bounds of the best-known compressed text indexes for exact pattern matching, and further improvement is highly unlikely.
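The renaming-invariance at the heart of p-matching is usually captured by Baker's "previous occurrence" encoding: replace every symbol by the distance to its previous occurrence, with 0 marking a first occurrence. Two equal-length strings are a p-match exactly when their encodings coincide, and parameterized suffix structures are built over such encodings. A minimal sketch, treating every symbol as a parameter symbol as in the definition above:

```python
def prev_encoding(s):
    """Distance-to-previous-occurrence encoding; 0 marks a first occurrence.
    The encoding is invariant under one-to-one renaming of the symbols."""
    last = {}
    enc = []
    for i, c in enumerate(s):
        enc.append(i - last[c] if c in last else 0)
        last[c] = i
    return enc


def p_match(s, t):
    """Equal-length strings p-match iff their prev-encodings are identical."""
    return len(s) == len(t) and prev_encoding(s) == prev_encoding(t)
```

For instance, p_match("xyxz", "abac") is True (rename x to a, y to b, z to c), while p_match("xyxz", "abab") is False.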
  4. Large-scale data sets from the web, social networks, and bioinformatics are widely available and can often be represented as strings, and suffix arrays are highly efficient data structures enabling string analysis. However, our personal devices and the corresponding exploratory data analysis (EDA) tools cannot handle big data sets beyond the local memory. Arkouda is a framework under early development that brings together the productivity of Python on the user side with the high performance of Chapel on the server side. In this paper, an efficient suffix array data structure design and integration method is given first. A method for integrating a library of suffix array algorithms, rather than a single algorithm, is presented to enable runtime performance optimization in Arkouda, since different suffix array algorithms may have very different practical performance on strings from various applications. A parallel suffix array construction algorithm framework is given to further exploit hierarchical parallelism across multiple locales in Chapel. A corresponding benchmark is developed to evaluate the feasibility of the provided suffix array integration method and to measure end-to-end performance. Experimental results show that the proposed solution provides data scientists with an easy and efficient way to build suffix arrays with high performance in Python. All our code is open source and available on GitHub (https://github.com/Bader-Research/arkouda/tree/string-suffix-array-functionality).
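The "library of algorithms" idea above can be pictured as a small registry that maps algorithm names to construction routines and lets the caller, or a selection heuristic, choose one at runtime. The sketch below is purely illustrative; the names, the decorator, and the selection rule are hypothetical and are not Arkouda's actual interface.

```python
from typing import Callable, Dict, List, Optional

# Registry of available suffix array construction routines (illustrative only).
SA_LIBRARY: Dict[str, Callable[[str], List[int]]] = {}


def register(name: str):
    """Register a construction routine under a name for runtime selection."""
    def wrap(fn: Callable[[str], List[int]]) -> Callable[[str], List[int]]:
        SA_LIBRARY[name] = fn
        return fn
    return wrap


@register("comparison_sort")
def sa_comparison_sort(text: str) -> List[int]:
    # Simple baseline: sort suffix start positions by the suffixes themselves.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_suffix_array(text: str, algorithm: Optional[str] = None) -> List[int]:
    # Use an explicitly requested algorithm, else a default; a real system
    # would choose based on profiling or on properties of the input string.
    return SA_LIBRARY[algorithm or "comparison_sort"](text)
```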
  5. Blockchain and distributed ledger technologies (DLT) are emerging decentralized infrastructures touted by researchers as a way to improve existing systems that have been limited by centralized governance and proprietary control. These technologies have shown continued success in sustaining the operational models of modern cryptocurrencies and decentralized finance (DeFi) applications. Their success has incentivized growing discussion of their potential applications and adoption in other sectors, such as healthcare, which has a high demand for data liquidity and interoperability. Despite the increasing research effort in adopting blockchain and DLT in healthcare through conceptual designs and prototypes, a major research gap exists in the literature: there is a lack of design recommendations that discuss concrete architectural styles and domain-specific considerations necessary for implementing health data exchange systems based on these technologies. This paper aims to address this gap by introducing a collection of design patterns for constructing blockchain- and DLT-based healthcare systems that support secure and scalable data sharing. Our approach adapts traditional software patterns and proposes novel patterns that take into account both the technical requirements specific to healthcare systems and the implications of these requirements for naive blockchain-based solutions.