Tunnel: Parallel-inducing sort for large string analytics

Du, Zhihui; Zhang, Sen; Bader, David A.

doi:10.1016/j.future.2023.08.009

The suffix array is a crucial data structure for efficient string analysis. Over the course of twenty-six years, sequential suffix array construction algorithms have achieved O(n) time complexity and in-place sorting. In this paper, we present the Tunnel algorithm, the first large-scale parallel suffix array construction algorithm with a time complexity of O(n/p) based on the parallel random access machine (PRAM) model. The Tunnel algorithm is built on three key ideas: dividing the problem of size O(n) into p sub-problems of reduced size O(n/p) by replacing long suffixes with shorter prefixes of size at most a constant D; introducing a Tunnel mechanism to efficiently induce the order of a set of suffixes with long common prefixes; and developing a strategy to transform a partially ordered suffix set into a total order relation by iteratively applying the Tunnel inducing method. We provide a detailed description of the algorithm, along with a thorough analysis of its time and space complexity, to demonstrate its correctness and efficiency. The proposed Tunnel algorithm exhibits scalable performance, making it suitable for large string analytics on large-scale parallel systems.
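
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of what a suffix array is and of why sorting by bounded-length prefixes yields only a partial order. It is not the Tunnel algorithm; the function names `naive_suffix_array` and `prefix_buckets`, the sample string, and the choice D = 2 are assumptions made purely for illustration.

```python
from itertools import groupby


def naive_suffix_array(text):
    """Reference construction: sort suffix start positions by the full suffix.

    Simple but slow; shown only to define what the result should be.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def prefix_buckets(text, d):
    """Sort suffix start positions by their first d characters only.

    Suffixes that share the same d-character prefix land in one bucket,
    and their relative order inside the bucket is still undetermined.
    """
    def key(i):
        return text[i:i + d]

    order = sorted(range(len(text)), key=key)
    return [list(group) for _, group in groupby(order, key=key)]


text = "banana"          # illustrative input, not from the paper
sa = naive_suffix_array(text)
buckets = prefix_buckets(text, 2)

print(sa)       # [5, 3, 1, 0, 4, 2]
print(buckets)  # [[5], [1, 3], [0], [2, 4]]
```

The buckets appear in the same order as in the full suffix array, but buckets with more than one entry (here `[1, 3]` and `[2, 4]`) hold suffixes with a common prefix of length D whose order is unresolved. According to the abstract, resolving such ties efficiently is the role of the Tunnel mechanism.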

Award ID(s): 2109988

PAR ID: 10477196

Author(s) / Creator(s): Du, Zhihui; Zhang, Sen; Bader, David A.

Publisher / Repository: Elsevier

Date Published: 2023-12-01

Journal Name: Future Generation Computer Systems

Volume: 149

Issue: C

ISSN: 0167-739X

Page Range / eLocation ID: 650 to 663

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1016/j.future.2023.08.009
