Text Indexing for Faster Gapped Pattern Matching

Hossen, Md Helal; Gibney, Daniel; Thankachan, Sharma V

doi:10.3390/a17120537

We revisit the following version of the Gapped String Indexing problem, where the goal is to preprocess a text $T[1..n]$ to enable efficient reporting of all $occ$ occurrences of a gapped pattern $P = P_1[\alpha..\beta]P_2$ in $T$. An occurrence of $P$ in $T$ is defined as a pair $(i, j)$ where the substrings $T[i..i+|P_1|)$ and $T[j..j+|P_2|)$ match $P_1$ and $P_2$, respectively, with a gap $j - (i + |P_1|)$ lying within the interval $[\alpha..\beta]$. This problem has significant applications in computational biology and text mining. A hardness result on this problem suggests that any index with polylogarithmic query time must occupy near-quadratic space. In a recent study [STACS 2024], Bille et al. presented a sub-quadratic space index using space $\tilde{O}(n^{2-\delta/3})$, where $0 \le \delta \le 1$ is a parameter fixed at the time of index construction. Its query time is $\tilde{O}(|P_1| + |P_2| + n^{\delta} \cdot (1 + occ))$, which is sub-linear per occurrence when $\delta < 1$. We show how to achieve a gap-sensitive query time of $\tilde{O}(|P_1| + |P_2| + n^{\delta} \cdot (1 + occ^{1-\delta}) + \sum_{g \in [\alpha..\beta]} occ_g \cdot g^{\delta})$ using the same space, where $occ_g$ denotes the number of occurrences with gap $g$. This is faster when there are many occurrences with small gaps.
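
To make the occurrence definition concrete, the following is a minimal brute-force sketch in Python that enumerates occurrences directly from the definition of $(i, j)$ and the gap $j - (i + |P_1|)$. It is not the index of Hossen et al. or of Bille et al.; it simply scans the text and serves only as a reference baseline. The function names `gapped_occurrences` and `occ_by_gap` are hypothetical, and the sketch assumes 1-indexed positions as in the definition and $0 \le \alpha \le \beta$.

```python
from collections import Counter


def gapped_occurrences(T, P1, P2, alpha, beta):
    """Brute-force reporting of all occurrences (i, j) of P1[alpha..beta]P2 in T.

    Positions are 1-indexed: T[i..i+|P1|) == P1, T[j..j+|P2|) == P2,
    and the gap j - (i + |P1|) lies in [alpha..beta]. Assumes 0 <= alpha <= beta.
    """
    n, m1, m2 = len(T), len(P1), len(P2)
    occ = []
    for i in range(1, n - m1 + 2):          # every candidate start of P1
        if T[i - 1:i - 1 + m1] != P1:       # 0-based slice of T[i..i+|P1|)
            continue
        for g in range(alpha, beta + 1):    # every allowed gap length
            j = i + m1 + g                  # P2 must start here for gap g
            if j + m2 - 1 > n:              # P2 would run past the end of T;
                break                       # larger gaps only make it worse
            if T[j - 1:j - 1 + m2] == P2:
                occ.append((i, j))
    return occ


def occ_by_gap(occ, m1):
    """Count occurrences per gap g, i.e. the values occ_g."""
    return Counter(j - (i + m1) for i, j in occ)


occ = gapped_occurrences("abab", "a", "b", 0, 2)
print(occ)                 # [(1, 2), (1, 4), (3, 4)]
print(occ_by_gap(occ, 1))  # Counter({0: 2, 2: 1})
```

This baseline takes $O(n \cdot (|P_1| + (\beta - \alpha + 1) \cdot |P_2|))$ time per query, far from the indexed bounds stated above; it only illustrates what is reported and how $occ$ splits into the per-gap counts $occ_g$ that appear in the gap-sensitive term $\sum_{g} occ_g \cdot g^{\delta}$.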

Award ID(s): 2315822

PAR ID: 10614559

Publisher / Repository: MDPI

Date Published: 2024-12-01

Journal Name: Algorithms

Volume: 17

Issue: 12

ISSN: 1999-4893

Page Range / eLocation ID: 537

Sponsoring Org: National Science Foundation