Constructing the CDAWG CFG using LCP-Intervals

Cleary, Alan M.; Dood, Jordan

doi:10.1109/DCC55655.2023.00026

It is known that a context-free grammar (CFG) that produces a single string can be derived from the compact directed acyclic word graph (CDAWG) for the same string. In this work, we show that the CFG derived from a CDAWG is deeply connected to the maximal repeat content of the string it produces and thus has O(m) rules, where m is the number of maximal repeats in the string. We then provide a generic algorithm based on this insight for constructing the CFG from the LCP-intervals of a string in O(n) time, where n is the length of the string. This includes a novel data structure that supports stabbing queries on LCP-intervals in O(1+k) time after O(n) preprocessing time, where k is the number of intervals stabbed. These results connect the CFG to properties of the string it produces and relate it to other string data structures, allowing it to be studied independently of the CDAWG and providing opportunities for innovation in grammar-based compression algorithms.
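To make the link between LCP-intervals and maximal repeats concrete, here is a minimal sketch of the classic bottom-up LCP-interval enumeration (Abouelhoda, Kurtz, and Ohlebusch's stack-based traversal) over a suffix array and LCP array. It is not the authors' CFG construction, nor their stabbing-query data structure. Every LCP-interval with lcp value ell > 0 corresponds to a right-maximal repeat of length ell; keeping only intervals whose occurrences have at least two distinct left contexts yields the maximal repeats, the quantity m in the bound above. The function names are hypothetical, the suffix array is built naively (not in O(n)), and a '$' terminator is assumed not to occur in the input.

```python
def suffix_array(t):
    # Naive construction (sort suffix start positions); illustration only, not O(n).
    return sorted(range(len(t)), key=lambda i: t[i:])


def lcp_array(t, sa):
    # Kasai et al.: lcp[r] = length of the longest common prefix of suffixes sa[r-1] and sa[r].
    n = len(t)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):
        r = rank[p]
        if r > 0:
            q = sa[r - 1]
            while p + h < n and q + h < n and t[p + h] == t[q + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def lcp_intervals(lcp):
    # Bottom-up traversal: returns (ell, i, j) for every LCP-interval with ell > 0.
    n = len(lcp)
    out = []
    stack = [(0, 0)]  # (ell, left boundary); the bottom entry is the root 0-interval
    for k in range(1, n + 1):
        cur = lcp[k] if k < n else 0  # a final 0 closes every open interval with ell > 0
        lb = k - 1
        while cur < stack[-1][0]:
            ell, lb = stack.pop()
            out.append((ell, lb, k - 1))
        if cur > stack[-1][0]:
            stack.append((cur, lb))
    return out


def maximal_repeats(s):
    t = s + "$"  # assumption: '$' does not occur in s
    sa = suffix_array(t)
    lcp = lcp_array(t, sa)
    result = []
    for ell, i, j in lcp_intervals(lcp):
        # Left character of each occurrence; None marks an occurrence at position 0.
        left = {t[sa[k] - 1] if sa[k] > 0 else None for k in range(i, j + 1)}
        # Intervals have at least two occurrences, so two distinct entries mean left-maximal.
        if len(left) > 1:
            result.append(t[sa[i]:sa[i] + ell])
    return result
```

For example, `maximal_repeats("banana")` returns `['ana', 'a']`: the LCP-interval for "na" is also found, but both of its occurrences are preceded by "a", so it is right-maximal only and is filtered out.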

Award ID(s): 2105391

PAR ID: 10429842

Author(s) / Creator(s): Cleary, Alan M.; Dood, Jordan

Date Published: 2023-03-01

Journal Name: 2023 Data Compression Conference

Page Range / eLocation ID: 178 to 187

Sponsoring Org: National Science Foundation

Full text (conference paper): https://doi.org/10.1109/DCC55655.2023.00026

More Like this