skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Constructing the CDAWG CFG using LCP-Intervals
It is known that a context-free grammar (CFG) that produces a single string can be derived from the compact directed acyclic word graph (CDAWG) for the same string. In this work, we show that the CFG derived from a CDAWG is deeply connected to the maximal repeat content of the string it produces and thus has O(m) rules, where m is the number of maximal repeats in the string. We then provide a generic algorithm based on this insight for constructing the CFG from the LCP-intervals of a string in O(n) time, where n is the length of the string. This includes a novel data-structure to support stabbing queries on LCPintervals in O(1+k) time after O(n) preprocessing time, where k is the number of intervals stabbed. These results connect the CFG to properties of the string it produces and relates it to other string data-structures, allowing it to be studied independently of the CDAWG and providing opportunity for innovation of grammar-based compression algorithms.  more » « less
Award ID(s):
2105391
PAR ID:
10429842
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2023 Data Compression Conference
Page Range / eLocation ID:
178 to 187
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We give the first almost-linear time algorithm for computing the maximal k-edge-connected subgraphs of an undirected unweighted graph for any constant k. More specifically, given an n-vertex m-edge graph G = (V,E) and a number k = log^o(1) n, we can deterministically compute in O(m+n^{1+o(1)}) time the unique vertex partition {V_1,… ,V_z} such that, for every i, V_i induces a k-edge-connected subgraph while every superset V'_i ⊃ V_{i} does not. Previous algorithms with linear time work only when k ≤ 2 [Tarjan SICOMP'72], otherwise they all require Ω(m+n√n) time even when k = 3 [Chechik et al. SODA'17; Forster et al. SODA'20]. Our algorithm also extends to the decremental graph setting; we can deterministically maintain the maximal k-edge-connected subgraphs of a graph undergoing edge deletions in m^{1+o(1)} total update time. Our key idea is a reduction to the dynamic algorithm supporting pairwise k-edge-connectivity queries [Jin and Sun FOCS'20]. 
    more » « less
  2. Nissim, K.; Waters, B. (Ed.)
    Recent new constructions of rate-1 OT [Döttling, Garg, Ishai, Malavolta, Mour, and Ostrovsky, CRYPTO 2019] have brought this primitive under the spotlight and the techniques have led to new feasibility results for private-information retrieval, and homomorphic encryption for branching programs. The receiver communication of this construction consists of a quadratic (in the sender's input size) number of group elements for a single instance of rate-1 OT. Recently [Garg, Hajiabadi, Ostrovsky, TCC 2020] improved the receiver communication to a linear number of group elements for a single string-OT. However, most applications of rate-1 OT require executing it multiple times, resulting in large communication costs for the receiver. In this work, we introduce a new technique for amortizing the cost of multiple rate-1 OTs. Specifically, based on standard pairing assumptions, we obtain a two-message rate-1 OT protocol for which the amortized cost per string-OT is asymptotically reduced to only four group elements. Our results lead to significant communication improvements in PSI and PIR, special cases of SFE for branching programs. - PIR: We obtain a rate-1 PIR scheme with client communication cost of $$O(\lambda\cdot\log N)$$ group elements for security parameter $$\lambda$$ and database size $$N$$. Notably, after a one-time setup (or one PIR instance), any following PIR instance only requires communication cost $$O(\log N)$$ number of group elements. - PSI with unbalanced inputs: We apply our techniques to private set intersection with unbalanced set sizes (where the receiver has a smaller set) and achieve receiver communication of $$O((m+\lambda) \log N)$$ group elements where $m, N$ are the sizes of the receiver and sender sets, respectively. Similarly, after a one-time setup (or one PSI instance), any following PSI instance only requires communication cost $$O(m \cdot \log N)$$ number of group elements. All previous sublinear-communication non-FHE based PSI protocols for the above unbalanced setting were also based on rate-1 OT, but incurred at least $$O(\lambda^2 m \log N)$$ group elements. 
    more » « less
  3. Mutzel, Petra and (Ed.)
    Grammar compression is, next to Lempel-Ziv (LZ77) and run-length Burrows-Wheeler transform (RLBWT), one of the most flexible approaches to representing and processing highly compressible strings. The main idea is to represent a text as a context-free grammar whose language is precisely the input string. This is called a straight-line grammar (SLG). An AVL grammar, proposed by Rytter [Theor. Comput. Sci., 2003] is a type of SLG that additionally satisfies the AVL property: the heights of parse trees for children of every nonterminal differ by at most one. In contrast to other SLG constructions, AVL grammars can be constructed from the LZ77 parsing in compressed time: 𝒪(z log n) where z is the size of the LZ77 parsing and n is the length of the input text. Despite these advantages, AVL grammars are thought to be too large to be practical. We present a new technique for rapidly constructing a small AVL grammar from an LZ77 or LZ77-like parse. Our algorithm produces grammars that are always at least five times smaller than those produced by the original algorithm, and usually not more than double the size of grammars produced by the practical Re-Pair compressor [Larsson and Moffat, Proc. IEEE, 2000]. Our algorithm also achieves low peak RAM usage. By combining this algorithm with recent advances in approximating the LZ77 parsing, we show that our method has the potential to construct a run-length BWT in about one third of the time and peak RAM required by other approaches. Overall, we show that AVL grammars are surprisingly practical, opening the door to much faster construction of key compressed data structures 
    more » « less
  4. A Simple Temporal Network with Uncertainty (STNU) is a data structure for reasoning about time constraints on actions that may have uncertain durations. An STNU is dispatchable if it can be executed in real-time with minimal computation 1) satisfying all constraints no matter how the uncertain durations play out and 2) retaining maximum flexibility. The fastest known algorithm for converting STNUs into dispatchable form runs in O(n3) time, where n is the number of timepoints. This paper presents a faster algorithm that runs in O(mn + kn2 + n2 logn) time, where m is the number of edges and k is the number of uncertain durations. This performance is particularly meaningful in fields like Business Process Management, where sparse STNUs can represent temporal processes or plans. For sparse STNUs, our algorithm generates dispatchable forms in time O(n2 logn), a significant improvement over the O(n3)-time previous fastest algorithm. 
    more » « less
  5. We present new algorithms for computing many faces in arrangements of lines and segments. Given a set $$S$$ of $$n$$ lines (resp., segments) and a set $$P$$ of $$m$$ points in the plane, the problem is to compute the faces of the arrangements of $$S$$ that contain at least one point of $$P$$. For the line case, we give a deterministic algorithm of $$O(m^{2/3}n^{2/3}\log^{2/3} (n/\sqrt{m})+(m+n)\log n)$$ time. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $$\log^{2.22}n$$ and improves the previously best randomized algorithm [Agarwal, Matoušek, and Schwarzkopf, 1998] by a factor of $$\log^{1/3}n$$ in certain cases (e.g., when $$m=\Theta(n)$$). For the segment case, we present a deterministic algorithm of $$O(n^{2/3}m^{2/3}\log n+\tau(n\alpha^2(n)+n\log m+m)\log n)$$ time, where $$\tau=\min\{\log m,\log (n/\sqrt{m})\}$$ and $$\alpha(n)$$ is the inverse Ackermann function. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $$\log^{2.11}n$$ and improves the previously best randomized algorithm [Agarwal, Matoušek, and Schwarzkopf, 1998] by a factor of $$\log n$$ in certain cases (e.g., when $$m=\Theta(n)$$). We also give a randomized algorithm of $$O(m^{2/3}K^{1/3}\log n+\tau(n\alpha(n)+n\log m+m)\log n\log K)$$ expected time, where $$K$$ is the number of intersections of all segments of $$S$$. In addition, we consider the query version of the problem, that is, preprocess $$S$$ to compute the face of the arrangement of $$S$$ that contains any given query point. We present new results that improve the previous work for both the line and the segment cases. In particulary, for the line case, we build a data structure of $$O(n\log n)$$ space in $$O(n\log n)$$ randomized time, so that the face containing the query point can be obtained in $$O(\sqrt{n\log n})$$ time with high probability (more specifically, the query returns a binary search tree representing the face so that standard binary-search-based queries on the face can be handled in $$O(\log n)$$ time each and the face itself can be output explicitly in time linear in its size). 
    more » « less