In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs and that Wheeler graphs can be indexed in a way which is space-efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present - The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted, subset of graphs called d-NFA's for d >= 5. This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFA's where d <= 2. We also show the recognition problem can be solved in linear time for sigma =1; - There exists an 2^{e log sigma + O(n + e)} time exact algorithm where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time; - We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation; - We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler Graph (the dual of the WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma=O(1); The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the open question of which parameters determine this problem's difficulty.
more »
« less
This content will become publicly available on June 19, 2026
Wheeler Graphs and Wheeler Languages
Suffix sorting stands at the core of the most efficient solutions for indexed pattern matching: the suffix tree, the suffix array, compressed indexes based on the Burrows-Wheeler transform, and so on. In [Gagie, Manzini, Sirén, TCS 2017] this concept was extended to labeled graphs, obtaining the rich class of Wheeler graphs. This work opened a very fruitful line of research, ultimately generating results able to bridge the fields of compressed data structures, graph theory, and regular language theory. In a Wheeler graph, nodes are sorted according to the alphabetic order of their incoming labels, propagating this order through pairs of equally-labeled edges. This apparently-simple definition makes it possible to solve on Wheeler graphs problems (including, but not limited to: compression, subpath queries, NFA equivalence, determinization, minimization) that on general labeled graphs are extremely hard to solve, and induces a rich structure in the class of regular languages (Wheeler languages) recognized by automata whose state transition is a Wheeler graph. The goal of this survey is to provide a summary of (and intuitions behind) the results on Wheeler graphs that appeared in the literature since their introduction, in addition to a discussion of interesting problems that are still open in the field.
more »
« less
- PAR ID:
- 10646074
- Editor(s):
- Ferragina, Paolo; Gagie, Travis; Navarro, Gonzalo
- Publisher / Repository:
- Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Date Published:
- Volume:
- 131
- ISSN:
- 2190-6807
- Page Range / eLocation ID:
- 12:1-12:28
- Subject(s) / Keyword(s):
- Wheeler languages Wheeler graphs pattern matching indexing compressed data structures Theory of computation → Sorting and searching Theory of computation → Graph algorithms analysis Theory of computation → Pattern matching Theory of computation → Formal languages and automata theory
- Format(s):
- Medium: X Size: 28 pages; 1284983 bytes Other: application/pdf
- Size(s):
- 28 pages 1284983 bytes
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)We give a randomized algorithm that approximates the number of independent sets in a dense, regular bipartite graph - in the language of approximate counting, we give an FPRAS for #BIS on the class of dense, regular bipartite graphs. Efficient counting algorithms typically apply to "high-temperature" problems on bounded-degree graphs, and our contribution is a notable exception as it applies to dense graphs in a low-temperature setting. Our methods give a counting-focused complement to the long line of work in combinatorial optimization showing that CSPs such as Max-Cut and Unique Games are easy on dense graphs via spectral arguments. Our contributions include a novel extension of the method of graph containers that differs considerably from other recent low-temperature algorithms. The additional key insights come from spectral graph theory and have previously been successful in approximation algorithms. As a result, we can overcome some limitations that seem inherent to the aforementioned class of algorithms. In particular, we exploit the fact that dense, regular graphs exhibit a kind of small-set expansion (i.e., bounded threshold rank), which, via subspace enumeration, lets us enumerate small cuts efficiently.more » « less
-
Domain adaptation has become an attractive learning paradigm, as it can leverage source domains with rich labels to deal with classification tasks in an unlabeled target domain. A few recent studies develop domain adaptation approaches for graph-structured data. In the case of node classification task, current domain adaptation methods only focus on the closed-set setting, where source and target domains share the same label space. A more practical assumption is that the target domain may contain new classes that are not included in the source domain. Therefore, in this paper, we introduce a novel and challenging problem for graphs, i.e., open-set domain adaptive node classification, and propose a new approach to solve it. Specifically, we develop an algorithm for efficient knowledge transfer from a labeled source graph to an unlabeled target graph under a separate domain alignment (SDA) strategy, in order to learn discriminative feature representations for the target graph. Our goal is to not only correctly classify target nodes into the known classes, but also classify unseen types of nodes into an unknown class. Experimental results on real-world datasets show that our method outperforms existing methods on graph domain adaptation.more » « less
-
We compare the performance of the Quantum Approximate Optimization Algorithm (QAOA) with state-of-the-art classical solvers Gurobi and MQLib to solve the MaxCut problem on 3-regular graphs. We identify the minimum noiseless sampling frequency and depthprequired for a quantum device to outperform classical algorithms. There is potential for quantum advantage on hundreds of qubits and moderate depth with a sampling frequency of 10 kHz. We observe, however, that classical heuristic solvers are capable of producing high-quality approximate solutions in linear time complexity. In order to match this quality for large graph sizesN, a quantum device must support depthp > 11. Additionally, multi-shot QAOA is not efficient on large graphs, indicating that QAOAp ≤ 11 does not scale withN. These results limit achieving quantum advantage for QAOA MaxCut on 3-regular graphs. Other problems, such as different graphs, weighted MaxCut, and 3-SAT, may be better suited for achieving quantum advantage on near-term quantum devices.more » « less
-
Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability ( InterDyck -reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck -reachability framework should model multiple Dyck languages simultaneously . Unfortunately, precise InterDyck -reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the InterDyck -reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck -reachability: we aim at simplifying the underlying input graph G . Our key insight is based on the observation that if an edge is not contributing to any InterDyck -paths, we can safely eliminate it from G . Our technique is orthogonal to the InterDyck -reachability formulation and can serve as a pre-processing step with any over-approximating approach for InterDyck -reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent InterDyck -reachability-based taint analysis for Android. Our evaluation of three popular InterDyck -reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck -reachability algorithms, sometimes dramatically.more » « less
An official website of the United States government
