

Title: Structure and Complexity of Bag Consistency
Since the early days of relational databases, it was realized that acyclic hypergraphs give rise to database schemas with desirable structural and algorithmic properties. In a by-now classical paper, Beeri, Fagin, Maier, and Yannakakis established several different equivalent characterizations of acyclicity; in particular, they showed that the sets of attributes of a schema form an acyclic hypergraph if and only if the local-to-global consistency property for relations over that schema holds, which means that every collection of pairwise consistent relations over the schema is globally consistent. Even though real-life databases consist of bags (multisets), there has not been a study of the interplay between local consistency and global consistency for bags. We embark on such a study here, and we first show that the sets of attributes of a schema form an acyclic hypergraph if and only if the local-to-global consistency property for bags over that schema holds. After this, we explore algorithmic aspects of global consistency for bags by analyzing the computational complexity of the global consistency problem for bags: given a collection of bags, are these bags globally consistent? We show that this problem is in NP, even when the schema is part of the input. We then establish the following dichotomy theorem for fixed schemas: if the schema is acyclic, then the global consistency problem for bags is solvable in polynomial time, while if the schema is cyclic, then the global consistency problem for bags is NP-complete. The latter result contrasts sharply with the state of affairs for relations, where, for each fixed schema, the global consistency problem for relations is solvable in polynomial time.
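The two notions the abstract ties together, acyclicity of a schema and pairwise consistency of bags, can be made concrete with a short sketch. It assumes bags are Python `Counter`s from rows (tuples aligned with a schema) to positive multiplicities, and it takes as given the criterion that two bags are consistent exactly when their bag projections onto the shared attributes coincide. Acyclicity is tested with the standard GYO reduction. All function names are hypothetical, and this is an illustration, not the authors' algorithm.

```python
from collections import Counter

# A bag over a schema (tuple of attribute names) is a Counter mapping each
# row (tuple of values aligned with the schema) to a positive multiplicity.

def marginal(schema, bag, attrs):
    """Bag projection onto `attrs`: multiplicities of rows agreeing on `attrs` are summed."""
    idx = [schema.index(a) for a in attrs]
    out = Counter()
    for row, mult in bag.items():
        out[tuple(row[i] for i in idx)] += mult
    return out

def consistent_pair(schema_r, r, schema_s, s):
    """Assumed criterion: two bags are consistent iff their marginals on the
    shared attributes coincide (with no shared attributes, iff their sizes match)."""
    shared = [a for a in schema_r if a in schema_s]
    return marginal(schema_r, r, shared) == marginal(schema_s, s, shared)

def pairwise_consistent(bags):
    """`bags` is a list of (schema, bag) pairs; every pair must be consistent."""
    return all(
        consistent_pair(*bags[i], *bags[j])
        for i in range(len(bags)) for j in range(i + 1, len(bags))
    )

def is_acyclic(edges):
    """GYO reduction: a hypergraph is (alpha-)acyclic iff repeatedly deleting
    vertices that occur in exactly one hyperedge, and deleting hyperedges that
    are empty or contained in another hyperedge, removes every hyperedge."""
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        counts = Counter(v for e in edges for v in e)
        for e in edges:
            lonely = {v for v in e if counts[v] == 1}
            if lonely:
                e -= lonely  # vertex occurs in this edge only
                changed = True
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]  # empty or covered by another edge
                changed = True
                break
    return not edges
```

For example, over the cyclic "triangle" schema {A,B}, {B,C}, {A,C}, the three bags that each contain the rows (0,1) and (1,0) once are pairwise consistent, yet no bag over {A,B,C} has all three as marginals: any row of such a bag would need A≠B, B≠C and A≠C over the values {0,1}, which is impossible.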
Award ID(s):
1814152
NSF-PAR ID:
10358313
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ACM SIGMOD Record
Volume:
51
Issue:
1
ISSN:
0163-5808
Page Range / eLocation ID:
78 to 85
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Writing concurrent programs is notoriously hard due to scheduling non-determinism. The most common concurrency bugs are data races, which are accesses to a shared resource that can be executed concurrently. Dynamic data-race prediction is the most standard technique for detecting data races: given an observed, data-race-free trace t, the task is to determine whether t can be reordered to a trace t* that exposes a data race. Although the problem has received significant practical attention for over three decades, its complexity has remained elusive. In this work, we address this lacuna, identifying sources of intractability and conditions under which the problem is efficiently solvable. Given a trace t of size n over k threads, our main results are as follows. First, we establish a general $O(k \cdot n^{2(k-1)})$ upper bound, as well as an $O(n^k)$ upper bound when certain parameters of t are constant. In addition, we show that the problem is NP-hard and even W[1]-hard parameterized by k, and thus unlikely to be fixed-parameter tractable. Second, we study the problem over acyclic communication topologies, such as server-client hierarchies. We establish an $O(k^2 \cdot d \cdot n^2 \cdot \log n)$ upper bound, where d is the number of shared variables accessed in t. In addition, we show that even for traces with k = 2 threads, the problem has no $O(n^{2-\epsilon})$ algorithm under the Orthogonal Vectors conjecture. Since any trace with 2 threads defines an acyclic topology, our upper bound for this case is optimal up to polynomial improvements for up to moderate values of k and d. Finally, motivated by existing heuristics, we study a distance-bounded version of the problem, where the task is to expose a data race by a witness trace that is similar to t. We develop an algorithm that works in $O(n)$ time when certain parameters of t are constant.
  2. In this work we advance the understanding of the fundamental limits of computation for Binary Polynomial Optimization (BPO), which is the problem of maximizing a given polynomial function over all binary points. In our main result we provide a novel class of BPO that can be solved efficiently both from a theoretical and computational perspective. In fact, we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is β-acyclic. We note that the β-acyclicity assumption is natural in several applications including relational database schemes and the lifted multicut problem on trees. Due to the novelty of our proving technique, we obtain an algorithm which is interesting also from a practical viewpoint. This is because our algorithm is very simple to implement and the running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on α-acyclic instances. Our algorithm can also be applied to any general BPO problem that contains β-cycles. For these problems, the algorithm returns a smaller instance together with a rule to extend any optimal solution of the smaller instance to an optimal solution of the original instance. 
  3. Graph Isomorphism (GI) is one of a small number of natural algorithmic problems with unsettled complexity status in the P / NP theory: not expected to be NP-complete, yet not known to be solvable in polynomial time. Arguably, the GI problem boils down to filling the gap between symmetry and regularity, the former being defined in terms of automorphisms, the latter in terms of equations satisfied by numerical parameters. Recent progress on the complexity of GI relies on a combination of the asymptotic theory of permutation groups and asymptotic properties of highly regular combinatorial structures called coherent configurations. Group theory provides the tools to infer either global symmetry or global irregularity from local information, eliminating the symmetry/regularity gap in the relevant scenario; the resulting global structure is the subject of combinatorial analysis. These structural studies are melded in a divide-and-conquer algorithmic framework pioneered in the GI context by Eugene M. Luks (1980). 
  4. For graphs G and H, we say that G is H-free if it does not contain H as an induced subgraph. Already in the early 1980s, Alekseev observed that if H is connected, then the Max Weight Independent Set problem (MWIS) remains NP-hard in H-free graphs, unless H is a path or a subdivided claw, i.e., a graph obtained from the three-leaf star by subdividing each edge some number of times (possibly zero). Since then, determining the complexity of MWIS in these remaining cases has been one of the most important problems in algorithmic graph theory. The general belief is that the problem is polynomial-time solvable, which is witnessed by algorithmic results for graphs excluding some small paths or subdivided claws. More conclusive evidence was given by the recent breakthrough result of Gartland and Lokshtanov [FOCS 2020]: they proved that MWIS can be solved in quasipolynomial time in H-free graphs, where H is any fixed path. If H is an arbitrary subdivided claw, we know much less: the problem admits a QPTAS and a subexponential-time algorithm [Chudnovsky et al., SODA 2019]. In this paper we make an important step towards solving the problem by showing that for any subdivided claw H, MWIS is polynomial-time solvable in H-free graphs of bounded degree.
  5. Background: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. Results: We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal. Conclusions: The new shortest hyperpath heuristic is both fast and accurate. This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks. Availability: Source code for the hyperpath heuristic in a new tool we call  (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at .
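The data-race abstract (item 1) starts from pairs of accesses that could race. A minimal sketch, assuming a trace is a list of hypothetical `(thread, op, var)` triples with `op` either `"r"` or `"w"`, lists the conflicting pairs: events from different threads on the same variable with at least one write. These are only candidates; deciding whether some valid reordering of the trace actually exposes such a pair as a race is the hard prediction problem the abstract analyzes.

```python
def conflicting_pairs(trace):
    """Return index pairs (i, j), i < j, of conflicting events in `trace`.

    `trace` is a list of (thread, op, var) with op in {"r", "w"}.
    Two events conflict if they come from different threads, touch the same
    variable, and at least one of them is a write. This does NOT check
    whether the pair is a predictable race; it only lists candidates.
    """
    pairs = []
    for i in range(len(trace)):
        for j in range(i + 1, len(trace)):
            (t1, o1, x1), (t2, o2, x2) = trace[i], trace[j]
            if t1 != t2 and x1 == x2 and "w" in (o1, o2):
                pairs.append((i, j))
    return pairs
```

For the trace `[(1, "w", "x"), (2, "r", "x"), (2, "r", "y"), (1, "r", "x")]`, only the write of `x` by thread 1 and the read of `x` by thread 2 conflict; the final read of `x` is in the same thread as the write and another read.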
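The Binary Polynomial Optimization abstract (item 2) hinges on β-acyclicity. One standard way to test it, shown here as a sketch and not as the paper's optimization algorithm, is nest-point elimination: a vertex is a nest point if the hyperedges containing it are totally ordered by inclusion, and a hypergraph is β-acyclic iff repeatedly deleting nest points removes every vertex. The function name is hypothetical.

```python
def is_beta_acyclic(edges):
    """Greedy nest-point elimination on a hypergraph given as a list of vertex sets."""
    edges = [frozenset(e) for e in edges]
    vertices = set().union(*edges) if edges else set()
    while vertices:
        for v in vertices:
            incident = sorted((e for e in edges if v in e), key=len)
            # The incident edges form a chain iff consecutive ones (by size) are nested.
            if all(a <= b for a, b in zip(incident, incident[1:])):
                break  # v is a nest point
        else:
            return False  # no nest point left, so a beta-cycle remains
        vertices.discard(v)
        # Remove v everywhere and drop edges that become empty.
        edges = [e - {v} for e in edges if e - {v}]
    return True
```

The greedy order is safe because deleting a vertex cannot create a β-cycle, while a vertex lying on a β-cycle is never a nest point. The hypergraph {A,B,C}, {A,B}, {B,C}, {A,C} is α-acyclic but not β-acyclic, which matches the abstract's remark that α-acyclicity alone does not suffice.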
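To pin down the problem in the MWIS abstract (item 4), here is an exhaustive baseline, assuming vertices `0..n-1`, an edge list, and a weight list (all hypothetical inputs). It runs in exponential time and is only useful for sanity-checking tiny instances; the cited results are about doing far better on H-free graphs.

```python
from itertools import combinations

def mwis_bruteforce(n, edges, weight):
    """Exhaustive Max Weight Independent Set on vertices 0..n-1.

    Tries every vertex subset, skips those containing an edge, and keeps the
    heaviest one found (the first such subset in enumeration order wins ties).
    """
    adjacent = {frozenset(e) for e in edges}
    best, best_set = 0, ()
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if any(frozenset(p) in adjacent for p in combinations(subset, 2)):
                continue  # subset contains an edge, so it is not independent
            w = sum(weight[v] for v in subset)
            if w > best:
                best, best_set = w, subset
    return best, best_set
```

On the path 0-1-2 with weights 2, 3, 2, the two endpoints (total 4) beat the heavier middle vertex (3).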
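The hyperpath abstract (item 5) models a reaction as a directed hyperedge from a tail set to a head set. Under the usual B-connectivity notion, a hyperedge can be used only once every vertex in its tail has been reached, so a sink is reachable by some source-sink hyperpath only if forward chaining from the sources reaches it. The sketch below computes that reachable set; it is not the paper's shortest-hyperpath heuristic, and the input format (a list of `(tail, head)` pairs of vertex sets) is an assumption.

```python
from collections import deque

def b_reachable(sources, hyperedges):
    """Vertices reachable from `sources` when a hyperedge (tail, head)
    fires only after all of its tail vertices have been reached."""
    reached = set(sources)
    queue = deque(reached)
    missing = [len(set(tail)) for tail, _ in hyperedges]  # unreached tail vertices
    waiting = {}  # vertex -> indices of hyperedges whose tail contains it
    for i, (tail, _) in enumerate(hyperedges):
        for v in set(tail):
            waiting.setdefault(v, []).append(i)

    def fire(i):
        for h in hyperedges[i][1]:
            if h not in reached:
                reached.add(h)
                queue.append(h)

    for i, m in enumerate(missing):
        if m == 0:
            fire(i)  # a hyperedge with an empty tail fires unconditionally
    while queue:
        v = queue.popleft()
        for i in waiting.get(v, ()):
            missing[i] -= 1
            if missing[i] == 0:
                fire(i)
    return reached
```

Each vertex is dequeued once and each (vertex, hyperedge) incidence is counted once, so the pass is linear in the total size of the hypergraph.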