This content will become publicly available on May 10, 2025
In this paper we investigate the problem of quantifying the contribution of each variable to the satisfying assignments of a Boolean function based on the Shapley value. Our main result is a polynomial-time equivalence between computing Shapley values and model counting for any class of Boolean functions that are closed under substitutions of variables with disjunctions of fresh variables. This result settles an open problem raised in prior work, which sought to connect the Shapley value computation to probabilistic query evaluation.
We show two applications of our result. First, the Shapley values can be computed in polynomial time over deterministic and decomposable circuits, since they are closed under OR-substitutions. Second, there is a polynomial-time equivalence between computing the Shapley value for the tuples contributing to the answer of a Boolean conjunctive query and counting the models in the lineage of the query. This equivalence allows us to immediately recover the dichotomy for Shapley value computation in case of self-join-free Boolean conjunctive queries; in particular, the hardness for non-hierarchical queries can now be shown using a simple reduction from the \#P-hard problem of model counting for lineage in positive bipartite disjunctive normal form.
more » « less- PAR ID:
- 10518839
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Management of Data
- Volume:
- 2
- Issue:
- 2
- ISSN:
- 2836-6573
- Page Range / eLocation ID:
- 1 to 23
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
In database-as-a-service platforms, automated ver-ification of query equivalence helps eliminate redundant computation in the form of overlapping sub-queries. Researchers have proposed two pragmatic techniques to tackle this problem. The first approach consists of reducing the queries to algebraic expressions and proving their equivalence using an algebraic theory. The limitations of this technique are threefold. It cannot prove the equivalence of queries with significant differences in the attributes of their relational operators (e.g., predicates in the filter operator). It does not support certain widely-used SQL features (e.g., NULL values). Its verification procedure is computationally intensive. The second approach transforms this problem to a constraint satisfaction problem and leverages a general-purpose solver to determine query equivalence. This technique consists of deriving the symbolic representation of the queries and proving their equivalence by determining the query containment relationship between the symbolic expressions. While the latter approach addresses all the limitations of the former technique, it only proves the equivalence of queries under set semantics (i.e., output tables must not contain duplicate tuples). However, in practice, database applications use bag semantics (i.e., output tables may contain duplicate tuples) In this paper, we introduce a novel symbolic approach for proving query equivalence under bag semantics. We transform the problem of proving query equivalence under bag semantics to that of proving the existence of a bijective, identity map between tuples returned by the queries on all valid inputs. We classify SQL queries into four categories, and propose a set of novel category-specific verification algorithms. We implement this symbolic approach in SPES and demonstrate that it proves the equivalence of a larger set of query pairs (95/232) under bag semantics compared to the SOTA tools based on algebraic (30/232) and symbolic approaches (67/232) under set and bag semantics, respectively. Furthermore, SPES is 3X faster than the symbolic tool that proves equivalence under set semantics.more » « less
-
Cormode, Graham ; Shekelyan, Michael (Ed.)A query algorithm based on homomorphism counts is a procedure for determining whether a given instance satisfies a property by counting homomorphisms between the given instance and finitely many predetermined instances. In a left query algorithm, we count homomorphisms from the predetermined instances to the given instance, while in a right query algorithm we count homomorphisms from the given instance to the predetermined instances. Homomorphisms are usually counted over the semiring ℕ of non-negative integers; it is also meaningful, however, to count homomorphisms over the Boolean semiring 𝔹, in which case the homomorphism count indicates whether or not a homomorphism exists. We first characterize the properties that admit a left query algorithm over 𝔹 by showing that these are precisely the properties that are both first-order definable and closed under homomorphic equivalence. After this, we turn attention to a comparison between left query algorithms over 𝔹 and left query algorithms over ℕ. In general, there are properties that admit a left query algorithm over ℕ but not over 𝔹. The main result of this paper asserts that if a property is closed under homomorphic equivalence, then that property admits a left query algorithm over 𝔹 if and only if it admits a left query algorithm over ℕ. In other words and rather surprisingly, homomorphism counts over ℕ do not help as regards properties that are closed under homomorphic equivalence. Finally, we characterize the properties that admit both a left query algorithm over 𝔹 and a right query algorithm over 𝔹.more » « less
-
The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.more » « less
-
The query containment problem is a fundamental algorithmic prob- lem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of infor- mation inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.more » « less
-
Promise Constraint Satisfaction Problems (PCSPs) are a generalization ofConstraint Satisfaction Problems (CSPs) where each predicate has a strong and aweak form and given a CSP instance, the objective is to distinguish if thestrong form can be satisfied vs. even the weak form cannot be satisfied. Sincetheir formal introduction by Austrin, Guruswami, and H\aa stad, there has beena flurry of works on PCSPs [BBKO19,KO19,WZ20]. The key tool in studying PCSPsis the algebraic framework developed in the context of CSPs where the closureproperties of the satisfying solutions known as the polymorphisms are analyzed. The polymorphisms of PCSPs are much richer than CSPs. In the Boolean case, westill do not know if dichotomy for PCSPs exists analogous to Schaefer'sdichotomy result for CSPs. In this paper, we study a special case of BooleanPCSPs, namely Boolean Ordered PCSPs where the Boolean PCSPs have the predicate$x \leq y$. In the algebraic framework, this is the special case of BooleanPCSPs when the polymorphisms are monotone functions. We prove that BooleanOrdered PCSPs exhibit a computational dichotomy assuming the Rich 2-to-1Conjecture [BKM21] which is a perfect completeness surrogate of the UniqueGames Conjecture. Assuming the Rich 2-to-1 Conjecture, we prove that a Boolean Ordered PCSP canbe solved in polynomial time if for every $\epsilon>0$, it has polymorphismswhere each coordinate has Shapley value at most $\epsilon$, else it is NP-hard.The algorithmic part of our dichotomy is based on a structural lemma thatBoolean monotone functions with each coordinate having low Shapley value havearbitrarily large threshold functions as minors. The hardness part proceeds byshowing that the Shapley value is consistent under a uniformly random 2-to-1minor. Of independent interest, we show that the Shapley value can beinconsistent under an adversarial 2-to-1 minor.more » « less