Title: Model Counting Meets Distinct Elements in a Data Stream
Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.
Award ID(s):
2130536 2130608
PAR ID:
10446698
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM SIGMOD Record
Volume:
51
Issue:
1
ISSN:
0163-5808
Page Range / eLocation ID:
87 to 94
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (F0) of a data stream.

    Our investigations lead us to observe striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for the translation of algorithms developed for distinct elements estimation to that of model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works.

     
  2. Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.

    Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and F0 computation. We design a recipe for translating algorithms developed for F0 estimation to model counting, resulting in new algorithms for model counting. We also provide a recipe for transforming sampling algorithms over streams into constraint sampling algorithms. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing F0 estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works. In particular, our view yields an algorithm for multidimensional range-efficient F0 estimation with a simpler analysis.

     
  3. We study the natural problem of Triplet Reconstruction (also known as Rooted Triplets Consistency or Triplet Clustering), originally motivated by applications in computational biology and relational databases (Aho, Sagiv, Szymanski, and Ullman, 1981) [2]: given n datapoints, we want to embed them onto the n leaves of a rooted binary tree (also known as a hierarchical clustering, or ultrametric embedding) such that a given set of m triplet constraints is satisfied. A triplet constraint ij|k for points i, j, k indicates that 'i, j are more closely related to each other than to k' (in terms of distances, d(i, j) ≤ d(i, k) and d(i, j) ≤ d(j, k)), and we say that a tree satisfies the triplet ij|k if the distance in the tree between i, j is smaller than the distance between i, k (or j, k). Among all possible trees with n leaves, can we efficiently find one that satisfies a large fraction of the m given triplets? Aho et al. (1981) [2] studied the decision version and gave an elegant polynomial-time algorithm that determines whether or not there exists a tree that satisfies all of the m constraints. Moreover, it is straightforward to see that a random binary tree achieves a constant 1/3-approximation, since there are only 3 distinct triplets ij|k, ik|j, jk|i (each will be satisfied with probability 1/3). Unfortunately, despite more than four decades of research by various communities, there is no better approximation algorithm for this basic Triplet Reconstruction problem. Our main theorem, which captures Triplet Reconstruction as a special case, is a general hardness of approximation result about Constraint Satisfaction Problems (CSPs) over infinite domains (CSPs where instead of boolean values {0,1} or a fixed-size domain, the variables can be mapped to any of the n leaves of a tree).
Specifically, we prove that assuming the Unique Games Conjecture [57], Triplet Reconstruction and, more generally, every Constraint Satisfaction Problem (CSP) over hierarchies is approximation resistant, i.e., there is no polynomial-time algorithm that does asymptotically better than a biased random assignment. Our result settles the approximability not only for Triplet Reconstruction, but for many interesting problems that have been studied by various scientific communities, such as the popular Quartet Reconstruction and Subtree/Supertree Aggregation Problems. More broadly, our result significantly extends the list of approximation resistant predicates by pointing to a large new family of hard problems over hierarchies. Our main theorem is a generalization of Guruswami, Håstad, Manokaran, Raghavendra, and Charikar (2011) [36], who showed that ordering CSPs (CSPs over permutations of n elements, e.g., Max Acyclic Subgraph, Betweenness, Non-Betweenness) are approximation resistant. The main challenge in our analyses stems from the fact that trees have topology (in contrast to permutations and ordering CSPs) and it is the tree topology that determines whether a given constraint on the variables is satisfied or not. As a byproduct, we also present some of the first CSPs whose approximation resistance is proved against biased random assignments, instead of uniformly random assignments.
  4. Chakrabarti, Amit ; Swamy, Chaitanya (Ed.)
    We analyze the sketching approximability of constraint satisfaction problems on Boolean domains, where the constraints are balanced linear threshold functions applied to literals. In particular, we explore the approximability of monarchy-like functions where the value of the function is determined by a weighted combination of the vote of the first variable (the president) and the sum of the votes of all remaining variables. The pure version of this function is when the president can only be overruled when all remaining variables agree. For every k ≥ 5, we show that CSPs where the underlying predicate is a pure monarchy function on k variables have no non-trivial sketching approximation algorithm in o(√n) space. We also show infinitely many weaker monarchy functions for which CSPs using such constraints are non-trivially approximable by O(log(n)) space sketching algorithms. Moreover, we give the first example of sketching approximable asymmetric Boolean CSPs. Our results work within the framework of Chou, Golovnev, Sudan, and Velusamy (FOCS 2021) that characterizes the sketching approximability of all CSPs. Their framework can be applied naturally to get a computer-aided analysis of the approximability of any specific constraint satisfaction problem. The novelty of our work is in using their work to get an analysis that applies to infinitely many problems simultaneously.
  5. Promise Constraint Satisfaction Problems (PCSPs) are a generalization of Constraint Satisfaction Problems (CSPs) where each predicate has a strong and a weak form and given a CSP instance, the objective is to distinguish if the strong form can be satisfied vs. even the weak form cannot be satisfied. Since their formal introduction by Austrin, Guruswami, and Håstad, there has been a flurry of works on PCSPs [BBKO19,KO19,WZ20]. The key tool in studying PCSPs is the algebraic framework developed in the context of CSPs where the closure properties of the satisfying solutions known as the polymorphisms are analyzed. The polymorphisms of PCSPs are much richer than CSPs. In the Boolean case, we still do not know if dichotomy for PCSPs exists analogous to Schaefer's dichotomy result for CSPs. In this paper, we study a special case of Boolean PCSPs, namely Boolean Ordered PCSPs where the Boolean PCSPs have the predicate $x \leq y$. In the algebraic framework, this is the special case of Boolean PCSPs when the polymorphisms are monotone functions. We prove that Boolean Ordered PCSPs exhibit a computational dichotomy assuming the Rich 2-to-1 Conjecture [BKM21] which is a perfect completeness surrogate of the Unique Games Conjecture. Assuming the Rich 2-to-1 Conjecture, we prove that a Boolean Ordered PCSP can be solved in polynomial time if for every $\epsilon>0$, it has polymorphisms where each coordinate has Shapley value at most $\epsilon$, else it is NP-hard. The algorithmic part of our dichotomy is based on a structural lemma that Boolean monotone functions with each coordinate having low Shapley value have arbitrarily large threshold functions as minors. The hardness part proceeds by showing that the Shapley value is consistent under a uniformly random 2-to-1 minor. Of independent interest, we show that the Shapley value can be inconsistent under an adversarial 2-to-1 minor.