skip to main content


Search for: All records

Award ID contains: 1733808

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone. 
    more » « less
  2. We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from s distributions, p_1, p_2, …, p_s, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the p_i’s are uniform or ε-far from being uniform in ℓ_1-distance (2) Identity Testing: Testing whether all the p_i’s are equal to an explicitly given distribution q or ε-far from q in ℓ_1-distance, and (3) Closeness Testing: Testing whether all the p_i’s are equal to a distribution q which we have sample access to, or ε-far from q in ℓ_1-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems. 
    more » « less
  3. Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution. Our first set of results focus on undirected graphs with independent edge probabilities, under certain assumptions. Then, we use this to obtain the first efficient implementations for the Erdos-Renyi model and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. We also introduce a new Random-Neighbor query. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (always positive random walks on the integer line). Here, we support Height queries to find the position of the walk, and First-Return queries to find the time when the walk returns to a specified height. This in turn can be used to implement Next-Neighbor queries on random rooted/binary trees, and Matching-Bracket queries on random well bracketed expressions. Finally, we define a new model that: (1) allows multiple independent simultaneous instantiations of the same implementation to be consistent with each other without communication (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with max degree Δ. The distribution over valid colorings is specified via a "huge" underlying graph G, that is far too large to be read in sub-linear time. Instead, we access G through local neighborhood probes. We are able to answer queries to the color of any vertex in sub-linear time for q>9Δ. 
    more » « less
  4. There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1-distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵ-close to uniform from the case where the distribution is (1−ϵ)-far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closeness testing problems leads to the following mixture testing question: Given samples of distributions p,q1,q2, can we test if p is a mixture of q1 and q2? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers are exactly the same as for the classical non-mixture case. 
    more » « less
  5. In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is monotone if, for any pair of domain elements x and y such that x ⪯ y, p(x) ≤ p(y). To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/ log n) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/ log n) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem. The previous lower bound for testing Monotonicity of 
    more » « less
  6. There has been significant study on the sample complexity of testing properties of distributions over large domains. For many properties, it is known that the sample complexity can be substantially smaller than the domain size. For example, over a domain of size n, distinguishing the uniform distribution from distributions that are far from uniform in ℓ1-distance uses only O(n−−√) samples. However, the picture is very different in the presence of arbitrary noise, even when the amount of noise is quite small. In this case, one must distinguish if samples are coming from a distribution that is ϵ-close to uniform from the case where the distribution is (1−ϵ)-far from uniform. The latter task requires nearly linear in n samples (Valiant, 2008; Valiant and Valiant, 2017a). In this work, we present a noise model that on one hand is more tractable for the testing problem, and on the other hand represents a rich class of noise families. In our model, the noisy distribution is a mixture of the original distribution and noise, where the latter is known to the tester either explicitly or via sample access; the form of the noise is also known \emph{a priori}. Focusing on the identity and closeness testing problems leads to the following mixture testing question: Given samples of distributions p,q1,q2, can we test if p is a mixture of q1 and q2? We consider this general question in various scenarios that differ in terms of how the tester can access the distributions, and show that indeed this problem is more tractable. Our results show that the sample complexity of our testers are exactly the same as for the classical non-mixture case. 
    more » « less
  7. Separations: We introduce a monotone variant of Xor-Sat and show it has exponential monotone circuit complexity. Since Xor-Sat is in NC^2, this improves qualitatively on the monotone vs. non-monotone separation of Tardos (1988). We also show that monotone span programs over R can be exponentially more powerful than over finite fields. These results can be interpreted as separating subclasses of TFNP in communication complexity. Characterizations: We show that the communication (resp. query) analogue of PPA (subclass of TFNP) captures span programs over F_2 (resp. Nullstellensatz degree over F_2). Previously, it was known that communication FP captures formulas (Karchmer - Wigderson, 1988) and that communication PLS captures circuits (Razborov, 1995). Characterizations: We show that the communication (resp. query) analogue of PPA (subclass of TFNP) captures span programs over F_2 (resp. Nullstellensatz degree over F_2). Previously, it was known that communication FP captures formulas (Karchmer-Wigderson, 1988) and that communication PLS captures circuits (Razborov, 1995). 
    more » « less
  8. Statistical tests are at the heart of many scientific tasks. To validate their hypothesis, researchers in medical and social sciences use individuals' data. The sensitivity of participants' data requires the design of statistical tests that ensure the privacy of the individuals in the most efficient way. In this paper, we use the framework of property testing to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy. In particular, we investigate testing two fundamental properties of distributions: (1) testing the equivalence of two distributions when we have unequal numbers of samples from the two distributions. (2) Testing independence of two random variables. In both cases, we show that our testers achieve near optimal sample complexity (up to logarithmic factors). Moreover, our dependence on the privacy parameter is an additive term, which indicates that differential privacy can be obtained in most regimes of parameters for free. 
    more » « less
  9. A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently. Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes not only the input graph, but also the output spanner might be too large for a single processor to store. To tackle this challenge, we initiate the study of local computation algorithms (LCAs) for graph spanners in general graphs, where the algorithm should locally decide whether a given edge (u,v)∈E belongs to the output spanner. Such LCAs give the user the `illusion' that a specific sparse spanner for the graph is maintained, without ever fully computing it. We present the following results: -For general n-vertex graphs and r∈{2,3}, there exists an LCA for (2r−1)-spanners with O˜(n1+1/r) edges and sublinear probe complexity of O˜(n1−1/2r). These size/stretch tradeoffs are best possible (up to polylogarithmic factors). -For every k≥1 and n-vertex graph with maximum degree Δ, there exists an LCA for O(k2) spanners with O˜(n1+1/k) edges, probe complexity of O˜(Δ4n2/3), and random seed of size polylog(n). This improves upon, and extends the work of [Lenzen-Levi, 2018]. We also complement our results by providing a polynomial lower bound on the probe complexity of LCAs for graph spanners that holds even for the simpler task of computing a sparse connected subgraph with o(m) edges. 
    more » « less
  10. We present O(log logn)-round algorithms in the Massively Parallel Computation (MPC) model, with ˜O(n) memory per machine, that compute a maximal independent set, a 1 + ε approximation of maximum matching, and a 2 + ε approximation of minimum vertex cover, for any n-vertex graph and any constant ε > 0. These improve the state of the art as follows: • Our MIS algorithm leads to a simple O(log log Δ)-round MIS algorithm in the CONGESTED-CLIQUE model of distributed computing, which improves on the ˜O (plog Δ)-round algorithm of Ghaffari [PODC’17]. • OurO(log logn)-round (1+ε)-approximate maximum matching algorithm simplifies or improves on the following prior work: O(log2 logn)-round (1 + ε)-approximation algorithm of Czumaj et al. [STOC’18] and O(log logn)-round (1 + ε)- approximation algorithm of Assadi et al. [arXiv’17]. • Our O(log logn)-round (2+ε)-approximate minimum vertex cover algorithm improves on an O(log logn)-round O(1)- approximation of Assadi et al. [arXiv’17]. 
    more » « less