skip to main content

Title: Monotone probability distributions over the Boolean cube can be learned with sublinear samples
A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone.
Authors:
;
Award ID(s):
1740751 1733808
Publication Date:
NSF-PAR ID:
10195631
Journal Name:
11th Innovations in Theoretical Computer Science (ITCS 2020)
Sponsoring Org:
National Science Foundation
More Like this
  1. A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone.
  2. We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, wemore »simply threshold d_{TV}({p^}, U_n), where {p^} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.« less
  3. The noise sensitivity of a Boolean function f: {0,1}^n - > {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr [f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noise-sensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given an oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone functionmore »(for example the class of unate functions consists of all the functions that can be made monotone by reorienting some of their coordinates [O'Donnell, 2014]). Specifically, we study the algorithmic problem of approximating NS_{delta}[f] for monotone f, given the promise that NS_{delta}[f] >= 1/n^{C} for constant C, and for delta in the range 1/n <= delta <= 1/2. For such f and delta, we give a randomized algorithm performing O((min(1,sqrt{n} delta log^{1.5} n))/(NS_{delta}[f]) poly (1/epsilon)) queries and approximating NS_{delta}[f] to within a multiplicative factor of (1 +/- epsilon). Given the same constraints on f and delta, we also prove a lower bound of Omega((min(1,sqrt{n} delta))/(NS_{delta}[f] * n^{xi})) on the query complexity of any algorithm that approximates NS_{delta}[f] to within any constant factor, where xi can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield new lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias.« less
  4. In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is monotone if, for any pair of domain elements x and y such that x ⪯ y, p(x) ≤ p(y). To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/ log n) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/ log n) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem. The previous lower bound for testing Monotonicity of
  5. In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is {\em monotone} if, for any pair of domain elements x and y such that x⪯y, p(x)≤p(y). To understand the sample complexity of this problem, we introduce a new property called \emph{bigness} over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/logn) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/logn) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.