We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1delta, whether the distributions are identical versus epsilonfar in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via blackbox amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that blackbox amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, wemore »
Approximating the Distance to Monotonicity of Boolean Functions
We design a nonadaptive algorithm that, given a Boolean function f: {0, 1}^n → {0, 1} which is αfar from monotone, makes poly(n, 1/α) queries and returns an estimate that, with high probability, is an Otilde(\sqrt{n})approximation to the distance of f to monotonicity. Furthermore, we show that for any constant k > 0, approximating the distance to monotonicity up to n^(1/2−k)factor requires 2^{n^k} nonadaptive queries, thereby ruling out a poly(n, 1/α)query nonadaptive algorithm for such approximations. This answers a question of Seshadhri (Property Testing Review, 2014) for the case of nonadaptive algorithms. Approximating the distance to a property is closely related to tolerantly testing that property. Our lower bound stands in contrast to standard (nontolerant) testing of monotonicity that can be done nonadaptively with Otilde(n/ε^2) queries.
We obtain our lower bound by proving an analogous bound for erasureresilient testers. An αerasureresilient tester for a desired property gets oracle access to a function that has at most an α fraction of values erased. The tester has to accept (with probability at least 2/3) if the erasures can be filled in to ensure that the resulting function has the property and to reject (with probability at least 2/3) if every completion of erasures more »
 Award ID(s):
 1909612
 Publication Date:
 NSFPAR ID:
 10185525
 Journal Name:
 Proceedings of the 2020 ACMSIAM Symposium on Discrete Algorithms, SODA 2020
 Page Range or eLocationID:
 19952009
 Sponsoring Org:
 National Science Foundation
More Like this


We study the classic set cover problem from the perspective of sublinear algorithms. Given access to a collection of m sets over n elements in the query model, we show that sublinear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in [17] to the sublinear query model, that returns an αapproximate cover using Õ(m(n/k)^1/(α–1) + nk) queries to the input, where k denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of k, the required number of queries is , even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require Ω(nk) queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter k. On the other hand, we show that this bound is not optimal for larger values of the parameter k, as there exists a (1 + ε)approximation algorithm with Õ(mn/kε^2) queries. We show that this bound is essentially tight for sufficiently small constant ε, by establishing amore »

We study the problem of estimating the value of sums of the form Sp≜∑(xip) when one has the ability to sample xi≥0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a selfjoin query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of pstars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1±ε)multiplicative approximation of Sp has query and time complexities O(mloglognϵ2S1/pp). Here, m=∑xi/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {xi} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem,more »

The approximate degree of a Boolean function f is the least degree of a real polynomial that approximates f pointwise to error at most 1/3. The approximate degree of f is known to be a lower bound on the quantum query complexity of f (Beals et al., FOCS 1998 and J. ACM 2001). We find tight or nearly tight bounds on the approximate degree and quantum query complexities of several basic functions. Specifically, we show the following. kDistinctness: For any constant k, the approximate degree and quantum query complexity of the kdistinctness function is Ω(n3/4−1/(2k)). This is nearly tight for large k, as Belovs (FOCS 2012) has shown that for any constant k, the approximate degree and quantum query complexity of kdistinctness is O(n3/4−1/(2k+2−4)). Image size testing: The approximate degree and quantum query complexity of testing the size of the image of a function [n]→[n] is Ω~(n1/2). This proves a conjecture of Ambainis et al. (SODA 2016), and it implies tight lower bounds on the approximate degree and quantum query complexity of the following natural problems. kJunta testing: A tight Ω~(k1/2) lower bound for kjunta testing, answering the main open question of Ambainis et al. (SODA 2016). Statistical distance frommore »

The noise sensitivity of a Boolean function f: {0,1}^n  > {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr [f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noisesensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given an oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone functionmore »