Towards Testing Monotonicity of Distributions Over General Posets
In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is monotone if, for any pair of domain elements x and y such that x ⪯ y, p(x) ≤ p(y). To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/log n) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/log n) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.
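The two definitions in the abstract can be made concrete with a minimal sketch. This is only an illustration of the definitions for a fully known distribution, not the sample-based testers studied in the paper. The function names `is_monotone` and `is_big`, the element labels, and the example distributions are all hypothetical. The poset is assumed to be given as a list of pairs (x, y) with x ⪯ y that generate the order; checking these pairs suffices because ≤ on probabilities is transitive.

```python
from fractions import Fraction

def is_monotone(p, relations):
    """Return True if p(x) <= p(y) for every given pair (x, y) with x ⪯ y.

    p is a dict mapping each domain element to its probability.
    relations is assumed to generate the partial order.
    """
    return all(p[x] <= p[y] for x, y in relations)

def is_big(p, threshold):
    """Return True if every domain element has probability at least threshold."""
    return min(p.values()) >= threshold

# Hypothetical matching poset on 4 elements: two disjoint comparable pairs.
matching = [("a1", "b1"), ("a2", "b2")]

# Hypothetical example distributions (exact rationals avoid rounding issues).
p = {"a1": Fraction(1, 10), "b1": Fraction(4, 10),
     "a2": Fraction(2, 10), "b2": Fraction(3, 10)}
q = {"a1": Fraction(4, 10), "b1": Fraction(1, 10),
     "a2": Fraction(2, 10), "b2": Fraction(3, 10)}

print(is_monotone(p, matching))   # p respects both pairs
print(is_monotone(q, matching))   # q violates a1 ⪯ b1
print(is_big(p, Fraction(1, 10))) # minimum probability of p is 1/10
```

In the testing setting the distribution is unknown and only samples are available, which is what makes the sample complexity question in the abstract nontrivial.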
 Award ID(s):
 1740751
 NSFPAR ID:
 10108381
 Date Published:
 Journal Name:
 Proceedings of the Thirty-Second Conference on Learning Theory, PMLR
 Volume:
 99
 Page Range / eLocation ID:
 3482
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < epsilon, delta < 1, we wish to distinguish, with probability at least 1 - delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV}(p, U_n) >= epsilon, we simply threshold d_{TV}(\hat{p}, U_n), where \hat{p} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions, including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.
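The plug-in statistic described in this abstract, the total variation distance between the empirical distribution and U_n, can be sketched as follows. This is a minimal illustration under stated assumptions: samples are assumed to be integers in range(n), the function names are hypothetical, and `threshold` is a caller-supplied placeholder, since the abstract does not state how the threshold is chosen.

```python
from collections import Counter

def tv_to_uniform(samples, n):
    """Total variation distance between the empirical distribution and U_n.

    Assumes every sample is an integer in range(n).
    """
    m = len(samples)
    counts = Counter(samples)
    # Contribution of elements that appear at least once.
    seen = sum(abs(c / m - 1 / n) for c in counts.values())
    # Each unseen element contributes |0 - 1/n|.
    unseen = (n - len(counts)) / n
    return 0.5 * (seen + unseen)

def plugin_uniformity_test(samples, n, threshold):
    """Accept (return True) when the plug-in statistic is at most threshold.

    The threshold is a hypothetical parameter; the abstract does not give it.
    """
    return tv_to_uniform(samples, n) <= threshold

print(tv_to_uniform([0, 1, 2, 3], 4))  # empirical distribution equals U_4
print(tv_to_uniform([0, 0, 0, 0], 4))  # all mass on a single element
```

Note that with few samples the statistic is large even when p is exactly uniform, because most elements are unseen, so any usable threshold has to account for the expected value of the statistic under U_n at the given sample size.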

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution D over [n] and a property P, the goal is to distinguish between D ∈ P and ℓ1(D, P) > ε. We develop a general algorithm for this question, which applies to a large range of “shape-constrained” properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly tight upper and lower bounds for the corresponding questions.

Lysyanskaya, Anna; Handschuh, Helena (Ed.) We study the black-box function inversion problem, which is the problem of finding $x \in [N]$ such that $f(x) = y$, given as input some challenge point $y$ in the image of a function $f\colon [N] \to [N]$, using $T$ oracle queries to $f$ and preprocessed advice $\sigma \in \{0,1\}^S$ depending on $f$. We prove a number of new results about this problem, as follows. 1. We show an algorithm that works for any $T$ and $S$ satisfying $TS^2 \cdot \max\{S, T\} = \widetilde{\Theta}(N^3)$. In the important setting when $S < T$, this improves on the celebrated algorithm of Fiat and Naor [STOC, 1991], which requires $TS^3 \gtrsim N^3$. E.g., Fiat and Naor's algorithm is only non-trivial for $S \gg N^{2/3}$, while our algorithm gives a non-trivial tradeoff for any $S \gg N^{1/2}$. (Our algorithm and analysis are quite simple. As a consequence of this, we also give a self-contained and simple proof of Fiat and Naor's original result, with certain optimizations left out for simplicity.) 2. We show a non-adaptive algorithm (i.e., an algorithm whose $i$th query $x_i$ is chosen based entirely on $\sigma$ and $y$, and not on the $f(x_1), \ldots, f(x_{i-1})$) that works for any $T$ and $S$ satisfying $S = \Theta(N \log(N/T))$, giving the first non-trivial non-adaptive algorithm for this problem. E.g., setting $T = N/\mathrm{polylog}(N)$ gives $S = \Theta(N \log\log N)$. This answers a question due to Corrigan-Gibbs and Kogan [TCC, 2019], who asked whether it was possible for a non-adaptive algorithm to work with parameters $T$ and $S$ satisfying $T + S/\log N < o(N)$. We also observe that our non-adaptive algorithm is what we call a guess-and-check algorithm, that is, it is non-adaptive and its final output is always one of the oracle queries $x_1, \ldots, x_T$. For guess-and-check algorithms, we prove a matching lower bound, therefore completely characterizing the achievable parameters $(S, T)$ for this natural class of algorithms. (Corrigan-Gibbs and Kogan showed that any such lower bound for arbitrary non-adaptive algorithms would imply new circuit lower bounds.) 3. We show equivalence between function inversion and a natural decision version of the problem in both the worst case and the average case, and similarly for functions $f\colon [N] \to [M]$ with different ranges. All of the above results are most naturally described in a model with shared randomness (i.e., random coins shared between the preprocessing algorithm and the online algorithm). However, as an additional contribution, we show (using a technique from communication complexity due to Newman [IPL, 1991]) how to generically convert any algorithm that uses shared randomness into one that does not.

A Boolean {\em $k$-monotone} function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical {\em monotone} functions, which are the {\em $1$-monotone} functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: \begin{enumerate} \item We demonstrate a separation between testing $k$-monotonicity and testing monotonicity, on the hypercube domain $\{0,1\}^d$, for $k\geq 3$; \item We demonstrate a separation between testing and learning on $\{0,1\}^d$, for $k=\omega(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt d \cdot \log d\cdot \log{1/\epsilon})}$ queries, while learning $k$-monotone functions requires $2^{\Omega(k\cdot \sqrt d\cdot{1/\epsilon})}$ queries (Blais et al. (RANDOM 2015)). \item We present a tolerant test for functions $f\colon[n]^d\to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014). \end{enumerate} Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution testing techniques.
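The quantity behind the informal definition above, the largest number of value changes along any ascending chain, can be computed by brute force on a small hypercube. The sketch below is an illustration only: the name `max_alternations` is hypothetical, the running time is exponential in d, and it counts changes in either direction, whereas a formal definition of $k$-monotonicity may impose more (for example, on the direction of the first change). It scans only pairs that differ in one coordinate, which suffices because refining a chain never reduces its number of alternations.

```python
from itertools import product

def max_alternations(f, d):
    """Largest number of value changes of f along any ascending chain in {0,1}^d.

    f maps a d-tuple of 0/1 values to 0 or 1.
    """
    # Process points by increasing Hamming weight so predecessors come first.
    points = sorted(product((0, 1), repeat=d), key=sum)
    best = {}
    for x in points:
        best[x] = 0
        for i in range(d):
            if x[i] == 1:
                # y is the point directly below x obtained by clearing bit i.
                y = x[:i] + (0,) + x[i + 1:]
                best[x] = max(best[x], best[y] + (f(y) != f(x)))
    return max(best.values())

parity = lambda x: sum(x) % 2          # flips at every step up a chain
logical_or = lambda x: int(any(x))     # changes once, from 0 to 1
constant = lambda x: 0                 # never changes

print(max_alternations(parity, 3))
print(max_alternations(logical_or, 3))
print(max_alternations(constant, 3))
```

A property tester, by contrast, may query f at only a small number of points, so this exhaustive computation serves only to pin down what is being tested.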