The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.

Determinantal point processes (DPPs) are popular probabilistic models of diversity. In this paper, we investigate DPPs from a new perspective: property testing of distributions. Given sample access to an unknown distribution q over the subsets of a ground set, we aim to distinguish whether q is a DPP distribution or ϵ-far from all DPP distributions in ℓ1-distance. In this work, we propose the first algorithm for testing DPPs. Furthermore, we establish a matching lower bound on the sample complexity of DPP testing. This lower bound also extends to showing a new hardness result for the problem of testing the more general class of log-submodular distributions more »« less

We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from s distributions, p_1, p_2, …, p_s, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the p_i’s are uniform or ε-far from being uniform in ℓ_1-distance (2) Identity Testing: Testing whether all the p_i’s are equal to an explicitly given distribution q or ε-far from q in ℓ_1-distance, and (3) Closeness Testing: Testing whether all the p_i’s are equal to a distribution q which we have sample access to, or ε-far from q in ℓ_1-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.

We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from s distributions, p_1, p_2, …, p_s, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the p_i’s are uniform or ε-far from being uniform in ℓ_1-distance (2) Identity Testing: Testing whether all the p_i’s are equal to an explicitly given distribution q or ε-far from q in ℓ_1-distance, and (3) Closeness Testing: Testing whether all the p_i’s are equal to a distribution q which we have sample access to, or ε-far from q in ℓ_1-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.

Alaikbarpour, A.; Gouleakis, T.; Peebles, J.; Rubinfeld, R.; Yodpinyanee, A(
, Proceedings of the Thirty-Second Conference on Learning Theory (COLT 2019))

In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is monotone if, for any pair of domain elements x and y such that x ⪯ y, p(x) ≤ p(y).
To understand the sample complexity of this problem, we introduce a new property
called bigness over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/ log n) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/ log n) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem. The previous lower bound for testing Monotonicity of

Alaikbarpour, Maryam; Gouleakis, Themis; Peebles, John; Rubinfeld, Ronitt; Yodpinyanee, Anak(
, Proceedings of the Thirty-Second Conference on Learning Theory, PMLR)

In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution p over a poset is {\em monotone} if, for any pair of domain elements x and y such that x⪯y, p(x)≤p(y). To understand the sample complexity of this problem, we introduce a new property called \emph{bigness} over a finite domain, where the distribution is T-big if the minimum probability for any domain element is at least T. We establish a lower bound of Ω(n/logn) for testing bigness of distributions on domains of size n. We then build on these lower bounds to give Ω(n/logn) lower bounds for testing monotonicity over a matching poset of size n and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset. We then give a number of tools for analyzing upper bounds on the sample complexity of the monotonicity testing problem.

Diakonikolas, Ilias; Gouleakis, Themis; Peebles, John; Price, Eric(
, 45th International Colloquium on Automata, Languages, and Automata)

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, we simply threshold d_{TV}({p^}, U_n), where {p^} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.

Gatmiry, Khashayar, Aliakbarpour, Maryam, and Jegelka, Stefanie. Testing Determinantal Point Processes. Retrieved from https://par.nsf.gov/biblio/10221377. Annual Conference on Neural Information Processing Systems (NeurIPS 2020) .

Gatmiry, Khashayar, Aliakbarpour, Maryam, & Jegelka, Stefanie. Testing Determinantal Point Processes. Annual Conference on Neural Information Processing Systems (NeurIPS 2020), (). Retrieved from https://par.nsf.gov/biblio/10221377.

Gatmiry, Khashayar, Aliakbarpour, Maryam, and Jegelka, Stefanie.
"Testing Determinantal Point Processes". Annual Conference on Neural Information Processing Systems (NeurIPS 2020) (). Country unknown/Code not available. https://par.nsf.gov/biblio/10221377.

@article{osti_10221377,
place = {Country unknown/Code not available},
title = {Testing Determinantal Point Processes},
url = {https://par.nsf.gov/biblio/10221377},
abstractNote = {Determinantal point processes (DPPs) are popular probabilistic models of diversity. In this paper, we investigate DPPs from a new perspective: property testing of distributions. Given sample access to an unknown distribution q over the subsets of a ground set, we aim to distinguish whether q is a DPP distribution or ϵ-far from all DPP distributions in ℓ1-distance. In this work, we propose the first algorithm for testing DPPs. Furthermore, we establish a matching lower bound on the sample complexity of DPP testing. This lower bound also extends to showing a new hardness result for the problem of testing the more general class of log-submodular distributions},
journal = {Annual Conference on Neural Information Processing Systems (NeurIPS 2020)},
author = {Gatmiry, Khashayar and Aliakbarpour, Maryam and Jegelka, Stefanie},
editor = {null}
}

Warning: Leaving National Science Foundation Website

You are now leaving the National Science Foundation website to go to a non-government website.

Website:

NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.