A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similar to ratings-converted-to-comparisons. Other examples include sports data analysis and peer grading. In this paper, we design two-sample tests for pairwise comparison data and ranking data. For our two-sample test for pairwise comparison data, we establish an upper bound on the sample complexity required to correctly distinguish between the distributions of the two sets of samples. Our test requires essentially no assumptions on the distributions. We then prove complementary lowermore »
Testing Shape Restrictions of Discrete Distributions
We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution D over [n] and a property P, the goal is to distinguish between D ∈ P and ℓ1(D, P) > ε. We develop a general algorithm for this question, which applies to a large range of “shape-constrained” properties, including monotone, log-concave, t-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly–tight upper and lower bounds for the corresponding questions.
- Award ID(s):
- 1650733
- Publication Date:
- NSF-PAR ID:
- 10078631
- Journal Name:
- Theory of computing systems
- Volume:
- 62
- Issue:
- 1
- Page Range or eLocation-ID:
- 4-62
- ISSN:
- 1433-0490
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The approximate degree of a Boolean function f is the least degree of a real polynomial that approximates f pointwise to error at most 1/3. The approximate degree of f is known to be a lower bound on the quantum query complexity of f (Beals et al., FOCS 1998 and J. ACM 2001). We find tight or nearly tight bounds on the approximate degree and quantum query complexities of several basic functions. Specifically, we show the following. k-Distinctness: For any constant k, the approximate degree and quantum query complexity of the k-distinctness function is Ω(n3/4−1/(2k)). This is nearly tight formore »
-
We study the fundamental problems of identity testing (goodness of fit), and closeness testing (two sample test) of distributions over k elements, under differential privacy. While the problems have a long history in statistics, finite sample bounds for these problems have only been established recently. In this work, we derive upper and lower bounds on the sample complexity of both the problems under (epsilon, delta)-differential privacy. We provide sample optimal algorithms for identity testing problem for all parameter ranges, and the first results for closeness testing. Our closeness testing bounds are optimal in the sparse regime where the number ofmore »
-
We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta viamore »
-
We study the problems of identity and closeness testing of n-dimensional product distributions. Prior works of Canonne et al. (2017) and Daskalakis and Pan (2017) have established tight sample complexity bounds for non-tolerant testing over a binary alphabet: given two product distributions P and Q over a binary alphabet, distinguish between the cases P = Q and dTV(P;Q) > epsilon . We build on this prior work to give a more comprehensive map of the complexity of testing of product distributions by investigating tolerant testing with respect to several natural distance measures and over an arbitrary alphabet. Our study givesmore »