Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results.
1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger, which tensorize over the marginals, leading to efficient algorithms.
2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions, the relative approximation of their TV distance is NP-hard.
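The contrast between TV distance and a tensorizing measure can be illustrated with a minimal Python sketch. It is not the paper's algorithm: `tv_bruteforce` simply sums over all 2^n points, while `hellinger_sq_tensorized` uses the standard identity 1 - H^2(P,Q) = ∏_i (1 - H^2(P_i,Q_i)). The function names are hypothetical, each product distribution is assumed to be given as a list of marginals Pr[x_i = 1], and the squared Hellinger distance is assumed to be normalized as H^2 = 1 - Σ_x sqrt(P(x)Q(x)).

```python
from itertools import product
from math import prod, sqrt


def product_pmf(marginals, x):
    """Probability of bit string x; marginals[i] is Pr[x_i = 1]."""
    return prod(p if bit else 1.0 - p for p, bit in zip(marginals, x))


def tv_bruteforce(p, q):
    """TV distance by enumerating all 2^n points (exponential time)."""
    n = len(p)
    return 0.5 * sum(
        abs(product_pmf(p, x) - product_pmf(q, x))
        for x in product((0, 1), repeat=n)
    )


def hellinger_sq_bruteforce(p, q):
    """Squared Hellinger distance by enumeration, for comparison only."""
    n = len(p)
    return 1.0 - sum(
        sqrt(product_pmf(p, x) * product_pmf(q, x))
        for x in product((0, 1), repeat=n)
    )


def hellinger_sq_tensorized(p, q):
    """Squared Hellinger distance in O(n) time via the product identity."""
    affinity = 1.0
    for pi, qi in zip(p, q):
        # Per-coordinate Bhattacharyya coefficient of two Bernoullis.
        affinity *= sqrt(pi * qi) + sqrt((1.0 - pi) * (1.0 - qi))
    return 1.0 - affinity


if __name__ == "__main__":
    p = [0.2, 0.7, 0.5]
    q = [0.5, 0.5, 0.5]  # uniform distribution over {0,1}^3
    print(tv_bruteforce(p, q))
    print(hellinger_sq_bruteforce(p, q), hellinger_sq_tensorized(p, q))
```

The two Hellinger routines agree up to floating-point error, whereas no analogous per-coordinate shortcut is used for the TV distance here.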
NSF-PAR ID:
 10447252
 Date Published:
 Journal Name:
 International Joint Conference on Artificial Intelligence
 Page Range / eLocation ID:
 3479 to 3487
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, G, is subadditive with respect to the neighborhoods of G. Namely, if P and Q are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, H^2(P,Q), between P and Q is upper bounded by the sum, ∑_v H^2(P_{{v}∪Π_v}, Q_{{v}∪Π_v}), of the square Hellinger distances between the marginals of P and Q on every node v and its parents Π_v in the DAG. Importantly, our bound does not involve the conditionals but the marginals of P and Q. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks P and Q on the same (but potentially unknown) DAG satisfy P=Q vs d_TV(P,Q)>ϵ can be performed from Õ(|Σ|^{3/4(d+1)}⋅n/ϵ^2) samples, where d is the maximum in-degree of the DAG and Σ the domain of each variable of the Bayesian networks. If P and Q are defined on potentially different and potentially unknown trees, the sample complexity becomes Õ(|Σ|^{4.5} n/ϵ^2), whose dependence on n, ϵ is optimal up to logarithmic factors. Lastly, if P and Q are product distributions over {0,1}^n and Q is known, the sample complexity becomes O(√n/ϵ^2), which is optimal up to constant factors.

Megow, Nicole; Smith, Adam (Ed.) We revisit the classic Pandora’s Box (PB) problem under correlated distributions on the box values. Recent work of [Shuchi Chawla et al., 2020] obtained constant approximate algorithms for a restricted class of policies for the problem that visit boxes in a fixed order. In this work, we study the complexity of approximating the optimal policy, which may adaptively choose which box to visit next based on the values seen so far. Our main result establishes an approximation-preserving equivalence of PB to the well-studied Uniform Decision Tree (UDT) problem from stochastic optimization and a variant of the Min-Sum Set Cover (MSSC_f) problem. For distributions of support m, UDT admits a log m approximation, and while a constant-factor approximation in polynomial time is a long-standing open problem, constant-factor approximations are achievable in subexponential time [Ray Li et al., 2020]. Our main result implies that the same properties hold for PB and MSSC_f. We also study the case where the distribution over values is given more succinctly as a mixture of m product distributions. This problem is again related to a noisy variant of the Optimal Decision Tree, which is significantly more challenging. We give a constant-factor approximation that runs in time n^{Õ(m²/ε²)} when the mixture components on every box are either identical or separated in TV distance by ε.

We study the problems of identity and closeness testing of n-dimensional product distributions. Prior works of Canonne et al. (2017) and Daskalakis and Pan (2017) have established tight sample complexity bounds for non-tolerant testing over a binary alphabet: given two product distributions P and Q over a binary alphabet, distinguish between the cases P = Q and d_TV(P, Q) > epsilon. We build on this prior work to give a more comprehensive map of the complexity of testing of product distributions by investigating tolerant testing with respect to several natural distance measures and over an arbitrary alphabet. Our study gives a fine-grained understanding of how the sample complexity of tolerant testing varies with the distance measures for product distributions. In addition, we extend one of our upper bounds on product distributions to bounded-degree Bayes nets.

The 2-Wasserstein distance (or RMS distance) is a useful measure of similarity between probability distributions with exciting applications in machine learning. For discrete distributions, the problem of computing this distance can be expressed in terms of finding a minimum-cost perfect matching on a complete bipartite graph given by two multisets of points A, B ⊂ ℝ^2, with |A| = |B| = n, where the ground distance between any two points is the squared Euclidean distance between them. Although there is a near-linear time relative ∊-approximation algorithm for the case where the ground distance is Euclidean (Sharathkumar and Agarwal, JACM 2020), all existing relative ∊-approximation algorithms for the RMS distance take Ω(n^{3/2}) time. This is primarily because, unlike Euclidean distance, squared Euclidean distance is not a metric. In this paper, for the RMS distance, we present a new ∊-approximation algorithm that runs in O(n^{5/4} poly{log n, 1/∊}) time. Our algorithm is inspired by a recent approach for finding a minimum-cost perfect matching in bipartite planar graphs (Asathulla et al., TALG 2020). Their algorithm depends heavily on the existence of sublinear-sized vertex separators as well as shortest-path data structures that require planarity. Surprisingly, we are able to design a similar algorithm for a complete geometric graph that is far from planar and does not have any vertex separators. Central components of our algorithm include a quadtree-based distance that approximates the squared Euclidean distance and a data structure that supports both Hungarian search and augmentation in sublinear time.

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < epsilon, delta < 1, we wish to distinguish, with probability at least 1 - delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV}(p, U_n) >= epsilon, we simply threshold d_{TV}(\hat{p}, U_n), where \hat{p} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions, including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.
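The "plug-in" uniformity statistic described above can be sketched in a few lines of Python. This is only an illustration of thresholding d_{TV}(\hat{p}, U_n): the function names are hypothetical, the domain is assumed to be {0, ..., n-1}, and the `threshold` argument is a caller-supplied placeholder, since the abstract does not say how the threshold is chosen as a function of n, epsilon, delta, and the sample size.

```python
from collections import Counter


def empirical_tv_to_uniform(samples, n):
    """d_TV between the empirical distribution of samples and U_n."""
    counts = Counter(samples)
    m = len(samples)
    return 0.5 * sum(abs(counts.get(i, 0) / m - 1.0 / n) for i in range(n))


def plugin_uniformity_test(samples, n, threshold):
    """Return True (accept uniformity) if the statistic is at most threshold.

    threshold is a hypothetical parameter, not a value from the source.
    """
    return empirical_tv_to_uniform(samples, n) <= threshold


if __name__ == "__main__":
    print(empirical_tv_to_uniform([0, 1, 2, 3], 4))
    print(empirical_tv_to_uniform([0, 0, 0, 0], 4))
```

With fewer samples than domain elements, the statistic is bounded away from zero even for a truly uniform source, so any usable threshold has to account for the statistic's expected value under U_n.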