Title: Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing.
We show that the square Hellinger distance between two Bayesian networks on the same directed graph, G, is subadditive with respect to the neighborhoods of G. Namely, if P and Q are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, H^2(P,Q), between P and Q is upper bounded by the sum, ∑_v H^2(P_{{v}∪Π_v}, Q_{{v}∪Π_v}), of the square Hellinger distances between the marginals of P and Q on every node v and its parents Π_v in the DAG. Importantly, our bound does not involve the conditionals but the marginals of P and Q. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks P and Q on the same (but potentially unknown) DAG satisfy P = Q vs d_TV(P,Q) > ϵ can be performed from Õ(|Σ|^{3/4(d+1)} · n/ϵ^2) samples, where d is the maximum in-degree of the DAG and Σ the domain of each variable of the Bayesian networks. If P and Q are defined on potentially different and potentially unknown trees, the sample complexity becomes Õ(|Σ|^{4.5} n/ϵ^2), whose dependence on n, ϵ is optimal up to logarithmic factors. Lastly, if P and Q are product distributions over {0,1}^n and Q is known, the sample complexity becomes O(√n/ϵ^2), which is optimal up to constant factors.
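As a quick numerical illustration of the inequality (not the paper's algorithm), the following Python sketch builds two Bayesian networks with made-up conditional probability tables on the chain v1 → v2 → v3 and checks that H^2(P,Q) is bounded by the sum of square Hellinger distances over the neighborhood marginals {v1}, {v1,v2}, {v2,v3}.

```python
# Minimal numerical sketch (made-up CPT values): check the subadditivity bound
# H^2(P, Q) <= sum_v H^2(P_{{v} ∪ Π_v}, Q_{{v} ∪ Π_v}) on the chain v1 -> v2 -> v3.
import itertools
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance H^2 = 1 - sum_x sqrt(p(x) q(x)),
    with distributions given as dicts mapping outcomes to probabilities."""
    support = set(p) | set(q)
    return 1.0 - sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in support)

def joint(root, cpt12, cpt23):
    """Joint of the chain X1 -> X2 -> X3 from a root marginal and the
    conditional probability tables cpt12[x1][x2], cpt23[x2][x3]."""
    return {
        (x1, x2, x3): root[x1] * cpt12[x1][x2] * cpt23[x2][x3]
        for x1, x2, x3 in itertools.product((0, 1), repeat=3)
    }

def marginal(joint_dist, coords):
    """Marginal of a joint dict on the given coordinate indices."""
    out = {}
    for x, pr in joint_dist.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + pr
    return out

# Two hypothetical Bayesian networks P and Q on the same DAG v1 -> v2 -> v3.
P = joint({0: 0.6, 1: 0.4}, {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}},
          {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}})
Q = joint({0: 0.5, 1: 0.5}, {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}},
          {0: {0: 0.4, 1: 0.6}, 1: {0: 0.8, 1: 0.2}})

lhs = hellinger_sq(P, Q)
# Neighborhoods {v} ∪ Π_v of the chain: {v1}, {v1, v2}, {v2, v3}.
rhs = (hellinger_sq(marginal(P, (0,)), marginal(Q, (0,)))
       + hellinger_sq(marginal(P, (0, 1)), marginal(Q, (0, 1)))
       + hellinger_sq(marginal(P, (1, 2)), marginal(Q, (1, 2))))
print(f"H^2(P,Q) = {lhs:.4f} <= sum over neighborhoods = {rhs:.4f}")
```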
Award ID(s):
1650733
NSF-PAR ID:
10086311
Journal Name:
30th Annual Conference on Learning Theory
Sponsoring Org:
National Science Foundation
More Like this
1. We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network on a graph with n discrete variables, bounded in-degree, and bounded 'confounded components', we show that O(log n) interventions on an unknown causal Bayesian network on the same graph, and Õ(n/ϵ^2) samples per intervention, suffice to efficiently distinguish whether the two networks are identical or whether there exists some intervention under which they are farther than ϵ apart in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: Ω(log n) interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.
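A do-intervention on a causal Bayesian network fixes a variable to a chosen value and cuts its incoming edges; the testing results above compare the networks' distributions under such interventions. The sketch below is a toy illustration with made-up probabilities (not the paper's testing algorithm) that samples a three-variable chain both observationally and under do(X2 = 1).

```python
# Toy illustration: sampling a causal Bayesian network on the chain
# X1 -> X2 -> X3, observationally and under a do-intervention that fixes a
# variable and severs its incoming edge. All probabilities are made up.
import random

P_X1 = 0.3                            # P(X1 = 1)
P_X2_given_X1 = {0: 0.2, 1: 0.9}      # P(X2 = 1 | X1)
P_X3_given_X2 = {0: 0.5, 1: 0.1}      # P(X3 = 1 | X2)

def sample(do=None):
    """Draw one sample; `do` optionally maps intervened variables to values."""
    do = do or {}
    x1 = do.get("X1", int(random.random() < P_X1))
    x2 = do.get("X2", int(random.random() < P_X2_given_X1[x1]))
    x3 = do.get("X3", int(random.random() < P_X3_given_X2[x2]))
    return x1, x2, x3

observational = [sample() for _ in range(5)]
intervened = [sample(do={"X2": 1}) for _ in range(5)]
print("observational:", observational)
print("do(X2 = 1):   ", intervened)
```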
2. Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. (1) The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger, which tensorize over the marginals, leading to efficient algorithms. (2) There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions, the relative approximation of their TV distance is NP-hard.
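The contrast between distances that tensorize over the marginals and TV distance can be seen directly: the Hellinger affinity of two product distributions factors coordinate by coordinate, while the defining sum for TV distance runs over all 2^n outcomes. A small illustrative sketch with made-up marginals (not from the paper):

```python
# Illustration: for product distributions over {0,1}^n, the Hellinger affinity
# factors across coordinates (linear time), whereas the defining formula for
# TV distance sums over all 2^n outcomes. Marginals below are made up.
import itertools
import math

p = [0.10, 0.70, 0.40, 0.55]   # P(X_i = 1)
q = [0.20, 0.60, 0.40, 0.50]   # Q(X_i = 1)
n = len(p)

# Squared Hellinger distance via the tensorizing affinity sum_x sqrt(P(x)Q(x)).
affinity = math.prod(math.sqrt(pi * qi) + math.sqrt((1 - pi) * (1 - qi))
                     for pi, qi in zip(p, q))
hellinger_sq = 1.0 - affinity

# TV distance by brute force over the 2^n outcomes.
def prob(marginals, x):
    return math.prod(m if xi else 1 - m for m, xi in zip(marginals, x))

tv = 0.5 * sum(abs(prob(p, x) - prob(q, x))
               for x in itertools.product((0, 1), repeat=n))

print(f"H^2(P,Q) = {hellinger_sq:.4f}   d_TV(P,Q) = {tv:.4f}")
```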
3. Kohayakawa, Y.; Miyazawa, F.K. (Eds.)
    In this work we are interested in the problem of testing quantum entanglement. More specifically, we study the separability problem in quantum property testing, where one is given n copies of an unknown mixed quantum state ϱ on C^d ⊗ C^d, and one wants to test whether ϱ is separable or ϵ-far from all separable states in trace distance. We prove that n = Ω(d^2/ϵ^2) copies are necessary to test separability, assuming ϵ is not too small, viz. ϵ = Ω(1/√d). We also study completely positive distributions on the grid [d]×[d], as a classical analogue of separable states. We analogously prove that Ω(d/ϵ^2) samples from an unknown distribution p are necessary to decide whether p is completely positive or ϵ-far from all completely positive distributions in total variation distance.
4. The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1% error in estimating the means. In the finite-sample regime, we show that, under a random initialization, Õ(d/ϵ^2) samples suffice to compute the unknown vectors to within ϵ in Mahalanobis distance, where d is the dimension. In particular, the error rate of the EM-based estimator is Õ(√(d/n)), where n is the number of samples, which is optimal up to logarithmic factors.
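As a concrete illustration of the EM iteration described above (a minimal sketch with made-up data and settings, not the paper's analysis), the following NumPy code runs ten steps of EM for a balanced mixture of two one-dimensional Gaussians with known unit variance, starting from a deliberately far-off initialization.

```python
# Minimal EM sketch: balanced mixture of two 1-D Gaussians with known unit
# variance; only the two means are estimated. Data and settings are made up.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([-2.0, 1.0])
z = rng.integers(0, 2, size=50_000)          # latent component of each sample
x = rng.normal(true_means[z], 1.0)           # observed 1-D samples

mu = np.array([-10.0, 10.0])                 # deliberately far-off initialization
for step in range(10):
    # E-step: responsibility of component 0 for each point
    # (equal mixing weights, unit variances).
    log_r0 = -0.5 * (x - mu[0]) ** 2
    log_r1 = -0.5 * (x - mu[1]) ** 2
    r0 = 1.0 / (1.0 + np.exp(log_r1 - log_r0))
    # M-step: responsibility-weighted means.
    mu = np.array([np.sum(r0 * x) / np.sum(r0),
                   np.sum((1 - r0) * x) / np.sum(1 - r0)])
    print(f"step {step + 1}: mu = {np.round(mu, 3)}")
```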