We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem, with numerous practical applications, however, finitesample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2),$$ i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.
more »
« less
Ten Steps of EM Suffice for Mixtures of Two Gaussians
The ExpectationMaximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the kmeans clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closedform expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, Õ (d/ϵ2) samples suffice to compute the unknown vectors to within ϵ in Mahalanobis distance, where d is the dimension. In particular, the error rate of the EM based estimator is Õ (dn‾‾√) where n is the number of samples, which is optimal up to logarithmic factors.
more »
« less
 Award ID(s):
 1650733
 NSFPAR ID:
 10086309
 Date Published:
 Journal Name:
 30th Annual Conference on Learning Theory
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectationmaximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$norm, matching the informationtheoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.more » « less

Estimating the volume of a convex body is a central problem in convex geometry and can be viewed as a continuous version of counting. We present a quantum algorithm that estimates the volume of an ndimensional convex body within multiplicative error ϵ using Õ (n3+n2.5/ϵ) queries to a membership oracle and Õ (n5+n4.5/ϵ) additional arithmetic operations. For comparison, the best known classical algorithm uses Õ (n4+n3/ϵ2) queries and Õ (n6+n5/ϵ2) additional arithmetic operations. To the best of our knowledge, this is the first quantum speedup for volume estimation. Our algorithm is based on a refined framework for speeding up simulated annealing algorithms that might be of independent interest. This framework applies in the setting of "Chebyshev cooling", where the solution is expressed as a telescoping product of ratios, each having bounded variance. We develop several novel techniques when implementing our framework, including a theory of continuousspace quantum walks with rigorous bounds on discretization error.more » « less

We study the convergence properties of the ExpectationMaximization algorithm in the Naive Bayes model. We show that EM can get stuck in regions of slow convergence, even when the features are binary and i.i.d. conditioning on the class label, and even under random (i.e. non worstcase) initialization. In turn, we show that EM can be bootstrapped in a pretraining step that computes a good initialization. From this initialization we show theoretically and experimentally that EM converges exponentially fast to the true model parameters. Our bootstrapping method amounts to running the EM algorithm on appropriately centered iterates of small magnitude, which as we show corresponds to effectively performing power iteration on the covariance matrix of the mixture model, although power iteration is performed under the hood by EM itself. As such, we call our bootstrapping approach “power EM.” Specifically for the case of two binary features, we show global exponentially fast convergence of EM, even without bootstrapping. Finally, as the Naive Bayes model is quite expressive, we show as corollaries of our convergence results that the EM algorithm globally converges to the true model parameters for mixtures of two Gaussians, recovering recent results of Xu et al.’2016 and Daskalakis et al. 2017.more » « less

We consider the problem of spherical Gaussian Mixture models with 𝑘≥3 components when the components are well separated. A fundamental previous result established that separation of Ω(log𝑘‾‾‾‾‾√) is necessary and sufficient for identifiability of the parameters with \textit{polynomial} sample complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that 𝑂̃ (𝑘𝑑/𝜖2) samples suffice for any 𝜖≲1/𝑘, closing the gap from polynomial to linear, and thus giving the first optimal sample upper bound for the parameter estimation of wellseparated Gaussian mixtures. We accomplish this by proving a new result for the ExpectationMaximization (EM) algorithm: we show that EM converges locally, under separation Ω(log𝑘‾‾‾‾‾√). The previous bestknown guarantee required Ω(𝑘‾‾√) separation (Yan, et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finitesample error of EM does not depend on nonuniversal quantities such as pairwise distances between means of Gaussian components.more » « less