We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem, with numerous practical applications, however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2),$$ i.e. the sample is observed only if it lies inside a set $$S$$. The goal is to learn the weights $$w_i$$ and the means $$\mu_i$$. We propose an algorithm that takes only $$\frac{1}{\varepsilon^{O(k)}}$$ samples to estimate the weights $$w_i$$ and the means $$\mu_i$$ within $$\varepsilon$$ error.
more »
« less
Ten Steps of EM Suffice for Mixtures of Two Gaussians
The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, Õ (d/ϵ2) samples suffice to compute the unknown vectors to within ϵ in Mahalanobis distance, where d is the dimension. In particular, the error rate of the EM based estimator is Õ (dn‾‾√) where n is the number of samples, which is optimal up to logarithmic factors.
more »
« less
- Award ID(s):
- 1650733
- PAR ID:
- 10086309
- Date Published:
- Journal Name:
- 30th Annual Conference on Learning Theory
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $$\ell_\infty$$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $$\ell_2$$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.more » « less
-
Classical Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition. Recently, MoE-based model architectures have become popular as a means to reduce training and inference costs. There, the partitioning function and the experts are both learnt jointly via gradient descent-type methods on the log-likelihood. In this paper we study theoretical guarantees of the Expectation Maximization (EM) algorithm for the training of MoE models. We first rigorously analyze EM for MoE where the conditional distribution of the target and latent variable conditioned on the feature variable belongs to an exponential family of distributions and show its equivalence to projected Mirror Descent with unit step size and a Kullback-Leibler Divergence regularizer. This perspective allows us to derive new convergence results and identify conditions for local linear convergence; In the special case of mixture of 2 linear or logistic experts, we additionally provide guarantees for linear convergence based on the signal-to-noise ratio. Experiments on synthetic and (small-scale) real-world data supports that EM outperforms the gradient descent algorithm both in terms of convergence rate and the achieved accuracy.more » « less
-
Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in d dimensions with 1/poly(d)-separated centers. 2) We show gradient descent with a warm start learns mixtures of K spherical Gaussians with Ω(log(min(K,d)))-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methodsmore » « less
-
Estimating the volume of a convex body is a central problem in convex geometry and can be viewed as a continuous version of counting. We present a quantum algorithm that estimates the volume of an n-dimensional convex body within multiplicative error ϵ using Õ (n3+n2.5/ϵ) queries to a membership oracle and Õ (n5+n4.5/ϵ) additional arithmetic operations. For comparison, the best known classical algorithm uses Õ (n4+n3/ϵ2) queries and Õ (n6+n5/ϵ2) additional arithmetic operations. To the best of our knowledge, this is the first quantum speedup for volume estimation. Our algorithm is based on a refined framework for speeding up simulated annealing algorithms that might be of independent interest. This framework applies in the setting of "Chebyshev cooling", where the solution is expressed as a telescoping product of ratios, each having bounded variance. We develop several novel techniques when implementing our framework, including a theory of continuous-space quantum walks with rigorous bounds on discretization error.more » « less
An official website of the United States government

