

Title: Learning Mixtures of Smooth Product Distributions: Identifiability and Algorithm
We study the problem of learning a mixture model of non-parametric product distributions. The problem of learning a mixture model is that of finding the component distributions along with the mixing weights using observed samples generated from the mixture. The problem is well-studied in the parametric setting, i.e., when the component distributions are members of a parametric family, such as Gaussian distributions. In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach that recovers the component distributions of the mixture under a smoothness condition. Our approach builds upon the identifiability properties of the canonical polyadic (low-rank) decomposition of tensors, in tandem with Fourier analysis and Shannon-Nyquist sampling, staples of signal processing. We demonstrate the effectiveness of the approach on synthetic and real datasets.
Award ID(s): 1704074, 1447788
NSF-PAR ID: 10106338
Journal Name: Proceedings AISTATS
Volume: 89
Page Range / eLocation ID: 388--396
Format(s): Medium: X
Sponsoring Org: National Science Foundation
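As a rough, hedged illustration of the first stage described in the abstract above (not the authors' code), the sketch below histograms samples from a synthetic three-dimensional mixture into a joint-probability tensor and factors it with a plain alternating-least-squares CP decomposition; column-normalizing the factors yields discretized conditional marginals and the mixing weights. The synthetic mixture, bin count, and iteration budget are illustrative assumptions, and the Fourier/Nyquist smoothing stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-component mixture with 3 conditionally independent coordinates.
n, k, bins = 50_000, 2, 20
true_w = np.array([0.6, 0.4])
z = rng.choice(k, size=n, p=true_w)                    # latent component label
means = np.array([[0.3, 0.5, 0.7], [0.7, 0.2, 0.4]])   # per-component coordinate means
x = np.clip(rng.normal(means[z], 0.08), 0.0, 1.0 - 1e-9)

# Stage 1a: histogram the samples into a 3-way tensor of joint bin frequencies.
idx = (x * bins).astype(int)
T = np.zeros((bins, bins, bins))
np.add.at(T, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0 / n)

# Stage 1b: rank-k CP decomposition via plain alternating least squares.
def khatri_rao(A, B):
    """Column-wise Khatri-Rao product, rows ordered with B's index fastest."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

factors = [np.abs(rng.normal(size=(bins, k))) for _ in range(3)]
for _ in range(300):
    for m in range(3):
        others = [factors[j] for j in range(3) if j != m]
        kr = khatri_rao(others[0], others[1])           # matches the C-order unfolding below
        unf = np.moveaxis(T, m, 0).reshape(bins, -1)
        gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
        factors[m] = unf @ kr @ np.linalg.pinv(gram)

# Column-normalize: each column is a discretized conditional marginal, and the product
# of the absorbed column sums recovers the mixing weight of that component.
factors = [np.clip(f, 1e-12, None) for f in factors]
col_sums = [f.sum(axis=0) for f in factors]
cond_pmfs = [f / s for f, s in zip(factors, col_sums)]
w_hat = np.prod(col_sums, axis=0)
w_hat /= w_hat.sum()
print("estimated mixing weights:", np.sort(w_hat))      # compare with [0.4, 0.6]
```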
More Like this
  1. Belkin, M.; Kpotufe, S. (Eds.)
    We give an algorithm for source identification of a mixture of k product distributions on n bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using 2^O(k^2) n^O(k) arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O’Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018). 
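    A small illustrative sketch (not the algorithm above, whose input it merely constructs): estimate the multilinear moments E[prod_{i in S} X_i] of a synthetic k = 2 mixture of product distributions on n bits for all subsets S of size at most 2k - 1, and compare them with their exact values. All parameter choices are made up for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_bits, k, n_samples = 5, 2, 200_000
w = np.array([0.5, 0.5])                        # mixing weights
p = rng.uniform(0.1, 0.9, size=(k, n_bits))     # P(X_i = 1 | component r)

# Sample from the mixture of product distributions on {0,1}^n_bits.
z = rng.choice(k, size=n_samples, p=w)
X = (rng.random((n_samples, n_bits)) < p[z]).astype(float)

# Empirical multilinear moments E[prod_{i in S} X_i] for all subsets with |S| <= 2k - 1.
moments = {}
for size in range(1, 2 * k):
    for S in itertools.combinations(range(n_bits), size):
        moments[S] = X[:, list(S)].prod(axis=1).mean()

# Exact moments for comparison: E[prod_{i in S} X_i] = sum_r w_r * prod_{i in S} p_{r,i}.
exact = {S: float(w @ p[:, list(S)].prod(axis=1)) for S in moments}
worst = max(abs(moments[S] - exact[S]) for S in moments)
print(f"max moment error over {len(moments)} subsets: {worst:.4f}")
```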
  2. We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf f where f = w1 f1 + w2 f2, with w1 + w2 = 1 and w1, w2 > 0, and we are interested in learning each component fi. Without any assumptions on fi, this problem is ill-posed. In order to identify the components fi, we assume that each fi can be written as a convolution of a Gaussian and a compactly supported density νi with supp(ν1) ∩ supp(ν2) = ∅. Our main result shows that (1/ε)^{Ω(log log(1/ε))} samples are required for estimating each fi. The proof relies on a quantitative Tauberian theorem that yields a fast rate of approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses (1/ε)^{O(log log(1/ε))} samples to estimate each fi. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.
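    Purely to make the identifying assumption above concrete (this is not code from the paper), the snippet below draws samples from a two-component mixture in which each component is a Gaussian convolved with a compactly supported uniform density, and the two supports are disjoint; all numerical choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
w1, w2 = 0.4, 0.6          # mixing weights, w1 + w2 = 1
sigma = 0.5                # width of the common Gaussian being convolved in

# nu_1 uniform on [-2, -1] and nu_2 uniform on [1, 3]: disjoint compact supports.
def sample_component(m, lo, hi):
    return rng.uniform(lo, hi, size=m) + rng.normal(scale=sigma, size=m)

pick_first = rng.random(n) < w1
x = np.where(pick_first,
             sample_component(n, -2.0, -1.0),
             sample_component(n, 1.0, 3.0))
print("mixture sample mean / std:", round(x.mean(), 3), round(x.std(), 3))
```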
  3. Summary

    Motivated by an imaging study, the paper develops a non-parametric testing procedure for testing the null hypothesis that two samples of curves observed at discrete grids and with noise have the same underlying distribution. The objective is to compare formally white matter tract profiles between healthy individuals and multiple-sclerosis patients, as assessed by conventional diffusion tensor imaging measures. We propose to decompose the curves by using functional principal component analysis of a mixture process, which we refer to as marginal functional principal component analysis. This approach reduces the dimension of the testing problem in a way that enables the use of traditional non-parametric univariate testing procedures. The procedure is computationally efficient and accommodates different sampling designs. Numerical studies are presented to validate the size and power properties of the test in many realistic scenarios. In these cases, the test proposed has been found to be more powerful than its primary competitor. Application to the diffusion tensor imaging data reveals that all the tracts studied are associated with multiple sclerosis and the choice of the diffusion tensor image measurement is important when assessing axonal disruption.
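    A minimal sketch of the general idea (not the authors' implementation, which handles noisy, irregular designs): pool the two samples of curves on a common grid, compute a functional PCA of the pooled (mixture) process, project each curve onto the leading eigenfunctions, and compare the resulting score distributions with standard univariate nonparametric two-sample tests. The synthetic curves, grid, and number of components are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)                     # common observation grid

# Synthetic samples: group B has a slightly shifted mean profile plus noise.
def make_curves(n, shift):
    scores = rng.normal(size=(n, 2))
    curves = (shift * np.sin(2 * np.pi * t)
              + scores[:, :1] * np.cos(np.pi * t)
              + scores[:, 1:] * np.sin(3 * np.pi * t) * 0.5)
    return curves + rng.normal(scale=0.2, size=(n, t.size))

A, B = make_curves(60, shift=0.0), make_curves(60, shift=0.3)

# Marginal FPCA: eigendecomposition of the pooled sample covariance.
pooled = np.vstack([A, B])
mean_curve = pooled.mean(axis=0)
cov = np.cov(pooled - mean_curve, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
n_pc = 3
phi = eigvecs[:, order[:n_pc]]                 # leading eigenfunctions on the grid

# Project each group onto the pooled eigenfunctions and test score by score.
scores_A, scores_B = (A - mean_curve) @ phi, (B - mean_curve) @ phi
pvals = [stats.ks_2samp(scores_A[:, j], scores_B[:, j]).pvalue for j in range(n_pc)]
print("per-component KS p-values:", np.round(pvals, 4))
print("Bonferroni-adjusted minimum p-value:", min(1.0, n_pc * min(pvals)))
```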

  4. Structured point process data harvested from various platforms poses new challenges to the machine learning community. To cluster repeatedly observed marked point processes, we propose a novel mixture model of multi-level marked point processes for identifying potential heterogeneity in the observed data. Specifically, we study a matrix whose entries are marked log-Gaussian Cox processes and cluster rows of such a matrix. An efficient semi-parametric Expectation-Solution (ES) algorithm combined with functional principal component analysis (FPCA) of point processes is proposed for model estimation. The effectiveness of the proposed framework is demonstrated through simulation studies and real data analyses. 
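    As a crude, clearly swapped-in stand-in for the semi-parametric ES-plus-FPCA procedure above (none of this is from the paper): simulate rows of point processes from two latent intensity profiles, bin the event times into counts, and cluster the rows with a simple EM for a mixture of independent Poisson counts. Intensity shapes, bin counts, and cluster number are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_bins, n_rows, k = 10.0, 20, 60, 2
edges = np.linspace(0, T, n_bins + 1)
lam = np.array([3.0 + 2.0 * np.sin(edges[:-1]),          # intensity profile 0
                3.0 + 2.0 * np.cos(edges[:-1])])          # intensity profile 1
z_true = rng.integers(0, k, size=n_rows)
counts = rng.poisson(lam[z_true] * (T / n_bins))          # per-row binned event counts

# EM for a mixture of rows of independent Poisson counts.
pi = np.full(k, 1.0 / k)
rates = counts[rng.choice(n_rows, k, replace=False)].astype(float) + 0.1
for _ in range(100):
    logp = (counts[:, None, :] * np.log(rates[None]) - rates[None]).sum(-1) + np.log(pi)
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)                # E-step responsibilities
    pi = resp.mean(axis=0)                                 # M-step: weights
    rates = (resp.T @ counts) / resp.sum(axis=0)[:, None] + 1e-9  # M-step: rates
print("estimated cluster sizes:", np.round(pi * n_rows, 1))
```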
  5. Zhang, Aidong; Rangwala, Huzefa (Eds.)
    Zero-inflated, heavy-tailed spatiotemporal data is common across science and engineering, from climate science to meteorology and seismology. A central modeling objective in such settings is to forecast the intensity, frequency, and timing of extreme and non-extreme events; yet in the context of deep learning, this objective presents several key challenges. First, a deep learning framework applied to such data must unify a mixture of distributions characterizing the zero events, moderate events, and extreme events. Second, the framework must be capable of enforcing parameter constraints across each component of the mixture distribution. Finally, the framework must be flexible enough to accommodate any changes in the threshold used to define an extreme event after training. To address these challenges, we propose the Deep Extreme Mixture Model (DEMM), fusing a deep learning-based hurdle model with extreme value theory to enable point and distribution prediction of zero-inflated, heavy-tailed spatiotemporal variables. The framework enables users to dynamically set a threshold for defining extreme events at inference time without the need for retraining. We present an extensive experimental analysis applying DEMM to precipitation forecasting, and observe significant improvements in point and distribution prediction. All code is available at https://github.com/andrewmcdonald27/DeepExtremeMixtureModel.
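    A rough, hedged sketch (not the DEMM code at the repository above) of the three-part density structure the abstract describes: a point mass at zero, a Gamma body below a threshold u, and a generalized Pareto tail above u, spliced so the pieces integrate to one. Distribution choices and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def hurdle_evt_logpdf(y, p_zero, shape, scale, u, xi, sigma):
    """Log-density of a zero-inflated Gamma body with a generalized Pareto tail above u."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    tail_mass = stats.gamma.sf(u, a=shape, scale=scale)      # P(Y > u | Y > 0) under the body

    zero = y == 0
    body = (y > 0) & (y <= u)
    tail = y > u

    out[zero] = np.log(p_zero)
    out[body] = (np.log1p(-p_zero)
                 + stats.gamma.logpdf(y[body], a=shape, scale=scale))
    out[tail] = (np.log1p(-p_zero) + np.log(tail_mass)
                 + stats.genpareto.logpdf(y[tail], c=xi, loc=u, scale=sigma))
    return out

# Evaluate the spliced density on a few illustrative precipitation-like values.
y = np.array([0.0, 0.0, 1.2, 4.5, 30.0])
print(hurdle_evt_logpdf(y, p_zero=0.6, shape=1.5, scale=2.0, u=10.0, xi=0.2, sigma=5.0))
```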