skip to main content


Title: Recovering Joint PMF from Pairwise Marginals
To overcome the curse of dimensionality in joint probability learning, recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of random variables (RVs) from three-dimensional marginals, exploiting the uniqueness of tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals is still costly in terms of sample complexity. Tensor decomposition also poses a computationally intensive optimization problem. This work puts forth a new framework that learns the joint PMF using pairwise marginals that are relatively easy to acquire. The method is built upon nonnegative matrix factorization (NMF) theory, and features a Gram–Schmidt-like economical algorithm that works provably well under realistic conditions. Theoretical analysis of a recently proposed expectation maximization (EM) algorithm for joint PMF recovery is also presented. In particular, the EM algorithm is shown to provably improve upon the proposed pairwise marginal-based approach. Synthetic and real-data experiments are employed to showcase the effectiveness of the proposed approach.  more » « less
Award ID(s):
1808159
NSF-PAR ID:
10287503
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Asilomar Conference on Signals, Systems, and Computers
Page Range / eLocation ID:
356 to 360
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Coupled tensor decomposition aims at factoring a number of tensors that share some of their latent factors. Existing algorithms for coupled canonical polyadic decomposition (CPD) face serious scalablity challenges, especially when the number of tensors is large. However, a large amount of coupled tensors naturally arise in timely applications such as statistical learning, e.g., when estimating the joint probability mass function (PMF) of many random variables from marginal PMFs. Stochastic algorithms that admit lightweight updates exist for coupled decomposition, but these algorithms cannot handle complex constraints (e.g., the probability simplex constraint that is important in statistical learning) due to their sampling patterns. This work puts forth a simple data-sampling and block variable-updating strategy for simultaneously factoring a large number of coupled tensors. The proposed algorithm enjoys low per-iteration complexity and can easily handle constraints on latent factors. We also show that this multi-block algorithm admits a nice connection to the classic single-block stochastic proximal gradient (SPG), and thus it naturally inherits convergence properties of SPG. Synthetic and real-data experiments show that the proposed algorithm is very promising for statistical learning problems. 
    more » « less
  2. Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines. 
    more » « less
  3. Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines. 
    more » « less
  4. The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but the sample complexity is a hurdle for applying such approaches---since the tensor methods hinge on the availability of third-order statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-art algorithms under a variety of scenarios. 
    more » « less
  5. Abstract Energy 3D printing processes have enabled energy storage devices with complex structures, high energy density, and high power density. Among these processes, Freeze Nano Printing (FNP) has risen as a promising process. However, quality problems are among the biggest barriers for FNP. Particularly, the droplet solidification time in FNP governs thermal distribution, and subsequently determines product solidification, formation, and quality. To describe the solidification time, physical-based heat transfer model is built. But it is computationally intensive. The objective of this work is to build an efficient emulator for the physical model. There are several challenges unaddressed: 1) the solidification time at various locations, which is a multi-dimensional array response, needs to be modeled; 2) the construction and evaluation of the emulator at new process settings need to be quick and accurate. We integrate joint tensor decomposition and Nearest Neighbor Gaussian Process (NNGP) to construct an efficient multi-dimensional array response emulator with process settings as inputs. Specifically, structured joint tensor decomposition decomposes the multi-dimensional array responses at various process settings into the setting-specific core tensors and shared low dimensional factorization matrices. Then, each independent entry of the core tensor is modeled with an NNGP, which addresses the computationally intensive model estimation problem by sampling the nearest neighborhood samples. Finally, tensor reconstruction is performed to make predictions of solidification time for new process settings. The proposed framework is demonstrated by emulating the physical model of FNP, and compared with alternative tensor (multi-dimensional array) regression models. 
    more » « less