Coupled tensor decomposition aims at factoring a number of tensors that share some of their latent factors. Existing algorithms for coupled canonical polyadic decomposition (CPD) face serious scalablity challenges, especially when the number of tensors is large. However, a large amount of coupled tensors naturally arise in timely applications such as statistical learning, e.g., when estimating the joint probability mass function (PMF) of many random variables from marginal PMFs. Stochastic algorithms that admit lightweight updates exist for coupled decomposition, but these algorithms cannot handle complex constraints (e.g., the probability simplex constraint that is important in statistical learning) due to their sampling patterns. This work puts forth a simple data-sampling and block variable-updating strategy for simultaneously factoring a large number of coupled tensors. The proposed algorithm enjoys low per-iteration complexity and can easily handle constraints on latent factors. We also show that this multi-block algorithm admits a nice connection to the classic single-block stochastic proximal gradient (SPG), and thus it naturally inherits convergence properties of SPG. Synthetic and real-data experiments show that the proposed algorithm is very promising for statistical learning problems.
more »
« less
Recovering Joint PMF from Pairwise Marginals
To overcome the curse of dimensionality in joint probability learning, recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of random variables (RVs) from three-dimensional marginals, exploiting the uniqueness of tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals is still costly in terms of sample complexity. Tensor decomposition also poses a computationally intensive optimization problem. This work puts forth a new framework that learns the joint PMF using pairwise marginals that are relatively easy to acquire. The method is built upon nonnegative matrix factorization (NMF) theory, and features a Gram–Schmidt-like economical algorithm that works provably well under realistic conditions. Theoretical analysis of a recently proposed expectation maximization (EM) algorithm for joint PMF recovery is also presented. In particular, the EM algorithm is shown to provably improve upon the proposed pairwise marginal-based approach. Synthetic and real-data experiments are employed to showcase the effectiveness of the proposed approach.
more »
« less
- Award ID(s):
- 1808159
- NSF-PAR ID:
- 10287503
- Date Published:
- Journal Name:
- Asilomar Conference on Signals, Systems, and Computers
- Page Range / eLocation ID:
- 356 to 360
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines.more » « less
-
Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters remains a long standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing its echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines.more » « less
-
Coarse-grained (CG) molecular dynamics can be a powerful method for probing complex processes. However, most CG force fields use pairwise nonbonded interaction potentials sets, which can limit their ability to capture complex multi-body phenomena such as the hydrophobic effect. As the hydrophobic effect primarily manifests itself due to the nonpolar solute affecting the nearby hydrogen bonding network in water, capturing such effects using a simple one CG site or “bead” water model is a challenge. In this work, we systematically test the ability of CG one site water models for capturing critical features of the solvent environment around a hydrophobe as well as the potential of mean force (PMF) of neopentane association. We study two bottom-up models: a simple pairwise (SP) force-matched water model constructed using the multiscale coarse-graining method and the Bottom-Up Many-Body Projected Water (BUMPer) model, which has implicit three-body correlations. We also test the top-down monatomic (mW) and the Machine Learned mW (ML-mW) water models. The mW models perform well in capturing structural correlations but not the energetics of the PMF. BUMPer outperforms SP in capturing structural correlations and also gives an accurate PMF in contrast to the two mW models. Our study highlights the importance of including three-body interactions in CG water models, either explicitly or implicitly, while in general highlighting the applicability of bottom-up CG water models for studying hydrophobic effects in a quantitative fashion. This assertion comes with a caveat, however, regarding the accuracy of the enthalpy–entropy decomposition of the PMF of hydrophobe association.more » « less
-
The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but the sample complexity is a hurdle for applying such approaches---since the tensor methods hinge on the availability of third-order statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-art algorithms under a variety of scenarios.more » « less