Title: Stochastic Optimization for Coupled Tensor Decomposition with Applications in Statistical Learning
Coupled tensor decomposition aims at factoring a number of tensors that share some of their latent factors. Existing algorithms for coupled canonical polyadic decomposition (CPD) face serious scalability challenges, especially when the number of tensors is large. Yet large collections of coupled tensors arise naturally in timely applications such as statistical learning, e.g., when estimating the joint probability mass function (PMF) of many random variables from marginal PMFs. Stochastic algorithms that admit lightweight updates exist for coupled decomposition, but their sampling patterns prevent them from handling complex constraints (e.g., the probability simplex constraint that is important in statistical learning). This work puts forth a simple data-sampling and block variable-updating strategy for simultaneously factoring a large number of coupled tensors. The proposed algorithm enjoys low per-iteration complexity and can easily handle constraints on the latent factors. We also show that this multi-block algorithm admits a natural connection to the classic single-block stochastic proximal gradient (SPG) method, and thus it inherits the convergence properties of SPG. Synthetic and real-data experiments show that the proposed algorithm is very promising for statistical learning problems.
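
The paper's own update rules are not reproduced on this page; the toy NumPy sketch below (with made-up dimensions, a fixed fiber-sample size, two synthetic third-order tensors coupled only through one shared factor, and a diminishing step size, all of which are illustrative assumptions rather than the authors' setup) only conveys the general flavor: at each iteration one coupled tensor and one factor block are sampled, a stochastic gradient is formed from a handful of fibers of that tensor, and the proximal step projects each factor column onto the probability simplex.

    import numpy as np

    def simplex_project_columns(Mat):
        # Euclidean projection of each column onto the probability simplex (Duchi et al., 2008)
        P = np.empty_like(Mat)
        for j in range(Mat.shape[1]):
            u = np.sort(Mat[:, j])[::-1]
            css = np.cumsum(u)
            rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - 1.0)[0][-1]
            P[:, j] = np.maximum(Mat[:, j] - (css[rho] - 1.0) / (rho + 1.0), 0.0)
        return P

    def khatri_rao(A, B):
        # column-wise Khatri-Rao product
        return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

    def unfold(T, mode):
        # mode-n unfolding, consistent with the Khatri-Rao ordering above
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    rng = np.random.default_rng(0)
    I, J, K, L, M, R = 30, 25, 20, 15, 10, 4
    # ground-truth simplex-constrained factors; A couples the two tensors
    A, B, C, D, E = (simplex_project_columns(rng.random((n, R))) for n in (I, J, K, L, M))
    X = np.einsum('ir,jr,kr->ijk', A, B, C)      # tensor 1, factors (A, B, C)
    Y = np.einsum('ir,jr,kr->ijk', A, D, E)      # tensor 2, factors (A, D, E), shares A

    est = [simplex_project_columns(rng.random((n, R))) for n in (I, J, K, L, M)]
    coupled = [(X, [0, 1, 2]), (Y, [0, 3, 4])]   # each tensor and the indices of its factors

    for t in range(5000):
        T, idx = coupled[rng.integers(2)]        # sample one of the coupled tensors ...
        mode = rng.integers(3)                   # ... and one factor block of that tensor
        others = [est[idx[m]] for m in range(3) if m != mode]
        H = khatri_rao(others[0], others[1])
        Tn = unfold(T, mode)
        fib = rng.choice(H.shape[0], size=64, replace=False)   # a handful of sampled fibers
        F = est[idx[mode]]
        G = (F @ H[fib].T - Tn[:, fib]) @ H[fib] / len(fib)    # stochastic gradient
        step = 0.5 / np.sqrt(t + 1.0)
        est[idx[mode]] = simplex_project_columns(F - step * G) # proximal (projection) step

    X_hat = np.einsum('ir,jr,kr->ijk', est[0], est[1], est[2])
    print('relative fit error on tensor X:', np.linalg.norm(X_hat - X) / np.linalg.norm(X))

Because each iteration touches only the sampled fibers and one factor block, the per-iteration cost stays small even when many coupled tensors are involved, while the projection step keeps every factor column a valid PMF.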
Award ID(s):
1808159
PAR ID:
10105977
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2019 IEEE Data Science Workshop (DSW)
Page Range / eLocation ID:
300 to 304
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work focuses on canonical polyadic decomposition (CPD) for large-scale tensors. Many prior works rely on data sparsity to develop scalable CPD algorithms, which are not suitable for dense tensors; yet dense tensors often arise in applications such as image and video processing. As an alternative, stochastic algorithms utilize data sampling to reduce per-iteration complexity and are thus very scalable, even when handling dense tensors. However, existing stochastic CPD algorithms face several challenges. For example, some algorithms are based on randomly sampled tensor entries, so each iteration can only update a small portion of the latent factors. This may result in slow improvement of the estimation accuracy of the latent factors. In addition, the convergence properties of many stochastic CPD algorithms are unclear, perhaps because CPD poses a hard nonconvex problem that is challenging to analyze under stochastic settings. In this work, we propose a stochastic optimization strategy that effectively circumvents the above challenges. The proposed algorithm updates a whole latent factor at each iteration using sampled fibers of the tensor, which can quickly increase the estimation accuracy. The algorithm is flexible: many commonly used regularizers and constraints can be easily incorporated into the computational framework. The algorithm is also backed by a rigorous convergence theory. Simulations on large-scale dense tensors are employed to showcase the effectiveness of the algorithm.
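    As a rough illustration of the fiber-sampling idea described above (not the authors' exact algorithm), the fragment below refreshes an entire factor from a small set of sampled mode-1 fibers; the dimensions and sample size are arbitrary, and an exact least-squares solve plus a nonnegativity clip stand in for the paper's stochastic gradient step and constraint handling.

        import numpy as np

        rng = np.random.default_rng(1)
        I, J, K, R = 50, 40, 30, 5
        A, B, C = (np.abs(rng.standard_normal((n, R))) for n in (I, J, K))
        X = np.einsum('ir,jr,kr->ijk', A, B, C)              # synthetic low-rank tensor

        A_hat = np.abs(rng.standard_normal((I, R)))          # current estimate of the mode-1 factor
        H = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)  # Khatri-Rao product of the other factors
        X1 = X.reshape(I, J * K)                             # mode-1 unfolding: X1 = A @ H.T

        fib = rng.choice(J * K, size=10 * R, replace=False)  # a few sampled mode-1 fibers
        # one sampled step refreshes the entire I-by-R factor at once; here an exact
        # least-squares solve on the sampled fibers stands in for a stochastic gradient
        # step, followed by a simple nonnegativity clip as an example constraint
        A_new = np.maximum(np.linalg.lstsq(H[fib], X1[:, fib].T, rcond=None)[0].T, 0.0)

        # with the true B and C and no noise, the sampled solve recovers A exactly
        print('recovery error for the mode-1 factor:', np.linalg.norm(A_new - A) / np.linalg.norm(A))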
  2.
    Canonical polyadic decomposition (CPD) has been a workhorse for multimodal data analytics. This work puts forth a stochastic algorithmic framework for CPD under β-divergence, which is well-motivated in statistical learning—where the Euclidean distance is typically not preferred. Despite the existence of a series of prior works addressing this topic, pressing computational and theoretical challenges, e.g., scalability and convergence issues, still remain. In this paper, a unified stochastic mirror descent framework is developed for large-scale β-divergence CPD. Our key contribution is the integrated design of a tensor fiber sampling strategy and a flexible stochastic Bregman divergence-based mirror descent iterative procedure, which significantly reduces the computation and memory cost per iteration for various β. Leveraging the fiber sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm also ensures global convergence to a stationary point under mild conditions. Numerical results on synthetic and real data show that our framework attains significant computational saving compared with state-of-the-art methods. 
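    The following sketch is only a guess at the general shape of such an update, not the paper's actual iteration: it forms the beta-divergence gradient of one factor from a few sampled fibers and applies an entropic mirror-descent (exponentiated-gradient) step, which keeps the factor positive; the value of beta, the dimensions, and the step size are placeholders.

        import numpy as np

        def beta_div(x, y, b):
            # elementwise beta-divergence D_b(x || y), valid for b not in {0, 1}
            return np.sum((x ** b + (b - 1.0) * y ** b - b * x * y ** (b - 1.0)) / (b * (b - 1.0)))

        rng = np.random.default_rng(2)
        I, J, K, R, beta = 40, 30, 20, 4, 1.5
        A, B, C = (rng.random((n, R)) + 0.1 for n in (I, J, K))
        X = np.einsum('ir,jr,kr->ijk', A, B, C)              # nonnegative low-rank tensor

        A_hat = rng.random((I, R)) + 0.1                     # mode-1 factor estimate (kept positive)
        H = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)  # Khatri-Rao product of the other factors
        X1 = X.reshape(I, J * K)                             # mode-1 unfolding

        print('beta-divergence before:', beta_div(X1, A_hat @ H.T, beta))
        for t in range(2000):
            fib = rng.choice(J * K, size=64, replace=False)  # sampled mode-1 fibers
            V, Vh = X1[:, fib], A_hat @ H[fib].T             # data and model values on those fibers
            # gradient of the beta-divergence D_beta(V || Vh) with respect to A_hat
            G = (Vh ** (beta - 1.0) - V * Vh ** (beta - 2.0)) @ H[fib] / len(fib)
            # entropic mirror-descent (exponentiated-gradient) step: multiplicative, stays positive
            A_hat *= np.exp(-0.05 * G)
        print('beta-divergence after :', beta_div(X1, A_hat @ H.T, beta))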
  3. We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework. We show that the t-SVD is a specialization of the well-studied block-term decomposition for third-order tensors, and we present an algorithm under this model that can track changing free submodules from incomplete streaming 2-D data. The proposed algorithm uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We provide a local expected linear convergence result for our algorithm. Our empirical results are competitive in accuracy with, but much faster in compute time than, state-of-the-art tensor completion algorithms on real applications: recovering temporal chemo-sensing and MRI data under limited sampling.
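    The streaming completion algorithm itself is not spelled out here; the sketch below only illustrates the t-product and t-SVD that underlie the algebraic framework named in the abstract (computed slice-wise in the Fourier domain along the third mode), with toy dimensions chosen arbitrarily.

        import numpy as np

        def t_product(A, B):
            # t-product of third-order tensors: circular convolution along mode 3,
            # computed as facewise matrix products in the Fourier domain
            Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
            return np.real(np.fft.ifft(np.einsum('ijn,jkn->ikn', Ah, Bh), axis=2))

        def t_transpose(A):
            # tensor transpose in the t-product algebra: transpose each frontal slice
            # and reverse the order of slices 2 through n3
            At = np.transpose(A, (1, 0, 2))
            return np.concatenate([At[:, :, :1], At[:, :, 1:][:, :, ::-1]], axis=2)

        def t_svd(X):
            # t-SVD  X = U * S * t_transpose(V)  via per-slice SVDs in the Fourier domain;
            # conjugate symmetry is enforced so that U, S, V come out real
            n1, n2, n3 = X.shape
            Xh = np.fft.fft(X, axis=2)
            Uh = np.zeros((n1, n1, n3), dtype=complex)
            Sh = np.zeros((n1, n2, n3), dtype=complex)
            Vh = np.zeros((n2, n2, n3), dtype=complex)
            m = min(n1, n2)
            for n in range(n3 // 2 + 1):
                u, s, vt = np.linalg.svd(Xh[:, :, n])
                Uh[:, :, n], Vh[:, :, n] = u, vt.conj().T
                Sh[np.arange(m), np.arange(m), n] = s
            for n in range(n3 // 2 + 1, n3):
                Uh[:, :, n] = Uh[:, :, n3 - n].conj()
                Sh[:, :, n] = Sh[:, :, n3 - n].conj()
                Vh[:, :, n] = Vh[:, :, n3 - n].conj()
            ifft = lambda T: np.real(np.fft.ifft(T, axis=2))
            return ifft(Uh), ifft(Sh), ifft(Vh)

        # toy check: the t-SVD factors reproduce the tensor under the t-product
        rng = np.random.default_rng(3)
        X = rng.standard_normal((6, 5, 4))
        U, S, V = t_svd(X)
        print('t-SVD reconstruction error:',
              np.linalg.norm(t_product(t_product(U, S), t_transpose(V)) - X))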
  4.
    Decomposition‐based solution algorithms for optimization problems depend on the underlying latent block structure of the problem. Methods for detecting this structure are currently lacking. In this article, we propose stochastic blockmodeling (SBM) as a systematic framework for learning the underlying block structure in generic optimization problems. SBM is a generative graph model in which nodes belong to some blocks and the interconnections among the nodes are stochastically dependent on their block affiliations. Hence, through parametric statistical inference, the interconnection patterns underlying optimization problems can be estimated. For benchmark optimization problems, we show that SBM can reveal the underlying block structure and that the estimated blocks can be used as the basis for decomposition‐based solution algorithms which can reach an optimum or bound estimates in reduced computational time. Finally, we present a general software platform for automated block structure detection and decomposition‐based solution following distributed and hierarchical optimization approaches.

     
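    As a toy illustration of the block-detection step (not the authors' software platform), the sketch below builds a variable-interaction graph from a synthetic block-structured constraint matrix and then recovers the planted blocks; scikit-learn's spectral clustering is used here as a simple stand-in for full stochastic-blockmodel inference, and all sizes and probabilities are invented.

        import numpy as np
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(4)
        n_vars, n_blocks = 90, 3
        truth = np.repeat(np.arange(n_blocks), n_vars // n_blocks)   # planted block labels

        # synthetic constraint matrix: constraints mostly touch variables within one block,
        # with a few coupling constraints across blocks (the "border" of the problem)
        rows = []
        for _ in range(400):
            blk = rng.integers(n_blocks)
            members = np.where(truth == blk)[0]
            support = rng.choice(members, size=3, replace=False)     # intra-block constraint
            if rng.random() < 0.05:                                   # occasional coupling constraint
                support = rng.choice(n_vars, size=3, replace=False)
            row = np.zeros(n_vars)
            row[support] = 1.0
            rows.append(row)
        Acons = np.array(rows)                                        # constraints x variables

        # variable-interaction graph: variables are adjacent if they share a constraint
        W = (Acons.T @ Acons > 0).astype(float)
        np.fill_diagonal(W, 0.0)

        # block recovery; a simple stand-in for stochastic blockmodel inference
        labels = SpectralClustering(n_clusters=n_blocks, affinity='precomputed',
                                    random_state=0).fit_predict(W)
        print('estimated block sizes:', np.bincount(labels))
        agree = [np.bincount(truth[labels == c]).max() for c in np.unique(labels)]
        print('fraction of variables grouped with their planted block:', sum(agree) / n_vars)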
  5. Bach, Francis; Blei, David; Schölkopf, Bernhard (Eds.)
    This paper investigates the asymptotic behaviors of gradient descent algorithms (particularly accelerated gradient descent and stochastic gradient descent) in the context of stochastic optimization arising in statistics and machine learning, where objective functions are estimated from available data. We show that these algorithms can be computationally modeled by continuous-time ordinary or stochastic differential equations. We establish gradient flow central limit theorems to describe the limiting dynamic behaviors of these computational algorithms and the large-sample performances of the related statistical procedures, as the number of algorithm iterations and the data size both go to infinity; the gradient flow central limit theorems are governed by linear ordinary or stochastic differential equations, such as time-dependent Ornstein-Uhlenbeck processes. We illustrate that our study provides a novel unified framework for a joint computational and statistical asymptotic analysis: the computational asymptotic analysis studies the dynamic behaviors of these algorithms over time (or over the number of iterations), while the statistical asymptotic analysis investigates the large-sample behaviors of the statistical procedures (such as estimators and classifiers) that the algorithms compute; indeed, the statistical procedures are the limits of the random sequences generated by these iterative algorithms as the number of iterations goes to infinity. The joint analysis based on the obtained gradient flow central limit theorems identifies four factors (learning rate, batch size, gradient covariance, and Hessian) and leads to new theory regarding the local minima found by stochastic gradient descent in non-convex optimization problems.
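    To make the continuous-time correspondence concrete, the toy sketch below runs SGD on a simple quadratic objective and compares its stationary variance with the prediction of the Ornstein-Uhlenbeck-type SDE that the gradient-flow viewpoint suggests; the objective, noise model, and step size are illustrative choices rather than the paper's setting.

        import numpy as np

        rng = np.random.default_rng(5)
        eta = 0.01                          # SGD learning rate
        h = np.array([1.0, 4.0])            # (diagonal) Hessian of f(x) = 0.5 * sum(h * x**2)
        c = np.array([2.0, 0.5])            # (diagonal) covariance of the stochastic-gradient noise

        # SGD on the quadratic objective: x <- x - eta * (grad f(x) + noise), noise ~ N(0, diag(c))
        x = np.array([5.0, -3.0])
        samples = []
        for k in range(100_000):
            x = x - eta * (h * x + np.sqrt(c) * rng.standard_normal(2))
            if k > 20_000:                  # discard the transient, keep the stationary phase
                samples.append(x.copy())
        samples = np.array(samples)

        # the OU-type SDE  dX = -H X dt + sqrt(eta) * C^{1/2} dW  (time t = k * eta) predicts a
        # stationary covariance solving  H S + S H = eta * C;  for diagonal H, S = eta * c / (2 h)
        print('empirical SGD variance:', samples.var(axis=0))
        print('OU/SDE prediction     :', eta * c / (2.0 * h))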