skip to main content

Title: CP decomposition for tensors via alternating least squares with QR decomposition
The CP tensor decomposition is used in applications such as machine learning and signal processing to discover latent low-rank structure in multidimensional data. Computing a CP decomposition via an alternating least squares (ALS) method reduces the problem to several linear least squares problems. The standard way to solve these linear least squares subproblems is to use the normal equations, which inherit special tensor structure that can be exploited for computational efficiency. However, the normal equations are sensitive to numerical ill-conditioning, which can compromise the results of the decomposition. In this paper, we develop versions of the CP-ALS algorithm using the QR decomposition and the singular value decomposition, which are more numerically stable than the normal equations, to solve the linear least squares problems. Our algorithms utilize the tensor structure of the CP-ALS subproblems efficiently, have the same complexity as the standard CP-ALS algorithm when the input is dense and the rank is small, and are shown via examples to produce more stable results when ill-conditioning is present. Our MATLAB implementation achieves the same running time as the standard algorithm for small ranks, and we show that the new methods can obtain lower approximation error.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Numerical Linear Algebra with Applications
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Canonical polyadic (CP) decomposition of a tensor is a basic operation in a lot of applications such as data mining and video foreground/background separation. However, existing algorithms for CP decomposition require users to provide a rank of the target tensor data as part of the input and finding the rank of a tensor is an NP-hard problem. Currently, to perform CP decomposition, users are required to make an informed guess of a proper tensor rank based on the data at hand, and the result may still be suboptimal. In this paper, we propose to conduct CP decomposition and tensor rank approximation together, so that users do not have to provide the proper rank beforehand, and the decomposition algorithm will find the proper rank and return a high-quality result. We formulate an optimization problem with an objective function consisting of a least-squares Tikhonov regularization and a sparse L1-regularization term. We also test its effectiveness over real applications with moving object videos. 
    more » « less
  2. We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP / PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows. 
    more » « less
  3. The decomposition of multi-subject fMRI data using rank- (L,L,1,1) block term decomposition (BTD) can preserve higher-way data structure and is more robust to noise effects by decomposing shared spatial maps (SMs) into a product of two rank-L loading matrices. However, since the number of whole-brain voxels is very large and rank L is larger than 1, the rank-(L,L,1,1) BTD requires high computation and memory. Therefore, we propose an accelerated rank- (L,L,1,1) BTD algorithm based upon the method of alternating least squares (ALS). We speed up updates of loading matrices by reducing fMRI data into subspaces, and add an orthonormality constraint on shared SMs to improve the performance. Moreover, we evaluate the rank-L effect on the proposed method for actual task-related fMRI data. The proposed method shows better performance when L=35. Meanwhile, experimental comparison results verify that the proposed method largely reduced (17.36 times) computation time compared to ALS while also providing satisfying separation performance. 
    more » « less
  4. The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensors that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic as well as hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to 100s of nodes (up to 4096 cores) and is faster and more general than the currently available parallel software. 
    more » « less
  5. null (Ed.)

    Electrospinning is a promising process to fabricate functional parts from macrofibers and nanofibers of bio-compatible materials including collagen, polylactide (PLA), and polyacrylonitrile (PAN). However, the functionality of the produced parts highly rely on quality, repeatability, and uniformity of the electrospun fibers. Due to the variations in material composition, process settings, and ambient conditions, the process suffers from large variations. In particular, the fiber formation in the stable regime (i.e., Taylor cone and jet) and its propagation to the substrate plays the most significant role in the process stability. This work aims to designing a fast process monitoring tool from scratch for monitoring the dynamic electrospinning process based on the Taylor cone and jet videos. Nevertheless, this is challenging since the videos are of high frequency and high dimension, and the monitoring statistics may not have a parametric distribution. To achieve this goal, a framework integrating image analysis, sketch-based tensor decomposition, and non-parametric monitoring, is proposed. In particular, we use Tucker tensor-sketch (Tucker-TS) based tensor decomposition to extract the sparse structure representations of the videos. Additionally, the extracted monitoring variables are non-normally distributed, hence non-parametric bootstrap Hotelling T2 control chart is deployed to handle this issue during the monitoring. The framework is demonstrated by electrospinning a PAN-based polymeric solution. Finally, it is demonstrated that the proposed framework, which uses Tucker-TS, largely outperformed the computational speed of the alternating least squares (ALS) approach for the Tucker tensor decomposition, i.e., Tucker-ALS, in various anomaly detection tasks while keeping the comparable anomaly detection accuracy.

    more » « less