Canonical polyadic (CP) decomposition of a tensor is a basic operation in many applications, such as data mining and video foreground/background separation. However, existing algorithms for CP decomposition require users to provide the rank of the target tensor as part of the input, and finding the rank of a tensor is an NP-hard problem. Currently, to perform CP decomposition, users must make an informed guess of a proper tensor rank based on the data at hand, and the result may still be suboptimal. In this paper, we propose to conduct CP decomposition and tensor rank approximation together, so that users do not have to provide the proper rank beforehand and the decomposition algorithm will find the proper rank and return a high-quality result. We formulate an optimization problem whose objective function consists of a least-squares Tikhonov regularization and a sparse L1-regularization term. We also test the method's effectiveness on real applications with moving-object videos.
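As a rough illustration of the kind of objective described above, the sketch below evaluates a least-squares fit plus a Tikhonov term on the factor matrices and an L1 penalty on per-component weights for a third-order CP model. The abstract does not give the exact formula, so the term placement, the function name `cp_rank_objective`, and the parameters `mu` and `lam` are assumptions.

```python
import numpy as np

def cp_rank_objective(X, weights, factors, mu, lam):
    """Hypothetical objective for joint CP fitting and rank selection:
    least-squares fit + Tikhonov (L2) regularization on the factors
    + an L1 penalty on the component weights. Components whose weights
    are driven to zero drop out, which is how a formulation of this
    kind can discover the rank automatically."""
    A, B, C = factors
    # Rank-R reconstruction: sum_r weights[r] * a_r outer b_r outer c_r
    Xhat = np.einsum('r,ir,jr,kr->ijk', weights, A, B, C)
    fit = 0.5 * np.linalg.norm(X - Xhat) ** 2                     # data term
    tikhonov = 0.5 * mu * sum(np.linalg.norm(F) ** 2 for F in factors)
    sparsity = lam * np.sum(np.abs(weights))                      # L1 term favoring few components
    return fit + tikhonov + sparsity
```

In practice such an objective would be minimized with an alternating or proximal scheme; the sketch only pins down the quantities involved.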
CP decomposition for tensors via alternating least squares with QR decomposition
            The CP tensor decomposition is used in applications such as machine learning and signal processing to discover latent low-rank structure in multidimensional data. Computing a CP decomposition via an alternating least squares (ALS) method reduces the problem to several linear least squares problems. The standard way to solve these linear least squares subproblems is to use the normal equations, which inherit special tensor structure that can be exploited for computational efficiency. However, the normal equations are sensitive to numerical ill-conditioning, which can compromise the results of the decomposition. In this paper, we develop versions of the CP-ALS algorithm using the QR decomposition and the singular value decomposition, which are more numerically stable than the normal equations, to solve the linear least squares problems. Our algorithms utilize the tensor structure of the CP-ALS subproblems efficiently, have the same complexity as the standard CP-ALS algorithm when the input is dense and the rank is small, and are shown via examples to produce more stable results when ill-conditioning is present. Our MATLAB implementation achieves the same running time as the standard algorithm for small ranks, and we show that the new methods can obtain lower approximation error. 
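To make the contrast concrete, here is a minimal numpy sketch of one CP-ALS subproblem solved two ways: via the normal equations with the structured Gram matrix, and via a thin QR factorization of the Khatri-Rao product. The explicit formation of the Khatri-Rao product and the helper names are illustrative only; the paper's algorithms exploit the tensor structure so the factorization is never computed from the full product.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product: column r is kron(B[:, r], C[:, r])."""
    (J, R), (K, _) = B.shape, C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def als_update_normal_eq(X1, B, C):
    """Update the first factor via the normal equations.
    X1 is the mode-1 unfolding of the tensor, of shape (I, J*K)."""
    KR = khatri_rao(B, C)                       # (J*K, R)
    G = (B.T @ B) * (C.T @ C)                   # structured Gram matrix, R x R
    return np.linalg.solve(G, KR.T @ X1.T).T    # conditioning of KR is squared here

def als_update_qr(X1, B, C):
    """Same update via a thin QR of the Khatri-Rao product, which avoids
    squaring the condition number of the subproblem."""
    KR = khatri_rao(B, C)
    Q, R_ = np.linalg.qr(KR)                    # KR = Q @ R_
    return np.linalg.solve(R_, Q.T @ X1.T).T    # a triangular solve would suffice
```

Both routines return the same factor in exact arithmetic; when the Gram matrix is ill-conditioned, the QR route is the more stable of the two, which is the behavior described in the abstract.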
- Award ID(s): 1942892
- PAR ID: 10425106
- Date Published:
- Journal Name: Numerical Linear Algebra with Applications
- ISSN: 1070-5325
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP/PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows. (A naive sketch of this leverage-score sampling distribution follows after this list.)
- The decomposition of multi-subject fMRI data using rank-(L,L,1,1) block term decomposition (BTD) can preserve higher-way data structure and is more robust to noise effects by decomposing shared spatial maps (SMs) into a product of two rank-L loading matrices. However, since the number of whole-brain voxels is very large and the rank L is larger than 1, the rank-(L,L,1,1) BTD requires high computation and memory. Therefore, we propose an accelerated rank-(L,L,1,1) BTD algorithm based upon the method of alternating least squares (ALS). We speed up updates of the loading matrices by reducing the fMRI data into subspaces, and add an orthonormality constraint on the shared SMs to improve performance. Moreover, we evaluate the effect of the rank L on the proposed method for actual task-related fMRI data. The proposed method shows better performance when L=35. Meanwhile, experimental comparisons verify that the proposed method reduces computation time by a factor of 17.36 compared to ALS while also providing satisfying separation performance. (A sketch of the rank-(L,L,1,1) reconstruction follows after this list.)
- In various applications within signal processing, system identification, pattern recognition, and scientific computing, the canonical polyadic decomposition (CPD) of a higher-order tensor is only known via general linear measurements. In this paper, we show that the computation of such a CPD can be reformulated as a sum of CPDs with linearly constrained factor matrices by assuming that the measurement matrix can be approximated by a sum of a (small) number of Kronecker products. By properly exploiting the hypothesized structure, we can derive an efficient non-linear least squares algorithm, allowing us to tackle large-scale problems. (A sketch of a Kronecker-structured measurement operator follows after this list.)
- The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensors that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic data as well as hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to hundreds of nodes (up to 4096 cores) and is faster and more general than the currently available parallel software. (A serial reference version of the MTTKRP kernel follows after this list.)
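For the first related record above (row sampling of a Khatri-Rao product by leverage scores), the following naive sketch makes the target distribution concrete. It forms the full Khatri-Rao product and its thin QR, which is exactly the cost the cited data structure avoids; the function names are illustrative.

```python
import numpy as np

def khatri_rao(*mats):
    """Column-wise Khatri-Rao product of several matrices with R columns each."""
    out = mats[0]
    for M in mats[1:]:
        (I, R), (J, _) = out.shape, M.shape
        out = (out[:, None, :] * M[None, :, :]).reshape(I * J, R)
    return out

def sample_khatri_rao_rows(mats, n_samples, seed=None):
    """Draw rows of the Khatri-Rao product with probability proportional to
    their leverage scores (the squared row norms of the orthogonal factor Q)."""
    rng = np.random.default_rng(seed)
    KR = khatri_rao(*mats)
    Q, _ = np.linalg.qr(KR)                  # thin QR of the tall product
    scores = np.sum(Q * Q, axis=1)           # leverage score of each row
    probs = scores / scores.sum()
    idx = rng.choice(KR.shape[0], size=n_samples, p=probs)
    return idx, KR[idx]
```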
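For the second related record (rank-(L,L,1,1) BTD of multi-subject fMRI data), this small sketch only shows how a 4-way tensor is reconstructed from such a model, i.e. what decomposing each shared spatial map into a product of two rank-L loading matrices means; shapes and names are assumptions, not the paper's code.

```python
import numpy as np

def btd_LL11_reconstruct(A_blocks, B_blocks, C, D):
    """Reconstruct a 4-way tensor from a rank-(L,L,1,1) block term model:
    X ~= sum_r (A_r @ B_r.T) outer c_r outer d_r,
    where each A_r is I x L, each B_r is J x L, and c_r, d_r are vectors."""
    I = A_blocks[0].shape[0]
    J = B_blocks[0].shape[0]
    X = np.zeros((I, J, C.shape[0], D.shape[0]))
    for r, (A_r, B_r) in enumerate(zip(A_blocks, B_blocks)):
        spatial_map = A_r @ B_r.T                      # rank-L shared spatial map
        X += np.einsum('ij,k,m->ijkm', spatial_map, C[:, r], D[:, r])
    return X
```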
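For the third related record (a CPD known only through linear measurements whose measurement matrix is a sum of Kronecker products), the sketch below applies such an operator to a 3-way tensor; the flattening convention and names are assumptions.

```python
import numpy as np

def kron_sum_measurements(T, factor_triples):
    """Measure a 3-way tensor with an operator that is a sum of Kronecker
    products: y = (sum_k P_k kron Q_k kron S_k) vec(T). With row-major
    (C-order) flattening of T[i, j, k], the first Kronecker factor acts on
    the first tensor mode."""
    vec_t = T.reshape(-1)                            # C-order vectorization
    y = None
    for P, Q, S in factor_triples:
        term = np.kron(P, np.kron(Q, S)) @ vec_t     # one Kronecker-structured term
        y = term if y is None else y + term
    return y
```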
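Finally, for the last related record, the MTTKRP kernel that the distributed algorithm parallelizes can be pinned down with a serial einsum version for a 3-way tensor; this is a reference definition of the kernel, not the paper's implementation.

```python
import numpy as np

def mttkrp(X, factors, mode):
    """Matricized-Tensor Times Khatri-Rao Product for a 3-way tensor:
    multiplies the mode-`mode` unfolding of X by the Khatri-Rao product
    of the other two factor matrices."""
    A, B, C = factors
    if mode == 0:
        return np.einsum('ijk,jr,kr->ir', X, B, C)
    if mode == 1:
        return np.einsum('ijk,ir,kr->jr', X, A, C)
    if mode == 2:
        return np.einsum('ijk,ir,jr->kr', X, A, B)
    raise ValueError("mode must be 0, 1, or 2 for a 3-way tensor")
```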