Tucker decomposition is a low-rank tensor approximation that generalizes the truncated matrix singular value decomposition (SVD). Existing parallel software has shown that Tucker decomposition is particularly effective at compressing terabyte-sized multidimensional scientific simulation datasets, computing reduced representations that satisfy a specified approximation error. The general approach is to obtain a low-rank approximation of the input data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short, wide matrices. In the existing approach, the SVD is performed by computing the eigendecomposition of the Gram matrix of the unfolding. This method sacrifices some numerical stability in exchange for lower computational cost and easier parallelization. We propose a more numerically stable, though more computationally expensive, way to compute the SVD: preprocessing with a QR decomposition and computing an SVD of only the small triangular factor. The improved numerical stability allows us to achieve the same accuracy with half the working precision (for example, single rather than double precision). We demonstrate that our method scales as well as the existing approach, and the use of lower precision leads to an overall reduction in running time of up to a factor of 2 when using tens to thousands of processors. Using the same working precision, we are also able to compute Tucker decompositions with much smaller approximation error.
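The tradeoff between the two SVD strategies can be sketched in a few lines of NumPy; the matrix sizes below are illustrative, not taken from the abstract. The Gram approach squares the condition number of the unfolding, while the QR route works only with the small triangular factor.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10_000))  # a short, wide unfolding (illustrative size)

# Existing approach: eigendecomposition of the small 50x50 Gram matrix.
# Forming A @ A.T squares the condition number, which costs accuracy.
G = A @ A.T
evals, U_gram = np.linalg.eigh(G)
sv_gram = np.sqrt(np.clip(evals[::-1], 0.0, None))  # singular values, descending

# Proposed approach: factor A.T = Q R, then SVD only the small factor R.
# More flops, but numerically stabler than working with the Gram matrix.
Q, R = np.linalg.qr(A.T)            # Q: 10000 x 50, R: 50 x 50
U_r, sv_qr, Wt = np.linalg.svd(R.T) # A = U_r @ diag(sv_qr) @ (Q @ Wt.T).T

sv_ref = np.linalg.svd(A, compute_uv=False)  # reference singular values
```

For a well-conditioned random matrix both routes agree with the reference; the gap appears for ill-conditioned unfoldings or reduced working precision, which is the regime the abstract targets.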
This content will become publicly available on December 1, 2025.

DeepTensor: Low-Rank Tensor Decomposition With Deep Network Priors
DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors, where each low-rank factor is generated by a deep network (DN) trained in a self-supervised manner to minimize the mean-square approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures that are out of reach of classical linear methods like the singular value decomposition (SVD) and principal components analysis (PCA). We demonstrate that DeepTensor is robust across a wide range of distributions and is a computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions, by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification.
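The factor-generation idea can be caricatured in NumPy. In this deliberately simplified sketch, each deep generator is collapsed to a single linear layer acting on a fixed random latent code, the target is constructed to be reachable, and the self-supervised MSE fit is done by alternating least squares; all sizes and variable names are invented for illustration and the real method trains nonlinear networks by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, d = 60, 40, 4, 8

# Fixed latent codes feeding each generator; the "deep network" here is
# collapsed to one linear layer so the structure fits in a few lines.
Z1, Z2 = rng.standard_normal((m, d)), rng.standard_normal((n, d))
A, B = rng.standard_normal((d, r)), rng.standard_normal((d, r))
X = (Z1 @ A) @ (Z2 @ B).T        # synthetic target with a reachable rank-r form

W1 = rng.standard_normal((d, r)) # generator weights, fit self-supervised
W2 = rng.standard_normal((d, r))
for _ in range(10):              # alternating least squares on the MSE loss
    V = Z2 @ W2                  # generated right factor
    W1 = np.linalg.lstsq(Z1, X @ V @ np.linalg.inv(V.T @ V), rcond=None)[0]
    U = Z1 @ W1                  # generated left factor
    W2 = np.linalg.lstsq(Z2, X.T @ U @ np.linalg.inv(U.T @ U), rcond=None)[0]

err = np.linalg.norm(Z1 @ W1 @ (Z2 @ W2).T - X) / np.linalg.norm(X)
```

With a reachable target the fit converges to machine precision; DeepTensor's point is that with nonlinear generators the same self-supervised objective captures structure a linear factorization cannot.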
- PAR ID: 10570477
- Publisher / Repository: IEEE Computer Society
- Date Published:
- Journal Name: IEEE Transactions on Pattern Analysis and Machine Intelligence
- Volume: 46
- Issue: 12
- ISSN: 0162-8828
- Page Range / eLocation ID: 10337 to 10348
- Subject(s) / Keyword(s): Tensor Decomposition; Matrix Factorization; Low-Rank Completion; Deep Network; Self-Supervised Learning
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like This
- 
In this paper, we propose a conservative low-rank tensor method to approximate nonlinear Vlasov solutions. The low-rank approach is based on our earlier work [W. Guo and J.-M. Qiu, A Low Rank Tensor Representation of Linear Transport and Nonlinear Vlasov Solutions and Their Associated Flow Maps, preprint, https://arxiv.org/abs/2106.08834, 2021]. It takes advantage of the fact that the differential operators in the Vlasov equation are tensor friendly, based on which we propose to dynamically and adaptively build up a low-rank solution basis by adding new basis functions obtained from discretization of the differential equation and removing basis functions via a singular value decomposition (SVD)-type truncation procedure. For the discretization, we adopt a high-order finite difference spatial discretization together with a second-order strong-stability-preserving multistep time discretization. While the SVD truncation removes redundancy in representing the high-dimensional Vlasov solution, it destroys the conservation properties of the associated full conservative scheme. In this paper, we develop a conservative truncation procedure that conserves the mass, momentum, and kinetic energy densities. The conservative truncation is achieved by an orthogonal projection onto a subspace spanned by 1, v, and v^2 in the velocity space, associated with a weighted inner product. The algorithm then performs a weighted SVD truncation of the remainder, which involves a scaling, followed by the standard SVD truncation and a rescaling back. The algorithm is further developed in high dimensions with a hierarchical Tucker tensor decomposition of high-dimensional Vlasov solutions, overcoming the curse of dimensionality. An extensive set of nonlinear Vlasov examples is presented to show the effectiveness and conservation properties of the proposed conservative low-rank approach. Comparison is performed against the nonconservative low-rank tensor approach on the conservation history of mass, momentum, and energy.
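The conservative-truncation step can be sketched on a toy 2D phase-space grid. The grid sizes, quadrature rule, and test function below are illustrative assumptions, not the paper's discretization; the sketch only shows the mechanics of projecting onto span{1, v, v^2} under a weighted inner product before SVD-truncating the remainder.

```python
import numpy as np

Nx, Nv = 64, 128
x = np.linspace(0, 2 * np.pi, Nx, endpoint=False)
v = np.linspace(-6, 6, Nv)
w = np.full(Nv, v[1] - v[0])           # uniform quadrature weights (illustrative)
F = (np.exp(-v**2 / 2)[None, :] * (1 + 0.1 * np.cos(x))[:, None]
     + 0.05 * np.outer(np.sin(x), v * np.exp(-v**2 / 2)))  # toy solution

# Orthonormalize {1, v, v^2} under the weighted inner product <g, h> = sum g h w.
M = np.stack([np.ones(Nv), v, v**2], axis=1)
Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * M)   # orthonormal in scaled coordinates
P = Q @ Q.T                                     # projector onto the moment subspace

Fs = F * np.sqrt(w)                             # scale so the SVD is w-weighted
F1 = Fs @ P                                     # part carrying all the moments
U, s, Vt = np.linalg.svd(Fs - Fs @ P, full_matrices=False)
r = 3
F2 = U[:, :r] * s[:r] @ Vt[:r]                  # truncated, moment-free remainder
F_trunc = (F1 + F2) / np.sqrt(w)                # rescale back to the solution

# Mass, momentum, and kinetic-energy densities (per x row).
moments = lambda G: np.stack([(G * w) @ np.ones(Nv), (G * w) @ v, (G * w) @ v**2])
```

Because the remainder is orthogonal to the moment subspace in the weighted inner product, truncating it leaves the mass, momentum, and energy densities of every row untouched.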
- 
Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. This work provides the first convergence guarantee for arbitrary update rules of projection matrices, generally applicable to optimizers that can be analyzed with Hamiltonian Descent, including common ones such as LION and Adam. Inspired by this theoretical understanding, the authors propose Online Subspace Descent, a new family of subspace descent optimizers that do not rely on SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates it with online PCA. This approach is flexible and introduces minimal overhead to training. Experiments show that for pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream task performance than state-of-the-art low-rank training methods across settings, narrowing the gap with full-rank baselines.
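The subspace idea can be sketched on a toy quadratic objective. Everything below is an invented illustration: a plain momentum state stands in for Adam or LION, the Oja-style step stands in for the online-PCA update, and the sizes and learning rates are arbitrary; the real method's update rules differ.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 64, 32, 4                       # parameter shape and subspace rank
target = rng.standard_normal((m, n))
W = np.zeros((m, n))
grad = lambda W: W - target               # toy quadratic loss 0.5 * ||W - target||^2

P, _ = np.linalg.qr(rng.standard_normal((m, k)))  # initial projection matrix
M_sub = np.zeros((k, n))                          # momentum state kept in the subspace
lr, beta, eta = 0.2, 0.9, 0.01

loss0 = 0.5 * np.linalg.norm(W - target) ** 2
for _ in range(500):
    G = grad(W)
    P += eta * G @ (G.T @ P)              # one online-PCA (Oja-style) step, no SVD
    P, _ = np.linalg.qr(P)                # keep the columns orthonormal
    M_sub = beta * M_sub + (1 - beta) * (P.T @ G)  # optimizer state in the subspace
    W -= lr * (P @ M_sub)                 # lift the subspace update back up
loss1 = 0.5 * np.linalg.norm(W - target) ** 2
```

The optimizer state is k x n instead of m x n, which is the memory saving; the projection matrix tracks the dominant gradient directions without ever computing an SVD.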
- 
Abstract This paper introduces a general framework of Semi-parametric TEnsor Factor Analysis (STEFA) that focuses on the methodology and theory of low-rank tensor decomposition with auxiliary covariates. Semi-parametric TEnsor Factor Analysis models extend tensor factor models by incorporating auxiliary covariates in the loading matrices. We propose an algorithm of iteratively projected singular value decomposition (IP-SVD) for the semi-parametric estimation. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies singular value decomposition on matricized tensors over each mode. We establish the convergence rates of the loading matrices and the core tensor factor. The theoretical results only require a sub-exponential noise distribution, which is weaker than the assumption of sub-Gaussian tail of noise in the literature. Compared with the Tucker decomposition, IP-SVD yields more accurate estimators with a faster convergence rate. Besides estimation, we propose several prediction methods with new covariates based on the STEFA model. On both synthetic and real tensor data, we demonstrate the efficacy of the STEFA model and the IP-SVD algorithm on both the estimation and prediction tasks.
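The project-then-SVD step can be sketched for a single covariate-informed mode. The sizes, noise level, and the use of plain SVDs for the remaining modes are illustrative assumptions, not the paper's exact IP-SVD iteration; the sketch only shows projecting an unfolding onto the covariate span before extracting singular vectors.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, n3, r = 30, 20, 10, 2

Phi = np.linalg.qr(rng.standard_normal((n1, 5)))[0]  # covariate basis for mode 1
G = rng.standard_normal((r, r, r))                   # core tensor
A1 = Phi @ rng.standard_normal((5, r))               # mode-1 loading in span(Phi)
A2, A3 = rng.standard_normal((n2, r)), rng.standard_normal((n3, r))
Y = np.einsum('abc,ia,jb,kc->ijk', G, A1, A2, A3)
Y += 0.01 * rng.standard_normal(Y.shape)             # noisy observed tensor

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Project the mode-1 unfolding onto the covariate space before the SVD;
# the other two modes use a plain SVD here for brevity.
P = Phi @ Phi.T
U1 = np.linalg.svd(P @ unfold(Y, 0))[0][:, :r]
U2 = np.linalg.svd(unfold(Y, 1))[0][:, :r]
U3 = np.linalg.svd(unfold(Y, 2))[0][:, :r]

core = np.einsum('ijk,ia,jb,kc->abc', Y, U1, U2, U3)
Yhat = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)
err = np.linalg.norm(Yhat - Y) / np.linalg.norm(Y)
```

Projecting onto span(Phi) filters the part of the noise orthogonal to the covariate space, which is the source of IP-SVD's faster convergence rate relative to an unconstrained Tucker fit.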
- 
The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensors that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic as well as hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to 100s of nodes (up to 4096 cores) and is faster and more general than the currently available parallel software.
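The MTTKRP subcomputation is easy to state in NumPy. The sketch below (sizes invented) forms the Khatri-Rao product explicitly and checks it against an einsum that never materializes it; avoiding such redundant intermediate work is exactly what the dimension trees mentioned above generalize across modes.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K, R = 8, 9, 10, 3
T = rng.standard_normal((I, J, K))                   # dense 3-way tensor
B, C = rng.standard_normal((J, R)), rng.standard_normal((K, R))

# Khatri-Rao product (column-wise Kronecker) of C and B: shape (K*J, R).
KR = (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

# MTTKRP for mode 0: unfold T so columns run over (k, j), then multiply.
M_ref = T.transpose(0, 2, 1).reshape(I, K * J) @ KR

# Same result without materializing the Khatri-Rao product.
M_fast = np.einsum('ijk,jr,kr->ir', T, B, C)
```

In the alternating method this product is recomputed for every mode at every iteration, which is why it dominates the cost and is the natural target for parallelization.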
 An official website of the United States government