Abstract There exist linear relations among tensor entries of low rank tensors. These linear relations can be expressed by multi-linear polynomials, which are called generating polynomials. We use generating polynomials to compute tensor rank decompositions and low rank tensor approximations. We prove that this gives a quasi-optimal low rank tensor approximation if the given tensor is sufficiently close to a low rank one. 
                        more » 
                        « less   
                    This content will become publicly available on December 10, 2025
                            
                            Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
                        
                    
    
            Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. Namely, a small ω (which measures parameter sharing) and large ψ (which measures the rank) reliably led to better scaling laws. Guided by the insight that full-rank structures that maximize parameters per unit of compute perform the best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to the standard sparse MoE for each entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2118310
- PAR ID:
- 10625641
- Publisher / Repository:
- Advances in Neural Information Processing Systems
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            The CP tensor decomposition is used in applications such as machine learning and signal processing to discover latent low-rank structure in multidimensional data. Computing a CP decomposition via an alternating least squares (ALS) method reduces the problem to several linear least squares problems. The standard way to solve these linear least squares subproblems is to use the normal equations, which inherit special tensor structure that can be exploited for computational efficiency. However, the normal equations are sensitive to numerical ill-conditioning, which can compromise the results of the decomposition. In this paper, we develop versions of the CP-ALS algorithm using the QR decomposition and the singular value decomposition, which are more numerically stable than the normal equations, to solve the linear least squares problems. Our algorithms utilize the tensor structure of the CP-ALS subproblems efficiently, have the same complexity as the standard CP-ALS algorithm when the input is dense and the rank is small, and are shown via examples to produce more stable results when ill-conditioning is present. Our MATLAB implementation achieves the same running time as the standard algorithm for small ranks, and we show that the new methods can obtain lower approximation error.more » « less
- 
            Tensor contractions are ubiquitous in computational chemistry andphysics, where tensors generally represent states or operators andcontractions express the algebra of these quantities. In this context,the states and operators often preserve physical conservation laws,which are manifested as group symmetries in the tensors. These groupsymmetries imply that each tensor has block sparsity and can be storedin a reduced form. For nontrivial contractions, the memory footprint andcost are lowered, respectively, by a linear and a quadratic factor inthe number of symmetry sectors. State-of-the-art tensor contractionsoftware libraries exploit this opportunity by iterating over blocks orusing general block-sparse tensor representations. Both approachesentail overhead in performance and code complexity. With intuition aidedby tensor diagrams, we present a technique, irreducible representationalignment, which enables efficient handling of Abelian group symmetriesvia only dense tensors, by using contraction-specific reduced forms.This technique yields a general algorithm for arbitrary group symmetriccontractions, which we implement in Python and apply to a variety ofrepresentative contractions from quantum chemistry and tensor networkmethods. As a consequence of relying on only dense tensor contractions,we can easily make use of efficient batched matrix multiplication viaIntel’s MKL and distributed tensor contraction via the Cyclops library,achieving good efficiency and parallel scalability on up to 4096 KnightsLanding cores of a supercomputer.more » « less
- 
            DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-square approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal components analysis (PCA). We demonstrate that the performance of DeepTensor is robust to a wide range of distributions and a computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification.more » « less
- 
            Byrka, Jaroslaw; Meka, Raghu (Ed.)In this work, we prove new relations between the bias of multilinear forms, the correlation between multilinear forms and lower degree polynomials, and the rank of tensors over F₂. We show the following results for multilinear forms and tensors. Correlation bounds. We show that a random d-linear form has exponentially low correlation with low-degree polynomials. More precisely, for d = 2^{o(k)}, we show that a random d-linear form f(X₁,X₂, … , X_d) : (F₂^{k}) ^d → F₂ has correlation 2^{-k(1-o(1))} with any polynomial of degree at most d/2 with high probability. This result is proved by giving near-optimal bounds on the bias of a random d-linear form, which is in turn proved by giving near-optimal bounds on the probability that a sum of t random d-dimensional rank-1 tensors is identically zero. Tensor rank vs Bias. We show that if a 3-dimensional tensor has small rank then its bias, when viewed as a 3-linear form, is large. More precisely, given any 3-dimensional tensor T: [k]³ → F₂ of rank at most t, the bias of the 3-linear form f_T(X₁, X₂, X₃) : = ∑_{(i₁, i₂, i₃) ∈ [k]³} T(i₁, i₂, i₃)⋅ X_{1,i₁}⋅ X_{2,i₂}⋅ X_{3,i₃} is at least (3/4)^t. This bias vs tensor-rank connection suggests a natural approach to proving nontrivial tensor-rank lower bounds. In particular, we use this approach to give a new proof that the finite field multiplication tensor has tensor rank at least 3.52 k, which is the best known rank lower bound for any explicit tensor in three dimensions over F₂. Moreover, this relation between bias and tensor rank holds for d-dimensional tensors for any fixed d.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
