skip to main content

Title: Understanding Deflation Process in Over-parametrized Tensor Decomposition
In this paper we study the training dynamics for gradient flow on overparametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor deflation process that is commonly used in tensor decomposition algorithms. We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors.  more » « less
Award ID(s):
1845171 1704656 2031849
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Thirty-fifth Conference on Neural Information Processing Systems
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract Higher-order tensors can represent scores in a rating system, frames in a video, and images of the same subject. In practice, the measurements are often highly quantized due to the sampling strategies or the quality of devices. Existing works on tensor recovery have focused on data losses and random noises. Only a few works consider tensor recovery from quantized measurements but are restricted to binary measurements. This paper, for the first time, addresses the problem of tensor recovery from multi-level quantized measurements by leveraging the low CANDECOMP/PARAFAC (CP) rank property. We study the recovery of both general low-rank tensors and tensors that have tensor singular value decomposition (TSVD) by solving nonconvex optimization problems. We provide the theoretical upper bounds of the recovery error, which diminish to zero when the sizes of dimensions increase to infinity. We further characterize the fundamental limit of any recovery algorithm and show that our recovery error is nearly order-wise optimal. A tensor-based alternating proximal gradient descent algorithm with a convergence guarantee and a TSVD-based projected gradient descent algorithm are proposed to solve the nonconvex problems. Our recovery methods can also handle data losses and do not necessarily need the information of the quantization rule. The methods are validated on synthetic data, image datasets, and music recommender datasets. 
    more » « less
  2. Oceanic mesoscale currents (‘eddies’) can have significant effects on the distributions of passive tracers. The associated inhomogeneous and anisotropic eddy fluxes are traditionally parametrised using a transport tensor (K-tensor), which contains both diffusive and advective components. In this study, we analyse the eddy transport tensor in a quasigeostrophic double-gyre flow. First, the flow and passive tracer fields are decomposed into large- and small-scale (eddy) components by spatial filtering, and the resulting eddy forcing includes an eddy tracer flux representing advection by eddies and non-advective terms. Second, we use the flux-gradient relation between the eddy fluxes and the large-scale tracer gradient to estimate the associated K-tensors in their entire structural, spatial and temporal complexity, without making any additional assumptions or simplifications. The divergent components of the eddy tracer fluxes are extracted via the Helmholtz decomposition, which yields a divergent tensor. The remaining rotational flux does not affect the tracer evolution, but dominates the total tracer flux, affecting both its magnitude and spatial structure. However, in terms of estimating the eddy forcing, the transport tensor prevails over its divergent counterpart because of the significant numerical errors induced by the Helmholtz decomposition. Our analyses demonstrate that, in general, the K-tensor for the eddy forcing is not unique, that is, it is tracer-dependent. Our study raises serious questions on how to interpret and use various estimates of K-tensors obtained from either observations or eddy-resolving model solutions. 
    more » « less
  3. Abstract

    Let$M_{\langle \mathbf {u},\mathbf {v},\mathbf {w}\rangle }\in \mathbb C^{\mathbf {u}\mathbf {v}}{\mathord { \otimes } } \mathbb C^{\mathbf {v}\mathbf {w}}{\mathord { \otimes } } \mathbb C^{\mathbf {w}\mathbf {u}}$denote the matrix multiplication tensor (and write$M_{\langle \mathbf {n} \rangle }=M_{\langle \mathbf {n},\mathbf {n},\mathbf {n}\rangle }$), and let$\operatorname {det}_3\in (\mathbb C^9)^{{\mathord { \otimes } } 3}$denote the determinant polynomial considered as a tensor. For a tensorT, let$\underline {\mathbf {R}}(T)$denote its border rank. We (i) give the first hand-checkable algebraic proof that$\underline {\mathbf {R}}(M_{\langle 2\rangle })=7$, (ii) prove$\underline {\mathbf {R}}(M_{\langle 223\rangle })=10$and$\underline {\mathbf {R}}(M_{\langle 233\rangle })=14$, where previously the only nontrivial matrix multiplication tensor whose border rank had been determined was$M_{\langle 2\rangle }$, (iii) prove$\underline {\mathbf {R}}( M_{\langle 3\rangle })\geq 17$, (iv) prove$\underline {\mathbf {R}}(\operatorname {det}_3)=17$, improving the previous lower bound of$12$, (v) prove$\underline {\mathbf {R}}(M_{\langle 2\mathbf {n}\mathbf {n}\rangle })\geq \mathbf {n}^2+1.32\mathbf {n}$for all$\mathbf {n}\geq 25$, where previously only$\underline {\mathbf {R}}(M_{\langle 2\mathbf {n}\mathbf {n}\rangle })\geq \mathbf {n}^2+1$was known, as well as lower bounds for$4\leq \mathbf {n}\leq 25$, and (vi) prove$\underline {\mathbf {R}}(M_{\langle 3\mathbf {n}\mathbf {n}\rangle })\geq \mathbf {n}^2+1.6\mathbf {n}$for all$\mathbf {n} \ge 18$, where previously only$\underline {\mathbf {R}}(M_{\langle 3\mathbf {n}\mathbf {n}\rangle })\geq \mathbf {n}^2+2$was known. The last two results are significant for two reasons: (i) they are essentially the first nontrivial lower bounds for tensors in an “unbalanced” ambient space and (ii) they demonstrate that the methods we use (border apolarity) may be applied to sequences of tensors.

    The methods used to obtain the results are new and “nonnatural” in the sense of Razborov and Rudich, in that the results are obtained via an algorithm that cannot be effectively applied to generic tensors. We utilize a new technique, calledborder apolaritydeveloped by Buczyńska and Buczyński in the general context of toric varieties. We apply this technique to develop an algorithm that, given a tensorTand an integerr, in a finite number of steps, either outputs that there is no border rankrdecomposition forTor produces a list of all normalized ideals which could potentially result from a border rank decomposition. The algorithm is effectively implementable whenThas a large symmetry group, in which case it outputs potential decompositions in a natural normal form. The algorithm is based on algebraic geometry and representation theory.

    more » « less
  4. The deformation of elementary fluid volumes by velocity gradients is a key process for scalar mixing, chemical reactions and biological processes in flows. Whilst fluid deformation in unsteady, turbulent flow has gained much attention over the past half-century, deformation in steady random flows with complex structure – such as flow through heterogeneous porous media – has received significantly less attention. In contrast to turbulent flow, the steady nature of these flows constrains fluid deformation to be anisotropic with respect to the fluid velocity, with significant implications for e.g. longitudinal and transverse mixing and dispersion. In this study we derive an ab initio coupled continuous-time random walk (CTRW) model of fluid deformation in random steady three-dimensional flow that is based upon a streamline coordinate transform which renders the velocity gradient and fluid deformation tensors upper triangular. We apply this coupled CTRW model to several model flows and find that these exhibit a remarkably simple deformation structure in the streamline coordinate frame, facilitating solution of the stochastic deformation tensor components. These results show that the evolution of longitudinal and transverse fluid deformation for chaotic flows is governed by both the Lyapunov exponent and power-law exponent of the velocity probability distribution function at small velocities, whereas algebraic deformation in non-chaotic flows arises from the intermittency of shear events following similar dynamics as that for steady two-dimensional flow. 
    more » « less
  5. Event detection is gaining increasing attention in smart cities research. Large-scale mobility data serves as an important tool to uncover the dynamics of urban transportation systems, and more often than not the dataset is incomplete. In this article, we develop a method to detect extreme events in large traffic datasets, and to impute missing data during regular conditions. Specifically, we propose a robust tensor recovery problem to recover low-rank tensors under fiber-sparse corruptions with partial observations, and use it to identify events, and impute missing data under typical conditions. Our approach is scalable to large urban areas, taking full advantage of the spatio-temporal correlations in traffic patterns. We develop an efficient algorithm to solve the tensor recovery problem based on the alternating direction method of multipliers (ADMM) framework. Compared with existing l 1 norm regularized tensor decomposition methods, our algorithm can exactly recover the values of uncorrupted fibers of a low-rank tensor and find the positions of corrupted fibers under mild conditions. Numerical experiments illustrate that our algorithm can achieve exact recovery and outlier detection even with missing data rates as high as 40% under 5% gross corruption, depending on the tensor size and the Tucker rank of the low rank tensor. Finally, we apply our method on a real traffic dataset corresponding to downtown Nashville, TN and successfully detect the events like severe car crashes, construction lane closures, and other large events that cause significant traffic disruptions. 
    more » « less