Modern deep neural networks (DNNs) often have high memory consumption and heavy computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer as the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during training. Previous works mainly induce low rank either through implicit approximations or via a costly singular value decomposition (SVD) at every training step; the former usually incurs a large accuracy loss, while the latter is inefficient. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures a valid SVD form and avoids vanishing/exploding gradients. Low rank is encouraged by applying sparsity-inducing regularizers to the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We show empirically that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load at the same accuracy, compared not only to previous factorization methods but also to state-of-the-art filter pruning methods.
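To make the scheme concrete, here is a minimal PyTorch sketch of a linear layer trained in decomposed form, with orthogonality regularization on the singular vectors, an L1 penalty on the singular values, and a final pruning step; the class name, hyperparameters, and pruning threshold are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer stored in decomposed form W = U @ diag(s) @ V.T."""
    def __init__(self, out_features, in_features):
        super().__init__()
        # Initialize at full rank from the SVD of a random weight matrix.
        U0, s0, Vt0 = torch.linalg.svd(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            full_matrices=False)
        self.U = nn.Parameter(U0)      # (out, r)
        self.s = nn.Parameter(s0)      # (r,)
        self.V = nn.Parameter(Vt0.T)   # (in, r)

    def forward(self, x):
        # Never reassemble W: train directly on the decomposed weights.
        return ((x @ self.V) * self.s) @ self.U.T

    def regularizers(self):
        I = torch.eye(self.s.numel(), device=self.s.device)
        # Keep U and V near-orthonormal so the decomposition stays a valid SVD.
        ortho = ((self.U.T @ self.U - I) ** 2).sum() \
              + ((self.V.T @ self.V - I) ** 2).sum()
        # L1 on singular values drives many of them toward zero (low rank).
        sparse = self.s.abs().sum()
        return ortho, sparse

    @torch.no_grad()
    def prune(self, tol=1e-2):
        """Drop small singular values to reach an explicitly low-rank layer."""
        keep = self.s.abs() > tol * self.s.abs().max()
        self.U = nn.Parameter(self.U[:, keep])
        self.s = nn.Parameter(self.s[keep])
        self.V = nn.Parameter(self.V[:, keep])
```

During training one would add something like `alpha * ortho + beta * sparse` to the task loss, then call `prune()` once at the end; a rank-r layer then costs r*(in+out) multiplies per input instead of in*out.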
Operator SVD with Neural Networks via Nested Low-Rank Approximation
This paper proposes an optimization-based method to learn the singular value decomposition (SVD) of a compact operator with ordered singular functions. The proposed objective function is based on Schmidt's low-rank approximation theorem (1907), which characterizes a truncated SVD as a solution minimizing the mean squared error, accompanied by a technique called nesting to learn the ordered structure. When the optimization space is parameterized by neural networks, we refer to the proposed method as NeuralSVD. Unlike existing approaches, the implementation does not require sophisticated optimization tricks.
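As a toy illustration of the nesting idea (in a plain matrix setting rather than with neural-network parameterizations, and with our own hyperparameters), the sketch below jointly minimizes the approximation error of every truncation level, which pushes the learned factors toward the ordered SVD structure:

```python
import torch

torch.manual_seed(0)
A = torch.randn(60, 40)
K = 5
U = torch.randn(60, K, requires_grad=True)
V = torch.randn(40, K, requires_grad=True)
opt = torch.optim.Adam([U, V], lr=1e-2)

for step in range(4000):
    opt.zero_grad()
    # Nesting: penalize the approximation error of *every* truncation
    # k = 1..K at once. By Schmidt's theorem each partial sum is pushed
    # toward the best rank-k approximation, i.e. the truncated SVD, so
    # the columns come out ordered by decreasing singular value.
    loss = sum(((A - U[:, :k] @ V[:, :k].T) ** 2).sum()
               for k in range(1, K + 1))
    loss.backward()
    opt.step()

sigma_est = (U.norm(dim=0) * V.norm(dim=0)).detach()
print(sigma_est)                      # approximately ordered
print(torch.linalg.svdvals(A)[:K])    # reference singular values
```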
- Award ID(s):
- 1816209
- PAR ID:
- 10483351
- Publisher / Repository:
- Proceedings of the NeurIPS-2023 Workshop on Machine Learning for the Physical Sciences (ML4PS)
- Date Published:
- Journal Name:
- Proc. NeurIPS 2023 Workshop on Machine Learning and the Physical Sciences (ML4PS)
- Format(s):
- Medium: X
- Location:
- New Orleans, LA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Learning a dynamical system from input/output data is a fundamental task in the control design pipeline. In the partially observed setting there are two components to identification: parameter estimation to learn the Markov parameters, and system realization to obtain a state space model. In both sub-problems it is implicitly assumed that standard numerical algorithms such as the singular value decomposition (SVD) can be easily and reliably computed. When trying to fit a high-dimensional model to data, even computing an SVD may be intractable. In this work we show that an approximate matrix factorization obtained using randomized methods can replace the standard SVD in the realization algorithm while maintaining the finite-sample performance and robustness guarantees of classical methods.
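A minimal NumPy sketch of the kind of randomized factorization meant here (a Halko-style range finder; the oversampling and power-iteration parameters are illustrative):

```python
import numpy as np

def randomized_svd(M, k, oversample=10, n_iter=2, seed=0):
    """Approximate rank-k SVD via a randomized range finder.
    A random projection sketches range(M); a few power iterations
    sharpen the estimate when singular values decay slowly."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((M.shape[1], k + oversample))
    Y = M @ Omega
    for _ in range(n_iter):
        Y = M @ (M.T @ Y)
    Q, _ = np.linalg.qr(Y)       # orthonormal basis for the sketched range
    B = Q.T @ M                  # small (k + oversample) x n projection
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

In a Ho-Kalman-style realization, a factorization like this would be applied to the (possibly huge) Hankel matrix of estimated Markov parameters in place of the exact SVD.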
-
In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach. However, a key practical difficulty of sketching an SVD is that the user does not know how far the sketched singular vectors/values are from the exact ones. Indeed, the user may be forced to rely on analytical worst-case error bounds, which may not account for the unique structure of a given problem. As a result, the lack of tools for error estimation often leads to much more computation than is really necessary. To overcome these challenges, this paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values. Furthermore, the method is computationally inexpensive, because it operates only on sketched objects, and hence it requires no extra passes over the full matrix being factored.
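A schematic version of the idea (a row-sampling sketch plus a bootstrap over the sketch's own rows; the matrix sizes, replicate count, and error functional are our own choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 50))   # tall data matrix (hypothetical)
t = 2000                               # sketch size

# Row-sampling sketch: rescaled so its singular values approximate A's.
idx = rng.integers(0, A.shape[0], size=t)
A_sk = A[idx] * np.sqrt(A.shape[0] / t)
s_sk = np.linalg.svd(A_sk, compute_uv=False)

# Bootstrap: resample rows of the sketch itself (no extra passes over A)
# and use the spread of the recomputed singular values to estimate error.
boot_errs = []
for _ in range(100):
    b = rng.integers(0, t, size=t)
    s_b = np.linalg.svd(A_sk[b], compute_uv=False)
    boot_errs.append(np.max(np.abs(s_b - s_sk)))  # one choice of error metric

err_est = np.quantile(boot_errs, 0.95)  # data-driven 95% error estimate
print(err_est)
```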
-
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. It performs a preselection of protein templates before constructing a lower-dimensional subspace via regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out structure refinement. The final structures are similar to those predicted by the best structure-prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction related to high-dimensional optimization. It is also capable of sampling a wide range of conformational space, since the regularized linear discriminant analysis allows the differences to be expanded over a reduced basis set.
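For readers unfamiliar with PSO, here is a generic inertia-weight particle swarm minimizer (the textbook variant, not the RR-PSO used in the paper); the objective, bounds, and coefficients are placeholders:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal particle swarm optimizer: each particle is pulled toward
    its personal best and the swarm's global best, with inertia 0.7."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# e.g. minimize a toy quadratic in a 10-D reduced coordinate space
best, best_f = pso(lambda z: np.sum(z ** 2), dim=10)
```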
-
We present a method of detecting bifurcations by locating zeros of a signed version of the smallest singular value of the Jacobian. This enables the use of quadratically convergent root-bracketing techniques or Chebyshev interpolation to locate bifurcation points. Only positive singular values have to be computed, though the method relies on the existence of an analytic or smooth singular value decomposition (SVD). The sign of the determinant of the Jacobian, computed as part of the bidiagonal reduction in the SVD algorithm, eliminates slope discontinuities at the zeros of the smallest singular value. We use the method to search for spatially quasi-periodic traveling water waves that bifurcate from large-amplitude periodic waves. The water wave equations are formulated in a conformal mapping framework to facilitate the computation of the quasi-periodic Dirichlet-Neumann operator. We find examples of pure gravity waves with zero surface tension and overhanging gravity-capillary waves. In both cases, the waves have two spatial quasi-periods whose ratio is irrational. We follow the secondary branches via numerical continuation beyond the realm of linearization about solutions on the primary branch to obtain traveling water waves that extend over the real line with no two crests or troughs of exactly the same shape. The pure gravity wave problem is of relevance to ocean waves, where capillary effects can be neglected. Such waves can only exist through secondary bifurcation as they do not persist to zero amplitude. The gravity-capillary wave problem demonstrates the effectiveness of using the signed smallest singular value as a test function for multi-parameter bifurcation problems. This test function becomes mesh independent once the mesh is fine enough.
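A small sketch of the test function on a toy one-parameter Jacobian family (here the sign is taken from the determinant directly, rather than from the bidiagonal reduction described in the abstract; the family J(mu) is hypothetical):

```python
import numpy as np
from scipy.optimize import brentq

def signed_smallest_sv(J):
    """Smallest singular value of J with the sign of det(J) attached.
    Attaching the sign removes the slope discontinuity at sigma_min = 0,
    so standard root-bracketing can locate where J loses rank."""
    s = np.linalg.svd(J, compute_uv=False)
    return np.sign(np.linalg.det(J)) * s[-1]

# Hypothetical family: J(mu) becomes singular at mu = 1.
def J(mu):
    return np.array([[mu - 1.0, 0.3],
                     [0.0,      2.0]])

mu_star = brentq(lambda mu: signed_smallest_sv(J(mu)), 0.0, 2.0)
print(mu_star)   # ~1.0, the detected bifurcation parameter
```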