 Award ID(s):
 1934568
 NSFPAR ID:
 10282005
 Date Published:
 Journal Name:
Proceedings of the 37th International Conference on Machine Learning
 Volume:
 119
 Page Range / eLocation ID:
6382–6392
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. In order to deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the multiplication of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) process on every training step. The former approach usually incurs a high accuracy loss while the latter has low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures the valid form of SVD and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers on the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction in computation load under the same accuracy, compared to not only previous factorization methods but also state-of-the-art filter pruning methods.
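A minimal numpy sketch of the final singular value pruning step described above. The weight matrix `W`, its rank profile, and the pruning threshold `1e-3` are invented for illustration; the actual method trains the decomposed factors with the orthogonality and sparsity regularizers rather than constructing them directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64x32 layer weight whose singular values have decayed,
# mimicking a layer driven toward low rank by sparsity regularization.
U, _ = np.linalg.qr(rng.standard_normal((64, 32)))
V, _ = np.linalg.qr(rng.standard_normal((32, 32)))
s = np.concatenate([np.ones(8), 1e-6 * np.ones(24)])  # 8 dominant values
W = U @ np.diag(s) @ V.T

# Orthogonality penalty of the kind added during training (here only
# evaluated): ||U^T U - I||_F^2 keeps the factors a valid SVD form.
ortho_penalty = np.linalg.norm(U.T @ U - np.eye(32)) ** 2

# Singular value pruning: drop values below a threshold to reach an
# explicitly low-rank model.
Uk, sk, Vkt = np.linalg.svd(W, full_matrices=False)
keep = sk > 1e-3
rank = int(keep.sum())
W_low = Uk[:, keep] @ np.diag(sk[keep]) @ Vkt[keep, :]

# Relative reconstruction error of the pruned (low-rank) weight.
err = np.linalg.norm(W - W_low) / np.linalg.norm(W)
print(rank, err < 1e-4)
```

The pruned layer can then be stored and executed as two small factors instead of one dense matrix, which is where the computation savings come from.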

This paper introduces arithmetic sketching, an abstraction of a primitive that several previous works use to achieve lightweight, low-communication zero-knowledge verification of secret-shared vectors. An arithmetic sketching scheme for a language L ⊆ F^n consists of (1) a randomized linear function compressing a long input x to a short "sketch," and (2) a small arithmetic circuit that accepts the sketch if and only if x ∈ L, up to some small error. If the language L has an arithmetic sketching scheme with short sketches, then it is possible to test membership in L using an arithmetic circuit with few multiplication gates. Since multiplications are the dominant cost in protocols for computation on secret-shared, encrypted, and committed data, arithmetic sketching schemes give rise to lightweight protocols in each of these settings. Beyond the formalization of arithmetic sketching, our contributions are:
– A general framework for constructing arithmetic sketching schemes from algebraic varieties. This framework unifies schemes from prior work and gives rise to schemes for useful new languages and with improved soundness error.
– The first arithmetic sketching schemes for languages of sparse vectors: vectors with bounded Hamming weight, bounded L1 norm, and vectors whose few nonzero values satisfy a given predicate.
– A method for "compiling" any arithmetic sketching scheme for a language L into a low-communication malicious-secure multi-server protocol for securely testing that a client-provided secret-shared vector is in L.
We also prove the first nontrivial lower bounds showing limits on the sketch size for certain languages (e.g., vectors of Hamming weight one) and proving the nonexistence of arithmetic sketching schemes for others (e.g., the language of all vectors that contain a specific value).
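As an illustration of the primitive (a well-known instance, not the paper's general framework), here is a minimal Python sketch for the language of Hamming-weight-one vectors over a prime field: three linear queries (all-ones, random r_i, and r_i squared) and a decision circuit with two multiplications. The field size and function names are assumptions for the example.

```python
import random

P = 2**61 - 1  # a large prime field F_p (Mersenne prime, chosen for the demo)

def sketch_keys(n, rng):
    # Randomized linear sketch for weight-one vectors: queries
    # q0 = (1, ..., 1), q1 = (r_1, ..., r_n), q2 = (r_1^2, ..., r_n^2).
    r = [rng.randrange(1, P) for _ in range(n)]
    return [[1] * n, r, [ri * ri % P for ri in r]]

def apply_sketch(x, keys):
    # The sketch is a linear function of the input vector x.
    return [sum(qi * xi for qi, xi in zip(q, x)) % P for q in keys]

def decide(z):
    # Arithmetic circuit with two multiplication gates:
    # accept iff z1^2 - z0 * z2 == 0 in F_p.  For x = beta * e_j this is
    # (beta r_j)^2 - beta * (beta r_j^2) = 0; for weight >= 2 it is a
    # nonzero polynomial in r, so it vanishes only with small probability.
    z0, z1, z2 = z
    return (z1 * z1 - z0 * z2) % P == 0

rng = random.Random(1)
n = 16
keys = sketch_keys(n, rng)

x_good = [0] * n; x_good[5] = 7               # weight one: 7 * e_5
x_bad = [0] * n; x_bad[2] = 3; x_bad[9] = 4   # weight two: rejected w.h.p.

print(decide(apply_sketch(x_good, keys)))  # True
print(decide(apply_sketch(x_bad, keys)))   # False
```

In the secret-shared setting, each server applies the linear queries to its shares locally, so only the short sketch and the small circuit evaluation require interaction.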

We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through connected yet seemingly different high-dimensional data analysis tasks: the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both of these settings.
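For reference, the classical matrix bounds alluded to above can be recalled informally as follows; these are the standard Weyl and Wedin statements (not results from this paper), and the contrast is that the tensor analogues drop the gap dependence.

```latex
% Weyl: singular values are 1-Lipschitz under perturbation.
% For $A, E \in \mathbb{R}^{m \times n}$ and each $k$,
\[
  \bigl|\sigma_k(A + E) - \sigma_k(A)\bigr| \;\le\; \|E\|_2 .
\]
% Wedin (informal, per-vector form): singular-vector rotation is
% controlled by the spectral gap
% $\delta_k = \min_{j \ne k} |\sigma_j(A) - \sigma_k(A)|$:
\[
  \sin\Theta\bigl(u_k(A),\, u_k(A+E)\bigr) \;\lesssim\; \frac{\|E\|_2}{\delta_k}.
\]
% For orthogonally decomposable tensors, the paper's bounds control each
% essential singular pair without a $\delta_k$ term.
```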

Tucker decomposition is a low-rank tensor approximation that generalizes a truncated matrix singular value decomposition (SVD). Existing parallel software has shown that Tucker decomposition is particularly effective at compressing terabyte-sized multidimensional scientific simulation datasets, computing reduced representations that satisfy a specified approximation error. The general approach is to get a low-rank approximation of the input data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short-fat matrices. In the existing approach, the SVD is performed by computing the eigendecomposition of the Gram matrix of the unfolding. This method sacrifices some numerical stability in exchange for lower computation costs and easier parallelization. We propose using a more numerically stable though more computationally expensive way to compute the SVD by preprocessing with a QR decomposition step and computing an SVD of only the small triangular factor. The more numerically stable approach allows us to achieve the same accuracy with half the working precision (for example, single rather than double precision). We demonstrate that our method scales as well as the existing approach, and the use of lower precision leads to an overall reduction in running time of up to a factor of 2 when using 10s to 1000s of processors. Using the same working precision, we are also able to compute Tucker decompositions with much smaller approximation error.
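The stability gap between the two SVD routes can be seen on a synthetic short-fat unfolding. The sizes, spectrum, and tolerances below are assumptions for illustration, not from the paper's experiments: the Gram route squares the condition number and loses small singular values, while the QR route recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Short-fat unfolding of a tensor: 20 x 5000 with a rapidly decaying
# spectrum (singular values 1, 1e-1, ..., 1e-19).
m, n = 20, 5000
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
s_true = 10.0 ** -np.arange(m)
A = (U * s_true) @ V.T

# Gram approach: eigendecomposition of A A^T.  Squaring pushes small
# singular values below machine precision, so they are lost.
evals = np.linalg.eigvalsh(A @ A.T)[::-1]          # descending order
s_gram = np.sqrt(np.clip(evals, 0.0, None))

# QR-based approach: A^T = Q R, then an SVD of only the small 20x20
# triangular factor R^T, which shares A's singular values.
Q, R = np.linalg.qr(A.T)
s_qr = np.linalg.svd(R.T, compute_uv=False)

# Compare relative accuracy on a singular value of size 1e-9.
k = 9
err_gram = abs(s_gram[k] - s_true[k]) / s_true[k]
err_qr = abs(s_qr[k] - s_true[k]) / s_true[k]
print(err_gram > 1e-1, err_qr < 1e-3)
```

This mirrors the abstract's point: for the same target accuracy, the QR-preprocessed SVD can tolerate roughly half the working precision that the Gram-matrix eigendecomposition needs.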

We present a method of detecting bifurcations by locating zeros of a signed version of the smallest singular value of the Jacobian. This enables the use of quadratically convergent root-bracketing techniques or Chebyshev interpolation to locate bifurcation points. Only positive singular values have to be computed, though the method relies on the existence of an analytic or smooth singular value decomposition (SVD). The sign of the determinant of the Jacobian, computed as part of the bidiagonal reduction in the SVD algorithm, eliminates slope discontinuities at the zeros of the smallest singular value. We use the method to search for spatially quasiperiodic traveling water waves that bifurcate from large-amplitude periodic waves. The water wave equations are formulated in a conformal mapping framework to facilitate the computation of the quasiperiodic Dirichlet–Neumann operator. We find examples of pure gravity waves with zero surface tension and overhanging gravity-capillary waves. In both cases, the waves have two spatial quasiperiods whose ratio is irrational. We follow the secondary branches via numerical continuation beyond the realm of linearization about solutions on the primary branch to obtain traveling water waves that extend over the real line with no two crests or troughs of exactly the same shape. The pure gravity wave problem is of relevance to ocean waves, where capillary effects can be neglected. Such waves can only exist through secondary bifurcation as they do not persist to zero amplitude. The gravity-capillary wave problem demonstrates the effectiveness of using the signed smallest singular value as a test function for multiparameter bifurcation problems. This test function becomes mesh independent once the mesh is fine enough.
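A minimal sketch of the signed test function on a hypothetical 2x2 Jacobian, using plain bisection rather than the quadratically convergent bracketing or Chebyshev interpolation mentioned above; the system and bracket are invented for illustration.

```python
import numpy as np

def jacobian(lam):
    # Hypothetical parameter-dependent Jacobian; it becomes singular
    # where det J = lam - 0.52 crosses zero.
    return np.array([[lam - 0.5, 0.2],
                     [0.1, 1.0]])

def signed_sigma_min(lam):
    # Smallest singular value, given the sign of det(J).  The sign flip
    # removes the slope discontinuity of sigma_min at its zeros, turning
    # the bifurcation point into an ordinary sign-changing root.
    J = jacobian(lam)
    smin = np.linalg.svd(J, compute_uv=False)[-1]
    return np.sign(np.linalg.det(J)) * smin

# Bracket a sign change and bisect to locate the bifurcation point.
a, b = 0.0, 1.0
assert signed_sigma_min(a) * signed_sigma_min(b) < 0
for _ in range(60):
    mid = 0.5 * (a + b)
    if signed_sigma_min(a) * signed_sigma_min(mid) <= 0:
        b = mid
    else:
        a = mid

print(round(0.5 * (a + b), 6))  # → 0.52, where the Jacobian is singular
```

Because the signed function crosses zero transversally instead of forming a cusp, faster root finders than bisection apply directly, which is the practical point of the construction.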