
Title: Error Estimation for Sketched SVD via the Bootstrap
To compute fast approximations to the singular value decomposition (SVD) of very large matrices, randomized sketching algorithms have become a leading approach. However, a key practical difficulty of sketching an SVD is that the user does not know how far the sketched singular vectors/values are from the exact ones. Indeed, the user may be forced to rely on analytical worst-case error bounds, which may not account for the unique structure of a given problem. As a result, the lack of tools for error estimation often leads to much more computation than is really necessary. To overcome these challenges, this paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values. Furthermore, the method is computationally inexpensive, because it operates only on sketched objects, and hence requires no extra passes over the full matrix being factored.
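The bootstrap idea can be sketched in a few lines of NumPy. The snippet below is a minimal illustration under assumptions of our own (a Gaussian row sketch, singular values only, and function names we invented); it is a sketch of the technique, not the paper's implementation. The key point is that the resampling touches only the small sketched matrix, never the full data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sv_error(A_sketch, k, n_boot=50, q=0.95, rng=rng):
    """Estimate the q-quantile of the top-k singular-value error of a
    row-sketched matrix by resampling its rows with replacement."""
    m = A_sketch.shape[0]
    sv = np.linalg.svd(A_sketch, compute_uv=False)[:k]
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, m, size=m)  # resample sketch rows only
        sv_star = np.linalg.svd(A_sketch[idx], compute_uv=False)[:k]
        errs[b] = np.max(np.abs(sv_star - sv))
    return np.quantile(errs, q)

# Toy demo: Gaussian row sketch of a tall matrix with decaying spectrum.
n, d, m = 20000, 50, 500
A = rng.standard_normal((n, d)) @ np.diag(np.linspace(1.0, 0.01, d))
S = rng.standard_normal((m, n)) / np.sqrt(m)  # sketching operator
A_sk = S @ A                                  # m x d sketch; only this is reused
err_est = bootstrap_sv_error(A_sk, k=5)
```

Each bootstrap replicate costs one SVD of the small m-by-d sketch, so the estimator adds only lower-order work on top of the sketching itself.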
Journal Name:
Proceedings of the 37th International Conference on Machine Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) on every training step. The former approach usually incurs a high accuracy loss while the latter has low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures a valid SVD form and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers to the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve greater reduction in computation load under the same accuracy, compared not only to previous factorization methods but also to state-of-the-art filter pruning methods.
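The decomposed parameterization and the two regularizers can be written down concretely. The NumPy sketch below (our own naming and tolerance choices, not the authors' training code) shows the full-rank SVD parameterization, the orthogonality penalty on the singular-vector factors, the sparsity penalty on the singular values, and the final pruning step; the actual method would minimize these penalties alongside the task loss during training.

```python
import numpy as np

rng = np.random.default_rng(1)

def svd_params(W):
    """Replace a dense layer weight by its full-rank SVD factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def orth_penalty(U, Vt):
    """Orthogonality regularizer: keeps (U, V) valid singular-vector
    factors and guards against gradient vanishing/exploding."""
    k = U.shape[1]
    return (np.linalg.norm(U.T @ U - np.eye(k)) ** 2
            + np.linalg.norm(Vt @ Vt.T - np.eye(k)) ** 2)

def sparsity_penalty(s):
    """L1 penalty on singular values, encouraging low rank."""
    return np.abs(s).sum()

def prune(U, s, Vt, tol=1e-2):
    """Final singular-value pruning: drop directions with tiny s."""
    keep = s > tol * s.max()
    return U[:, keep], s[keep], Vt[keep]

W = rng.standard_normal((64, 32))     # stand-in for a trained layer weight
U, s, Vt = svd_params(W)
U2, s2, Vt2 = prune(U, s, Vt)
W_lowrank = (U2 * s2) @ Vt2           # low-rank reconstruction of the layer
```

A layer stored as (U2, s2, Vt2) with rank r costs O(r(m+n)) parameters instead of O(mn), which is where the compression comes from.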
  2. This paper introduces arithmetic sketching, an abstraction of a primitive that several previous works use to achieve lightweight, low-communication zero-knowledge verification of secret-shared vectors. An arithmetic sketching scheme for a language L ⊆ F^n consists of (1) a randomized linear function compressing a long input x to a short “sketch,” and (2) a small arithmetic circuit that accepts the sketch if and only if x ∈ L, up to some small error. If the language L has an arithmetic sketching scheme with short sketches, then it is possible to test membership in L using an arithmetic circuit with few multiplication gates. Since multiplications are the dominant cost in protocols for computation on secret-shared, encrypted, and committed data, arithmetic sketching schemes give rise to lightweight protocols in each of these settings. Beyond the formalization of arithmetic sketching, our contributions are:
     – A general framework for constructing arithmetic sketching schemes from algebraic varieties. This framework unifies schemes from prior work and gives rise to schemes for useful new languages and with improved soundness error.
     – The first arithmetic sketching schemes for languages of sparse vectors: vectors with bounded Hamming weight, bounded L1 norm, and vectors whose few non-zero values satisfy a given predicate.
     – A method for “compiling” any arithmetic sketching scheme for a language L into a low-communication malicious-secure multi-server protocol for securely testing that a client-provided secret-shared vector is in L.
     We also prove the first nontrivial lower bounds showing limits on the sketch size for certain languages (e.g., vectors of Hamming-weight one) and proving the non-existence of arithmetic sketching schemes for others (e.g., the language of all vectors that contain a specific value).
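As a concrete classical instance of the primitive, the language of standard basis vectors over F_p (Hamming weight one with unit payload) admits a two-element linear sketch whose decision circuit uses a single multiplication: for x = e_i the sketch satisfies z1² = r_i² = z2, while any other vector fails the check with high probability over r. The toy code below is our own illustration, not taken from the paper; the field choice is arbitrary.

```python
import random

P = 2**61 - 1  # a Mersenne prime modulus; the field choice is illustrative

def sketch(x, r):
    """Randomized linear sketch of x: z1 = <r, x>, z2 = <r*r, x> (mod P)."""
    z1 = sum(ri * xi for ri, xi in zip(r, x)) % P
    z2 = sum(ri * ri * xi for ri, xi in zip(r, x)) % P
    return z1, z2

def accept(z1, z2):
    """Decision circuit with one multiplication: accept iff z1^2 - z2 = 0.
    For x = e_i this is r_i^2 - r_i^2 = 0; otherwise it is nonzero w.h.p."""
    return (z1 * z1 - z2) % P == 0

random.seed(0)
n = 16
r = [random.randrange(1, P) for _ in range(n)]

e3 = [0] * n; e3[3] = 1   # a standard basis vector: in the language
x_bad = [1] * n           # Hamming weight n: rejected with high probability
```

Because both sketch coordinates are linear in x, servers holding additive shares of x can compute shares of (z1, z2) locally, which is what makes the verification low-communication.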
  3. Abstract

    We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through connected yet seemingly different high-dimensional data analysis tasks: the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both of these settings.
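The classical matrix baseline that the abstract contrasts with is easy to check numerically: Weyl's inequality bounds every singular-value perturbation by the spectral norm of the perturbation, |σ_k(A+E) − σ_k(A)| ≤ ‖E‖₂. The NumPy check below is our own toy example of that matrix-case behavior; it does not reproduce the tensor bounds themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 20))          # base matrix
E = 1e-3 * rng.standard_normal((30, 20))   # small perturbation

sv_A = np.linalg.svd(A, compute_uv=False)
sv_AE = np.linalg.svd(A + E, compute_uv=False)
opnorm_E = np.linalg.svd(E, compute_uv=False)[0]   # ||E||_2

# Weyl: the largest singular-value shift is at most the operator norm of E.
weyl_gap = np.max(np.abs(sv_AE - sv_A))
```

For matrices this bound is uniform over all singular values; the point of the tensor results above is that for orthogonally decomposable tensors each essential singular value/vector is perturbed in isolation, without the spectral-gap dependence familiar from Davis-Kahan and Wedin.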

  4. Tucker decomposition is a low-rank tensor approximation that generalizes a truncated matrix singular value decomposition (SVD). Existing parallel software has shown that Tucker decomposition is particularly effective at compressing terabyte-sized multidimensional scientific simulation datasets, computing reduced representations that satisfy a specified approximation error. The general approach is to get a low-rank approximation of the input data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short-fat matrices. In the existing approach, the SVD is performed by computing the eigendecomposition of the Gram matrix of the unfolding. This method sacrifices some numerical stability in exchange for lower computation costs and easier parallelization. We propose using a more numerically stable though more computationally expensive way to compute the SVD by preprocessing with a QR decomposition step and computing an SVD of only the small triangular factor. The more numerically stable approach allows us to achieve the same accuracy with half the working precision (for example, single rather than double precision). We demonstrate that our method scales as well as the existing approach, and the use of lower precision leads to an overall reduction in running time of up to a factor of 2 when using 10s to 1000s of processors. Using the same working precision, we are also able to compute Tucker decompositions with much smaller approximation error.
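The two routes to the SVD of a short-fat unfolding can be contrasted in a few lines of NumPy. This is a schematic serial comparison under our own naming, not the paper's parallel implementation; the stability difference matters when the unfolding is ill-conditioned, since forming the Gram matrix squares the condition number.

```python
import numpy as np

rng = np.random.default_rng(3)
k, N = 20, 5000
Y = rng.standard_normal((k, N))   # stand-in for a short-fat tensor unfolding

# Existing approach: eigendecomposition of the small k-by-k Gram matrix Y Y^T.
# Cheap and easy to parallelize, but squares the condition number of Y.
w, U_gram = np.linalg.eigh(Y @ Y.T)
sv_gram = np.sqrt(np.maximum(w[::-1], 0.0))   # descending singular values

# Proposed approach: QR of Y^T, then SVD of only the small triangular factor.
Q, R = np.linalg.qr(Y.T)                      # Y^T = Q R, so Y = R^T Q^T
U_qr, sv_qr, Wt = np.linalg.svd(R.T)          # R^T = U_qr diag(sv_qr) Wt

# Both yield the singular values of Y; the QR route is the more stable one.
Y_rebuilt = U_qr @ np.diag(sv_qr) @ Wt @ Q.T
```

The left singular vectors of Y come out as U_qr and the right ones as Q times the right factor of the small SVD, so the only dense work beyond the QR is on a k-by-k matrix.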
  5. We present a method of detecting bifurcations by locating zeros of a signed version of the smallest singular value of the Jacobian. This enables the use of quadratically convergent root-bracketing techniques or Chebyshev interpolation to locate bifurcation points. Only positive singular values have to be computed, though the method relies on the existence of an analytic or smooth singular value decomposition (SVD). The sign of the determinant of the Jacobian, computed as part of the bidiagonal reduction in the SVD algorithm, eliminates slope discontinuities at the zeros of the smallest singular value. We use the method to search for spatially quasi-periodic traveling water waves that bifurcate from large-amplitude periodic waves. The water wave equations are formulated in a conformal mapping framework to facilitate the computation of the quasi-periodic Dirichlet-Neumann operator. We find examples of pure gravity waves with zero surface tension and overhanging gravity-capillary waves. In both cases, the waves have two spatial quasi-periods whose ratio is irrational. We follow the secondary branches via numerical continuation beyond the realm of linearization about solutions on the primary branch to obtain traveling water waves that extend over the real line with no two crests or troughs of exactly the same shape. The pure gravity wave problem is of relevance to ocean waves, where capillary effects can be neglected. Such waves can only exist through secondary bifurcation as they do not persist to zero amplitude. The gravity-capillary wave problem demonstrates the effectiveness of using the signed smallest singular value as a test function for multi-parameter bifurcation problems. This test function becomes mesh independent once the mesh is fine enough. 
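The signed test function can be illustrated on a toy problem. Everything below is a hypothetical stand-in of our own: a deliberately simple 2×2 Jacobian that loses rank at μ = 0.5, and plain bisection in place of the quadratically convergent bracketing or Chebyshev interpolation used in the paper. The point it demonstrates is that multiplying σ_min by sign(det J) removes the slope discontinuity at the zero, so the test function changes sign there and can be bracketed.

```python
import numpy as np

def signed_sigma_min(J):
    """sign(det J) * smallest singular value: crosses zero smoothly at a
    rank-deficient point, unlike sigma_min itself, which has a kink there."""
    s = np.linalg.svd(J, compute_uv=False)
    return np.sign(np.linalg.det(J)) * s[-1]

def jacobian(mu):
    # Hypothetical toy Jacobian, rank-deficient exactly at mu = 0.5.
    return np.array([[2.0, 0.0], [0.0, mu - 0.5]])

def bisect(f, a, b, tol=1e-12):
    """Locate a sign change of the test function by root bracketing."""
    fa = f(a)
    for _ in range(200):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m
        else:
            a, fa = m, fm
        if b - a < tol:
            break
    return 0.5 * (a + b)

tau = lambda mu: signed_sigma_min(jacobian(mu))
mu_star = bisect(tau, 0.0, 1.0)   # candidate bifurcation point
```

In the paper's setting the sign of det J comes essentially for free from the bidiagonal reduction inside the SVD, so the signed test function costs no more than σ_min itself.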