Title: Error Estimation for Sketched SVD via the Bootstrap
In order to compute fast approximations to the singular value decomposition (SVD) of very large matrices, randomized sketching algorithms have become a leading approach. However, a key practical difficulty of sketching an SVD is that the user does not know how far the sketched singular vectors/values are from the exact ones. Indeed, the user may be forced to rely on analytical worst-case error bounds, which may not account for the unique structure of a given problem. As a result, the lack of tools for error estimation often leads to much more computation than is really necessary. To overcome these challenges, this paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values. Furthermore, the method is computationally inexpensive, because it operates only on sketched objects, and hence it requires no extra passes over the full matrix being factored.
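As a rough illustration of the idea, the following Python sketch draws a row-sampling sketch of A, computes its singular values, and then bootstraps the sketch itself to estimate their error. The sampling scheme, function name, and constants are illustrative choices, not the paper's exact algorithm:

```python
import numpy as np

def sketched_svd_with_error(A, sketch_size, n_boot=50, alpha=0.05, seed=0):
    """Row-sampling sketch of A, plus a bootstrap estimate of the
    error of the sketched singular values (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Sketch: sample rows of A with replacement and rescale.
    idx = rng.integers(0, n, size=sketch_size)
    A_sk = A[idx] * np.sqrt(n / sketch_size)
    s_sk = np.linalg.svd(A_sk, compute_uv=False)
    # Bootstrap: resample rows of the *sketch*, so no further pass
    # over the full matrix A is needed.
    errs = np.empty(n_boot)
    for b in range(n_boot):
        bidx = rng.integers(0, sketch_size, size=sketch_size)
        s_boot = np.linalg.svd(A_sk[bidx], compute_uv=False)
        errs[b] = np.max(np.abs(s_boot - s_sk))
    # The (1 - alpha) quantile of the bootstrap fluctuations serves
    # as the data-driven error estimate.
    return s_sk, float(np.quantile(errs, 1 - alpha))
```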
Award ID(s):
1934568
PAR ID:
10282005
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 37th International Conference on Machine Learning
Volume:
119
Page Range / eLocation ID:
6382-6392
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods, which approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) on every training step; the former usually incurs a high accuracy loss, while the latter is inefficient. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures the valid form of the SVD and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers to the singular values of each layer, and singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load at the same accuracy, compared not only to previous factorization methods but also to state-of-the-art filter pruning methods.
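A minimal PyTorch sketch of such a decomposed layer, with orthogonality regularization on the factors and an L1 penalty on the singular values; the class name and regularizer weights are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """A linear layer stored directly in its full-rank SVD factors
    U diag(s) V^T, so training operates on the decomposed weights."""
    def __init__(self, in_features, out_features):
        super().__init__()
        r = min(in_features, out_features)      # full rank at init
        self.U = nn.Parameter(torch.empty(out_features, r))
        self.s = nn.Parameter(torch.ones(r))    # singular values
        self.V = nn.Parameter(torch.empty(in_features, r))
        nn.init.orthogonal_(self.U)
        nn.init.orthogonal_(self.V)

    def forward(self, x):
        return (x @ self.V * self.s) @ self.U.t()

    def regularizer(self, ortho_w=1.0, sparse_w=1e-3):
        # Orthogonality keeps (U, s, V) a valid SVD and stabilizes
        # gradients; the L1 term pushes singular values toward zero.
        eye = torch.eye(self.s.numel(), device=self.s.device)
        ortho = ((self.U.t() @ self.U - eye) ** 2).sum() \
              + ((self.V.t() @ self.V - eye) ** 2).sum()
        return ortho_w * ortho + sparse_w * self.s.abs().sum()
```

Adding `layer.regularizer()` to the task loss during training, and finally pruning components whose singular value falls below a threshold, yields the explicit low-rank model.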
  2. This paper introduces arithmetic sketching, an abstraction of a primitive that several previous works use to achieve lightweight, low-communication zero-knowledge verification of secret-shared vectors. An arithmetic sketching scheme for a language L ⊆ F^n consists of (1) a randomized linear function compressing a long input x to a short “sketch,” and (2) a small arithmetic circuit that accepts the sketch if and only if x ∈ L, up to some small error. If the language L has an arithmetic sketching scheme with short sketches, then it is possible to test membership in L using an arithmetic circuit with few multiplication gates. Since multiplications are the dominant cost in protocols for computation on secret-shared, encrypted, and committed data, arithmetic sketching schemes give rise to lightweight protocols in each of these settings. Beyond the formalization of arithmetic sketching, our contributions are:
     – A general framework for constructing arithmetic sketching schemes from algebraic varieties. This framework unifies schemes from prior work and gives rise to schemes for useful new languages and with improved soundness error.
     – The first arithmetic sketching schemes for languages of sparse vectors: vectors with bounded Hamming weight, bounded L1 norm, and vectors whose few non-zero values satisfy a given predicate.
     – A method for “compiling” any arithmetic sketching scheme for a language L into a low-communication malicious-secure multi-server protocol for securely testing that a client-provided secret-shared vector is in L.
     We also prove the first nontrivial lower bounds showing limits on the sketch size for certain languages (e.g., vectors of Hamming weight one) and proving the non-existence of arithmetic sketching schemes for others (e.g., the language of all vectors that contain a specific value).
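As a concrete example, the classic scheme for the language of standard basis vectors (Hamming weight one, value 1) fits this template: the sketch is two random linear measurements, and the accepting circuit uses a single multiplication. A minimal Python sketch over a prime field; the field choice and names are illustrative:

```python
import random

P = 2 ** 61 - 1  # an illustrative prime; any large field works

def sketch_weight_one(x, seed):
    """Randomized linear sketch for L = {standard basis vectors e_j}.
    Returns (z1, z2) = (<r, x>, <r*r, x>) over F_P."""
    rng = random.Random(seed)
    r = [rng.randrange(P) for _ in x]
    z1 = sum(ri * xi for ri, xi in zip(r, x)) % P
    z2 = sum(ri * ri * xi for ri, xi in zip(r, x)) % P
    return z1, z2

def accept(z1, z2):
    # One-multiplication accepting circuit: if x = e_j then z1 = r_j
    # and z2 = r_j^2, so z1^2 - z2 = 0. For any x outside L,
    # z1^2 - z2 is a nonzero low-degree polynomial in r, hence
    # nonzero with high probability (Schwartz-Zippel).
    return (z1 * z1 - z2) % P == 0

assert accept(*sketch_weight_one([0, 0, 1, 0], seed=1))      # in L
assert not accept(*sketch_weight_one([0, 1, 1, 0], seed=1))  # not in L
```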
  3. This paper proposes an optimization-based method to learn the singular value decomposition (SVD) of a compact operator with ordered singular functions. The proposed objective function is based on Schmidt’s low-rank approximation theorem (1907), which characterizes a truncated SVD as the solution minimizing the mean squared error, together with a technique called nesting to learn the ordered structure. When the optimization space is parameterized by neural networks, we refer to the proposed method as NeuralSVD. Unlike existing approaches, the implementation does not require sophisticated optimization tricks.
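A toy version of the nesting idea in PyTorch, assuming kernel samples K and learned feature matrices F and G produced by neural networks; the names and parameterization are illustrative, and the paper's NeuralSVD involves further machinery:

```python
import torch

def nested_lowrank_loss(K, F, G):
    """Schmidt-style objective with nesting (toy version).
    K: (n, m) samples of the operator's kernel; F: (n, L), G: (m, L)
    are learned left/right functions evaluated on those samples.
    Summing the low-rank MSE over every prefix of the L components
    encourages the components to come out in decreasing order."""
    loss = K.new_zeros(())
    for l in range(1, F.shape[1] + 1):
        approx = F[:, :l] @ G[:, :l].t()   # rank-l approximation
        loss = loss + ((K - approx) ** 2).mean()
    return loss
```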
  4. Linear regression is a classic method of data analysis. In recent years, sketching—a method of dimension reduction using random sampling, random projections or both—has gained popularity as an effective computational approximation when the number of observations greatly exceeds the number of variables. In this paper, we address the following question: how does sketching affect the statistical properties of the solution and key quantities derived from it? To answer this question, we present a projector-based approach to sketched linear regression that is exact and that requires minimal assumptions on the sketching matrix. Therefore, downstream analyses hold exactly and generally for all sketching schemes. Additionally, a projector-based approach enables derivation of key quantities from classic linear regression that account for the combined model- and algorithm-induced uncertainties. We demonstrate the usefulness of a projector-based approach in quantifying, and enabling insight into, excess uncertainties and bias-variance decompositions for sketched linear regression. Finally, we demonstrate how the insights from our projector-based analyses can be used to produce practical sketching diagnostics to aid the design of judicious sketching schemes.
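For context, the sketch-and-solve estimator the analysis concerns can be written in a few lines of numpy; the Gaussian sketch below is one illustrative choice of sketching scheme:

```python
import numpy as np

def sketch_and_solve(A, b, m, seed=0):
    """Sketched least squares: minimize ||S(Ax - b)||_2 for a random
    Gaussian S in R^{m x n} (one illustrative sketching scheme)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sk

# Repeating over independent sketches exposes the algorithm-induced
# variance that the projector-based analysis accounts for, e.g.:
#   np.var([sketch_and_solve(A, b, m, seed=s) for s in range(200)], axis=0)
```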
  5. Given a matrix A ∈ ℝ^{n×d} and a vector b ∈ ℝ^n, we consider the regression problem with ℓ∞ guarantees: finding a vector x′ ∈ ℝ^d such that ||x′ − x*||_∞ ≤ (ε/√d) · ||Ax* − b||_2 · ||A^†||, where x* = arg min_{x ∈ ℝ^d} ||Ax − b||_2. One popular approach for solving such ℓ2 regression problems is sketching: pick a structured random matrix S ∈ ℝ^{m×n} with m < n for which SA can be computed quickly, and solve the "sketched" regression problem arg min_{x ∈ ℝ^d} ||SAx − Sb||_2. In this paper, we show that in order to obtain such an ℓ∞ guarantee for ℓ2 regression, one has to use sketching matrices that are dense. To the best of our knowledge, this is the first use case in which dense sketching matrices are necessary. On the algorithmic side, we prove that there exists a distribution of dense sketching matrices with m = ε^{−2} d log^3(n/δ) such that solving the sketched regression problem gives the ℓ∞ guarantee with probability at least 1 − δ. Moreover, the matrix SA can be computed in time O(nd log n). Our row count is nearly optimal up to logarithmic factors and significantly improves the result of (Price et al., 2017), which requires a number of rows superlinear in d, m = Ω(ε^{−2} d^{1+γ}) for γ ∈ (0, 1). Moreover, we develop a novel analytical framework for ℓ∞ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property introduced in (Song & Yu, 2021). Our analysis is much simpler and more general than that of (Price et al., 2017). Leveraging this framework, we extend the ℓ∞ guarantee regression result to dense sketching matrices for computing the fast tensor product of vectors.
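A small numpy illustration of the dense-sketch regime, using a Gaussian sketch sized per the stated row count; the constants and the direct comparison to the exact solution are for demonstration only:

```python
import numpy as np

def linf_sketch_error(A, b, eps=0.5, delta=0.1, seed=0):
    """Dense Gaussian sketch with m ~ eps^{-2} d log^3(n/delta) rows
    (constants illustrative), compared against the exact solution."""
    n, d = A.shape
    m = min(n, int(np.ceil(eps ** -2 * d * np.log(n / delta) ** 3)))
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.max(np.abs(x_sk - x_star))  # observed ell_inf error
```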