Title: Limiting Spectrum of Randomized Hadamard Transform and Optimal Iterative Sketching Methods
Random projections, or sketching, are widely used in many algorithmic and learning contexts. Here we study the performance of the iterative Hessian sketch for least-squares problems. By leveraging and extending recent results from random matrix theory on the limiting spectra of matrices randomly projected with the subsampled randomized Hadamard transform and with truncated Haar matrices, we can study and compare the resulting algorithms to a level of precision that has not been possible before. Our technical contributions include a novel formula for the second moment of the inverse of projected matrices. We also find simple closed-form expressions for asymptotically optimal step sizes and convergence rates. These show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon Gaussian random projections. These techniques may be applied to other algorithms that employ randomized dimension reduction.
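As a rough illustration of the kind of algorithm analyzed here, the following is a minimal sketch of an iterative Hessian sketch solver that uses a subsampled randomized Hadamard transform (SRHT) embedding. It is not the authors' reference implementation: the sketch size m, iteration count, and fixed step size mu are illustrative placeholders (the paper derives asymptotically optimal step sizes in closed form, which are not reproduced here), and the dense Hadamard matrix built below is only practical for small problems whose row count is a power of two.

import numpy as np
from scipy.linalg import hadamard

def srht_sketch(A, m, rng):
    """Apply an SRHT with m rows to A: S A = sqrt(n/m) * P H D A."""
    n = A.shape[0]                                   # assumed to be a power of 2
    signs = rng.choice([-1.0, 1.0], size=n)          # D: random sign flips
    H = hadamard(n) / np.sqrt(n)                     # H: orthonormal Hadamard matrix
    rows = rng.choice(n, size=m, replace=False)      # P: uniform row subsampling
    return np.sqrt(n / m) * (H @ (signs[:, None] * A))[rows]

def iterative_hessian_sketch(A, b, m, n_iter=10, mu=1.0, seed=0):
    """Least squares via gradient steps preconditioned by a sketched Hessian."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        SA = srht_sketch(A, m, rng)        # refresh the sketch at each iteration
        grad = A.T @ (A @ x - b)           # exact gradient of 0.5 * ||Ax - b||^2
        x -= mu * np.linalg.solve(SA.T @ SA, grad)
    return x

# Example usage on a small synthetic problem (the row count must be a power of 2 here).
rng = np.random.default_rng(42)
A = rng.standard_normal((1024, 32))
b = rng.standard_normal(1024)
x_ihs = iterative_hessian_sketch(A, b, m=256)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error vs. lstsq:", np.linalg.norm(x_ihs - x_ls) / np.linalg.norm(x_ls))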
Award ID(s):
1838179
PAR ID:
10206899
Author(s) / Creator(s):
Date Published:
Journal Name:
Conference on Neural Information Processing Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We provide an exact analysis of a class of randomized algorithms for solving overdetermined least-squares problems. We consider first-order methods, where the gradients are pre-conditioned by an approximation of the Hessian, based on a subspace embedding of the data matrix. This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled randomized Hadamard transforms (SRHT). Our key technical innovation is the derivation of the limiting spectral density of SRHT embeddings. Leveraging this novel result, we derive the family of normalized orthogonal polynomials of the SRHT density and we find the optimal pre-conditioned first-order method along with its rate of convergence. Our analysis of Gaussian embeddings proceeds similarly and leverages classical random matrix theory results. In particular, we show that for a given sketch size, SRHT embeddings exhibit a faster rate of convergence than Gaussian embeddings. Then, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. To our knowledge, our resulting algorithm yields the best known complexity for solving least-squares problems with no condition-number dependence.
  2. In this work, we address the problem of Hessian inversion bias in distributed second-order optimization algorithms. We introduce a novel shrinkage-based estimator for the resolvent of Gram matrices that is asymptotically unbiased, and characterize its non-asymptotic convergence rate in the isotropic case. We apply this estimator to bias correction of Newton steps in distributed second-order optimization algorithms, as well as randomized sketching-based methods. We examine the bias present in the naive averaging-based distributed Newton's method using analytical expressions and contrast it with our proposed bias-free approach. Our approach leads to significant improvements in convergence rate compared to standard baselines and recent proposals, as shown through experiments on both real and synthetic datasets. (A toy numerical illustration of this inversion bias appears after this list.)
  3. Let (k_n)_{n∈ℕ} be a sequence of positive integers growing to infinity at a sublinear rate, k_n → ∞ and k_n/n → 0 as n → ∞. Given a sequence of n-dimensional random vectors {Y^(n)}_{n∈ℕ} belonging to a certain class, which includes uniform distributions on suitably scaled ℓ_p^n-balls or ℓ_p^n-spheres, p ≥ 2, and product distributions with sub-Gaussian marginals, we study the large deviations behavior of the corresponding sequence of k_n-dimensional orthogonal projections. For almost every sequence of projection matrices, we establish a large deviation principle (LDP) for the corresponding sequence of projections, with a fairly explicit rate function that does not depend on the sequence of projection matrices. As corollaries, we also obtain quenched LDPs for sequences of ℓ_2-norms and ℓ_∞-norms of the coordinates of the projections. Past work on LDPs for projections with growing dimension has mainly focused on the annealed setting, where one also averages over the random projection matrix, chosen from the Haar measure, in which case the coordinates of the projection are exchangeable. The quenched setting lacks such symmetry properties and gives rise to significant new challenges in the setting of growing projection dimension. Along the way, we establish new Gaussian approximation results on the Stiefel manifold that may be of independent interest. Such LDPs are of relevance in asymptotic convex geometry, statistical physics and high-dimensional statistics.
  4. Ultrametric matrices appear in many domains of mathematics and science; nevertheless, they can be large and dense, making them difficult to store and manipulate, unlike large but sparse matrices. In this manuscript, we exploit the fact that ultrametric matrices can be represented as binary trees to sparsify them via an orthonormal change of basis based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in large random ultrametric matrices remain non-zero after the change of basis, and we develop an algorithm to sparsify such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the Haar-like wavelets and supply a sufficient condition to approximate the spectrum of ultrametric matrices outside this subclass. Our methods give computational access to a covariance matrix model of the microbiologists' Tree of Life, which was previously inaccessible due to its size, and motivate introducing a new wavelet-based (beta-diversity) metric to compare microbial environments. Unlike the established metrics, the new metric may be used to identify internal nodes (i.e. splits) in the Tree that link microbial composition and environmental factors in a statistically significant manner.
  5. This paper considers a random component-wise variant of the unnormalized power method, which is similar to the regular power iteration except that only a random subset of indices is updated in each iteration. For the case of normal matrices, it was previously shown that random component-wise updates converge in the mean-squared sense to an eigenvector of eigenvalue 1 of the underlying matrix, even when the matrix has spectral radius larger than unity. In addition to the enlarged convergence regions, this study shows that, unlike in the regular power method, the eigenvalue gap does not directly affect the convergence rate of the randomized updates. In particular, it is shown that the rate of convergence is affected by the phase of the eigenvalues in the case of random component-wise updates, and the randomized updates favor negative eigenvalues over positive ones. As an application, this study considers a reformulation of the component-wise updates revealing a randomized algorithm that is proven to converge to the dominant left and right singular vectors of a normalized data matrix. The algorithm is also extended to handle large-scale distributed data when computing an arbitrary rank approximation of an arbitrary data matrix. Numerical simulations verify the convergence of the proposed algorithms under different parameter settings. (A minimal sketch of the component-wise updates appears after this list.)
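The following toy snippet refers to item 2 above: it illustrates the inversion bias that appears when local inverse Hessians are naively averaged across workers, and removes it with the classical inverse-Wishart scaling (m - d - 1)/m that is exact for isotropic Gaussian data. This scalar correction is used purely to demonstrate the bias; it is not the shrinkage estimator proposed in that work.

import numpy as np

rng = np.random.default_rng(1)
d, m, n_workers = 20, 100, 200             # feature dimension, rows per worker, number of workers
true_inv = np.eye(d)                       # population Hessian (and its inverse) is the identity

naive = np.zeros((d, d))
corrected = np.zeros((d, d))
for _ in range(n_workers):
    A = rng.standard_normal((m, d))        # local data shard with isotropic Gaussian rows
    local_inv = np.linalg.inv(A.T @ A / m) # local inverse Hessian estimate
    naive += local_inv / n_workers
    corrected += (m - d - 1) / m * local_inv / n_workers

print("naive averaging error:    ", np.linalg.norm(naive - true_inv))
print("corrected averaging error:", np.linalg.norm(corrected - true_inv))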
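Similarly, referring to item 5, the following is a minimal sketch of the random component-wise power iteration described there: at each step only a random subset of coordinates of x is replaced by the corresponding entries of A @ x, with no normalization. The symmetric test matrix, subset size, and iteration count are illustrative assumptions; the spectrum is constructed to contain an eigenvalue equal to 1, with all other eigenvalues smaller in modulus, so the printed alignment should approach 1.

import numpy as np

rng = np.random.default_rng(2)
n, k, n_iter = 50, 5, 3000                      # dimension, coordinates updated per step, steps

# Normal (here symmetric) test matrix with eigenvalue 1 in its spectrum.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigvals = np.concatenate(([1.0], rng.uniform(-0.5, 0.5, size=n - 1)))
A = Q @ np.diag(eigvals) @ Q.T

x = rng.standard_normal(n)
for _ in range(n_iter):
    idx = rng.choice(n, size=k, replace=False)  # random coordinate subset
    x[idx] = (A @ x)[idx]                       # update only those coordinates

v = Q[:, 0]                                     # eigenvector associated with eigenvalue 1
print("alignment with target eigenvector:", abs(v @ x) / np.linalg.norm(x))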