skip to main content

Title: Limiting Spectrum of Randomized Hadamard Transform and Optimal Iterative Sketching Methods
Random projections or sketching are widely used in many algorithmic and learning contexts. Here we study the performance of iterative Hessian sketch for leastsquares problems. By leveraging and extending recent results from random matrix theory on the limiting spectrum of matrices randomly projected with the subsampled randomized Hadamard transform, and truncated Haar matrices, we can study and compare the resulting algorithms to a level of precision that has not been possible before. Our technical contributions include a novel formula for the second moment of the inverse of projected matrices. We also find simple closed-form expressions for asymptotically optimal step-sizes and convergence rates. These show that the convergence rate for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon Gaussian random projections. These techniques may be applied to other algorithms that employ randomized dimension reduction.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Conference on Neural Information Processing Systems
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We provide an exact analysis of a class of randomized algorithms for solving overdetermined least-squares problems. We consider first-order methods, where the gradients are pre-conditioned by an approximation of the Hessian, based on a subspace embedding of the data matrix. This class of algorithms encompasses several randomized methods among the fastest solvers for leastsquares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled randomized Hadamard transforms (SRHT). Our key technical innovation is the derivation of the limiting spectral density of SRHT embeddings. Leveraging this novel result, we derive the family of normalized orthogonal polynomials of the SRHT density and we find the optimal pre-conditioned first-order method along with its rate of convergence. Our analysis of Gaussian embeddings proceeds similarly, and leverages classical random matrix theory results. In particular, we show that for a given sketch size, SRHT embeddings exhibits a faster rate of convergence than Gaussian embeddings. Then, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. To our knowledge, our resulting algorithm yields the best known complexity for solving least-squares problems with no condition number dependence. 
    more » « less
  2. In this work, we address the problem of Hessian inversion bias in distributed second-order optimization algorithms. We introduce a novel shrinkage-based estimator for the resolvent of gram matrices that is asymptotically unbiased, and characterize its non-asymptotic convergence rate in the isotropic case. We apply this estimator to bias correction of Newton steps in distributed second-order optimization algorithms, as well as randomized sketching based methods. We examine the bias present in the naive averaging-based distributed Newton’s method using analytical expressions and contrast it with our proposed biasfree approach. Our approach leads to significant improvements in convergence rate compared to standard baselines and recent proposals, as shown through experiments on both real and synthetic datasets. 
    more » « less
  3. This paper considers a random component-wise variant of the unnormalized power method, which is similar to the regular power iteration except that only a random subset of indices is updated in each iteration. For the case of normal matrices, it was previously shown that random component-wise updates converge in the mean-squared sense to an eigenvector of eigenvalue 1 of the underlying matrix even in the case of the matrix having spectral radius larger than unity. In addition to the enlarged convergence regions, this study shows that the eigenvalue gap does not directly a ect the convergence rate of the randomized updates unlike the regular power method. In particular, it is shown that the rate of convergence is a ected by the phase of the eigenvalues in the case of random component-wise updates, and the randomized updates favor negative eigenvalues over positive ones. As an application, this study considers a reformulation of the component-wise updates revealing a randomized algorithm that is proven to converge to the dominant left and right singular vectors of a normalized data matrix. The algorithm is also extended to handle large-scale distributed data when computing an arbitrary rank approximation of an arbitrary data matrix. Numerical simulations verify the convergence of the proposed algorithms under di erent parameter settings. 
    more » « less
  4. Ultrametric matrices appear in many domains of mathematics and science; nevertheless, they can be large and dense, making them difficult to store and manipulate, unlike large but sparse matrices. In this manuscript, we exploit that ultrametric matrices can be represented as binary trees to sparsify them via an orthonormal base change based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in random but large ultrametric matrices remain non-zero after the base change; and develop an algorithm to sparsify such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the Haar-like wavelets and supply a sufficient condition to approximate the spectrum of ultrametric matrices outside this subclass. Our methods give computational access to a covariance matrix model of the microbiologists’ Tree of Life, which was previously inaccessible due to its size, and motivate introducing a new wavelet-based (beta-diversity) metric to compare microbial environments. Unlike the established metrics, the new metric may be used to identify internal nodes (i.e. splits) in the Tree that link microbial composition and environmental factors in a statistically significant manner. 
    more » « less
  5. Lipka, Alexander (Ed.)

    Many genetic models (including models for epistatic effects as well as genetic-by-environment) involve covariance structures that are Hadamard products of lower rank matrices. Implementing these models require factorizing large Hadamard product matrices. The available algorithms for factorization do not scale well for big data, making the use of some of these models not feasible with large sample sizes. Here, based on properties of Hadamard products and (related) Kronecker products we propose an algorithm that produces an approximate decomposition that is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the use of the method by presenting an analysis of data from the northern testing locations of the G×E project from the Genomes-to-Fields Initiative (n∼60,000). We implemented the proposed algorithm in the open-source ‘tensorEVD’ R-package.

    more » « less