Title: Exploiting sparsity to improve the accuracy of Nyström-based large-scale spectral clustering
The Nyström method is a matrix approximation technique that has shown great promise in speeding up spectral clustering. However, when the input matrix is sparse, we show that the traditional Nyström method requires a prohibitively large number of samples to obtain a good approximation. We propose a novel sampling approach for selecting the landmark points used to compute the Nyström approximation, and show that it obeys the same error bound as in Bouneffouf and Birol (2015). To control sample complexity, we propose a selective densification step based on breadth-first traversal, and show that the proposed densification does not change the optimal clustering. Results on real-world datasets show that by combining the proposed sampling and densification schemes, we can obtain better accuracy than other sampling techniques used for the Nyström method while using significantly fewer samples.
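The landmark-based approximation underlying this work can be sketched in a few lines of NumPy. Note this is a generic Nyström approximation with uniform landmark sampling as a placeholder; the paper's contribution is a sparsity-aware sampling scheme and a densification step, neither of which is reproduced here.

```python
import numpy as np

def nystrom_approx(W, m, rng=None):
    """Nystrom approximation of a symmetric affinity matrix W.

    Uniform landmark sampling is a stand-in; the paper proposes a
    sparsity-aware sampling scheme in its place.
    """
    rng = np.random.default_rng(rng)
    n = W.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # landmark indices
    C = W[:, idx]        # n x m: similarities to the landmarks
    W_mm = C[idx, :]     # m x m: landmark-landmark block
    # W is approximated as C pinv(W_mm) C^T
    return C @ np.linalg.pinv(W_mm) @ C.T

# toy check on a small Gaussian-kernel affinity matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
W = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
W_hat = nystrom_approx(W, m=25, rng=1)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
```

The eigenvectors of `W_hat` (or of its normalized variant) would then feed the usual spectral-clustering pipeline in place of those of the full matrix.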
Award ID(s):
1650080, 1451954
PAR ID:
10031052
Author(s) / Creator(s):
Date Published:
Journal Name:
2017 International Joint Conference on Neural Networks (IJCNN)
Page Range / eLocation ID:
9 to 16
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Nyström approximation is a fast randomized method that rapidly solves kernel ridge regression (KRR) problems through sub-sampling the n-by-n empirical kernel matrix appearing in the objective function. However, the performance of such a sub-sampling method heavily relies on correctly estimating the statistical leverage scores for forming the sampling distribution, which can be as costly as solving the original KRR. In this work, we propose a linear time (modulo poly-log terms) algorithm to accurately approximate the statistical leverage scores in stationary-kernel-based KRR with theoretical guarantees. In particular, by analyzing the first-order condition of the KRR objective, we derive an analytic formula, which depends on both the input distribution and the spectral density of stationary kernels, for capturing the non-uniformity of the statistical leverage scores. Numerical experiments demonstrate that, at the same prediction accuracy, our method is orders of magnitude more efficient than existing methods at selecting the representative sub-samples in the Nyström approximation.
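For context, the ridge leverage scores that this line of work approximates can be computed exactly as the diagonal of K(K + λI)⁻¹, at O(n³) cost. The sketch below is that naive reference computation, not the paper's near-linear-time algorithm; the RBF kernel and regularization value are illustrative choices.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores l_i = (K (K + lam*I)^{-1})_{ii}.

    This O(n^3) computation is the quantity the paper approximates in
    near-linear time; it is shown here only as a reference.
    """
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * np.eye(n)))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
# RBF kernel matrix (positive definite for distinct points)
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
scores = ridge_leverage_scores(K, lam=0.1)
probs = scores / scores.sum()  # sampling distribution for Nystrom landmarks
```

Sampling landmark columns with probabilities proportional to these scores is the standard leverage-score variant of the Nyström method that the fast approximation targets.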
  2.
    Building a sketch of an n-by-n empirical kernel matrix is a common approach to accelerate the computation of many kernel methods. In this paper, we propose a unified framework for constructing sketching methods in kernel ridge regression (KRR), which views the sketching matrix S as an accumulation of m rescaled sub-sampling matrices with independent columns. Our framework incorporates two commonly used sketching methods, sub-sampling sketches (known as the Nyström method) and sub-Gaussian sketches, as special cases with m = 1 and m = ∞ respectively. Under the new framework, we provide a unified error analysis of sketching approximation and show that our accumulation scheme improves the low accuracy of sub-sampling sketches when a certain incoherence characteristic is high, and accelerates the more accurate but computationally heavier sub-Gaussian sketches. By optimally choosing the number m of accumulations, we show that the best trade-off between computational efficiency and statistical accuracy can be achieved. In practice, the sketching method can be implemented as efficiently as sub-sampling sketches, as only minor extra matrix additions are needed. Our empirical evaluations also demonstrate that the proposed method may attain accuracy close to that of sub-Gaussian sketches while being as efficient as sub-sampling-based sketches.
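A loose illustration of the accumulation idea: each sketch column below is built by averaging m rescaled, sign-flipped sub-sampled coordinates, so m = 1 reduces to a plain sub-sampling (Nyström-style) sketch and large m approaches a dense sub-Gaussian-like sketch. The exact rescaling in the paper may differ; this is only a sketch of the construction.

```python
import numpy as np

def accumulated_sketch(n, s, m, rng=None):
    """n x s sketching matrix S formed by accumulating m rescaled
    sub-sampling matrices (an illustration of the framework, not the
    paper's exact construction)."""
    rng = np.random.default_rng(rng)
    S = np.zeros((n, s))
    for _ in range(m):
        rows = rng.integers(0, n, size=s)        # one sub-sampling matrix
        signs = rng.choice([-1.0, 1.0], size=s)  # random rescaling
        S[rows, np.arange(s)] += signs * np.sqrt(n / s)
    return S / np.sqrt(m)

# m = 1: exactly one nonzero per column, i.e. a pure sub-sampling sketch
S1 = accumulated_sketch(100, 20, m=1, rng=0)
# m = 4: columns accumulate several sub-sampled coordinates
S4 = accumulated_sketch(100, 20, m=4, rng=0)
```

In sketched KRR, such an S replaces the kernel matrix K with the much smaller sketched quantities K S and S^T K S in the objective.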
  3. We present a new class of preconditioned iterative methods for solving linear systems of the form Ax = b. Our methods are based on constructing a low-rank Nyström approximation to A using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of A, which improves as the rank of the Nyström approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems: 1. We show how to solve any n×n linear system that is well-conditioned except for k outlying large singular values in Õ(n^2.065 + k^ω) time, improving on a recent result of [Dereziński, Yang, STOC 2024] for all k ≳ n^0.78. 2. We give the first Õ(n^2 + d_λ^ω) time algorithm for solving a regularized linear system (A + λI)x = b, where A is positive semidefinite with effective dimension d_λ = tr(A(A + λI)^{−1}). This problem arises in applications like Gaussian process regression. 3. We give faster algorithms for approximating Schatten p-norms and other matrix norms. For example, for the Schatten 1-norm (nuclear norm), we give an algorithm that runs in Õ(n^2.11) time, improving on an Õ(n^2.18) method of [Musco et al., ITCS 2018]. All results are proven in the real RAM model of computation. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
  4. We describe a randomized algorithm for producing a near-optimal hierarchical off-diagonal low-rank (HODLR) approximation to an n×n matrix A, accessible only through matrix-vector products with A and A^T. We prove that, for the rank-k HODLR approximation problem, our method achieves a (1 + β)log(n)-optimal approximation in expected Frobenius norm using O(k log(n)/β^3) matrix-vector products. In particular, the algorithm obtains a (1 + ε)-optimal approximation with O(k log^4(n)/ε^3) matrix-vector products, and for any constant c, an n^c-optimal approximation with O(k log(n)) matrix-vector products. Apart from matrix-vector products, the additional computational cost of our method is just O(n poly(log(n), k, β)). We complement the upper bound with a lower bound, which shows that any matrix-vector query algorithm requires at least Ω(k log(n) + k/ε) queries to obtain a (1 + ε)-optimal approximation. Our algorithm can be viewed as a robust version of widely used "peeling" methods for recovering HODLR matrices and is, to the best of our knowledge, the first matrix-vector query algorithm to enjoy theoretical worst-case guarantees for approximation by any hierarchical matrix class. To control the propagation of error between levels of hierarchical approximation, we introduce a new perturbation bound for low-rank approximation, which shows that the widely used Generalized Nyström method enjoys inherent stability when implemented with noisy matrix-vector products. We also introduce a novel randomly perforated matrix sketching method to further control the error in the peeling algorithm.
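The Nyström-preconditioning idea behind the linear-system results above can be sketched as follows. This uses a dense Gaussian test matrix in place of the sparse sketches from the paper, and a standard single-level Nyström preconditioner for (A + λI) from the randomized-preconditioning literature; the paper's multi-level construction is not reproduced.

```python
import numpy as np

def nystrom_preconditioned_cg(A, b, rank, lam=1e-2, tol=1e-8, maxit=500, rng=None):
    """Solve (A + lam*I) x = b with CG preconditioned by a randomized
    rank-`rank` Nystrom approximation of the PSD matrix A (lam > 0 assumed).
    A dense Gaussian sketch stands in for the paper's sparse sketches."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    # Randomized Nystrom approximation: A ~ U diag(d) U^T
    Omega = rng.standard_normal((n, rank))
    Y = A @ Omega
    nu = 1e-10 * np.linalg.norm(Y)          # tiny shift for numerical stability
    Y = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y)
    B = np.linalg.solve(C, Y.T).T           # B B^T = Y (Omega^T Y)^{-1} Y^T
    U, sig, _ = np.linalg.svd(B, full_matrices=False)
    d = np.maximum(sig**2 - nu, 0.0)        # approximate top eigenvalues of A
    mu = d[-1] + lam
    def precond(r):
        # P^{-1} r with P^{-1} = mu*U(diag(d)+lam*I)^{-1}U^T + (I - U U^T)
        Ur = U.T @ r
        return U @ (mu * Ur / (d + lam)) + (r - U @ Ur)
    # standard preconditioned conjugate gradient on (A + lam*I) x = b
    x = np.zeros(n)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p + lam * p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# toy SPD system with a fast-decaying spectrum
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((80, 80)))
vals = np.logspace(3, -3, 80)
A = (Q * vals) @ Q.T
b = rng.standard_normal(80)
x = nystrom_preconditioned_cg(A, b, rank=20, lam=1e-2, rng=0)
res = np.linalg.norm(A @ x + 1e-2 * x - b) / np.linalg.norm(b)
```

The preconditioner deflates the large outlying eigenvalues that the approximation captures, so CG convergence is governed by the remaining, better-conditioned part of the spectrum, which mirrors the average-condition-number dependence in the abstract.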
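The Generalized Nyström method mentioned in the last abstract operates in exactly the matrix-vector query model described there: it forms a rank-k approximation from products with A and A^T alone. A minimal sketch, where the Gaussian sketches and the oversampling parameter p are illustrative choices:

```python
import numpy as np

def generalized_nystrom(matvec, rmatvec, n, k, p=10, rng=None):
    """Rank-k Generalized Nystrom approximation of an n x n matrix A,
    accessed only through products with A (matvec) and A^T (rmatvec)."""
    rng = np.random.default_rng(rng)
    Om = rng.standard_normal((n, k))          # right sketch
    Psi = rng.standard_normal((n, k + p))     # oversampled left sketch
    Y = matvec(Om)                            # A Omega,   n x k
    Z = rmatvec(Psi)                          # A^T Psi,   n x (k+p)
    # A_hat = (A Omega) (Psi^T A Omega)^+ (Psi^T A), without forming A
    core = np.linalg.pinv(Psi.T @ Y)
    return Y @ core @ Z.T

# exactly low-rank test matrix, accessed only through matvecs
rng = np.random.default_rng(0)
L = rng.standard_normal((80, 5))
R = rng.standard_normal((5, 80))
A = L @ R
A_hat = generalized_nystrom(lambda V: A @ V, lambda V: A.T @ V, 80, k=8)
err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

When the target rank k meets or exceeds the true rank, the approximation is exact up to round-off; the stability of this formula under noisy matvecs is what the abstract's new perturbation bound addresses.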