
Title: Faster Kernel Matrix Algebra via Density Estimation.
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix K ∈ R^{n×n} corresponding to n points x_1, …, x_n ∈ R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to 1+ϵ relative error in time sublinear in n and linear in d for many popular kernels, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and an approximate eigenvector) can be approximated to 1+ϵ relative error in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
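The reduction behind the sum estimate is simple to state: since Σ_{i,j} k(x_i, x_j) = n · E_i[Σ_j k(x_i, x_j)] for a uniformly random index i, an unbiased estimator only needs row sums at a few sampled points, and each row sum is exactly a kernel density query. Below is a minimal Python sketch of this reduction for the Gaussian kernel; the function names are ours, the row sums are computed by brute force for illustration, and the paper instead answers them with fast KDE data structures plus variance-reduced sampling to certify a 1+ϵ relative-error guarantee in sublinear time.

```python
import numpy as np

def gaussian_row_sum(X, i, bandwidth=1.0):
    # Exact row sum sum_j k(x_i, x_j) for the Gaussian kernel, O(nd).
    # The paper answers this query approximately with a fast KDE data
    # structure instead of brute force.
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth**2)).sum()

def estimate_kernel_sum(X, num_samples=200, bandwidth=1.0, seed=0):
    # Unbiased Monte Carlo estimate of sum_{i,j} k(x_i, x_j), using
    # sum_{i,j} k(x_i, x_j) = n * E_i[row_sum(i)] for a uniform row i.
    # (The paper combines this with variance reduction to reach a
    # (1+eps) relative-error guarantee in sublinear time.)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rows = rng.integers(0, n, size=num_samples)
    return n * np.mean([gaussian_row_sum(X, i, bandwidth) for i in rows])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    est = estimate_kernel_sum(X)
    # Exact value for comparison (O(n^2 d); avoid for large n).
    exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0).sum()
    print(f"estimate = {est:.1f}   exact = {exact:.1f}")
```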
Authors:
Award ID(s):
1763618 2046235
Publication Date:
NSF-PAR ID:
10290577
Journal Name:
International Conference on Machine Learning (ICML)
Sponsoring Org:
National Science Foundation
More Like this
  1. For a graph G on n vertices, naively sampling the position of a random walk at time t requires work Ω(t). We desire local access algorithms supporting position_G(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/poly(n)-close in ℓ1 distance to that of a uniform random walk. We first give an algorithm for local access to random walks on a given undirected d-regular graph with Õ(√n/(1−λ)) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an Õ(√n) algorithm for a graph drawn from G(n, d) with high probability, which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√n/log n) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n^{1/4}) runtime-per-query lower bound even with an oblivious adversary (i.e., when the query sequence is fixed in advance). We then show that for families of graphs with additional group-theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime polylog(n) per query. This also allows for efficient local access to walks on polylog-degree expanders. We show that our techniques apply to graphs with high degree by extending our results to graphs constructed using the tensor product (giving fast local access to walks on degree-n^ϵ graphs for any ϵ ∈ (0, 1]) and the Cartesian product.
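For intuition about the query model, here is a minimal baseline oracle (names ours, not the paper's algorithm): it memoizes every returned position and simulates forward from the nearest cached earlier time, so answers are exactly consistent with a single random walk, but a fresh query at a large time t costs Ω(t) work. This is the naive bound that the Õ(√n)-per-query algorithm above improves on expanders.

```python
import random

class NaiveWalkOracle:
    """Baseline local-access oracle for position_G(t) queries.

    Answers are exactly consistent with one random walk from s: every
    visited position is memoized, and a query simulates forward from
    the closest cached earlier time.  Per-query cost is the gap to that
    cached time, i.e. Omega(t) in the worst case.
    """

    def __init__(self, adj, s, seed=0):
        self.adj = adj              # adjacency list of the graph
        self.cache = {0: s}         # time -> vertex on the walk
        self.rng = random.Random(seed)

    def position(self, t):
        if t in self.cache:
            return self.cache[t]
        t0 = max(u for u in self.cache if u < t)
        v = self.cache[t0]
        for step in range(t0 + 1, t + 1):
            v = self.rng.choice(self.adj[v])
            self.cache[step] = v
        return v

# Example: out-of-order queries on a cycle (a 2-regular Cayley graph).
n = 10
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
oracle = NaiveWalkOracle(adj, s=0)
print([oracle.position(t) for t in (5, 3, 8)])
```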
  2. Abstract For $p\geq 1$ and $(g_{ij})_{1\leq i,j\leq n}$ being a matrix of i.i.d. standard Gaussian entries, we study the $n\to\infty$ limit of the $\ell _p$-Gaussian–Grothendieck problem defined as $$\begin{align*} & \max\Bigl\{\sum_{i,j=1}^n g_{ij}x_ix_j: x\in \mathbb{R}^n,\sum_{i=1}^n |x_i|^p=1\Bigr\}. \end{align*}$$ The case $p=2$ corresponds to the top eigenvalue of the Gaussian orthogonal ensemble; when $p=\infty $, the maximum value is essentially the ground state energy of the Sherrington–Kirkpatrick mean-field spin glass model and its limit can be expressed by the famous Parisi formula. In the present work, we focus on the cases $1\leq p<2$ and $2<p<\infty$. For the former, we compute the limit of the $\ell _p$-Gaussian–Grothendieck problem and investigate the structure of the set of all near optimizers along with stability estimates. In the latter case, we show that this problem admits a Parisi-type variational representation and the corresponding optimizer is weakly delocalized in the sense that its entries vanish uniformly in a polynomial order of $n^{-1}$.
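The $p=2$ identity mentioned above is easy to check numerically: maximizing $\sum_{i,j} g_{ij}x_ix_j$ over the unit $\ell_2$ sphere gives the top eigenvalue of the symmetric part $(G+G^\top)/2$, which is a GOE matrix up to normalization. A small sketch, assuming numpy; for this normalization the scaled maximum concentrates near $\sqrt{2}$, the standard semicircle-edge value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
G = rng.normal(size=(n, n))   # i.i.d. standard Gaussian entries g_ij

# For p = 2, max_{||x||_2 = 1} sum_{ij} g_ij x_i x_j equals the top
# eigenvalue of the symmetric part (G + G.T) / 2, since the objective
# is the quadratic form x^T G x.  The scaled value approaches sqrt(2).
A = (G + G.T) / 2.0
print(np.linalg.eigvalsh(A)[-1] / np.sqrt(n))
```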
  3. Abstract Covariance matrices are fundamental to the analysis and forecast of economic, physical and biological systems. Although the eigenvalues $\{\lambda _i\}$ and eigenvectors $\{\boldsymbol{u}_i\}$ of a covariance matrix are central to such endeavours, in practice one must inevitably approximate the covariance matrix based on data with finite sample size $n$ to obtain empirical eigenvalues $\{\tilde{\lambda }_i\}$ and eigenvectors $\{\tilde{\boldsymbol{u}}_i\}$, and therefore understanding the error so introduced is of central importance. We analyse the eigenvector error $\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2$ while leveraging the assumption that the true covariance matrix, of size $p$, is drawn from a matrix ensemble with known spectral properties. In particular, we assume that the distribution of population eigenvalues weakly converges as $p\to \infty $ to a spectral density $\rho (\lambda )$ and that the spacing between population eigenvalues is similar to that for the Gaussian orthogonal ensemble. Our approach complements previous analyses of eigenvector error that require the full set of eigenvalues to be known, which can be computationally infeasible when $p$ is large. To provide a scalable approach for uncertainty quantification of eigenvector error, we consider a fixed eigenvalue $\lambda $ and approximate the distribution of the expected square error $r= \mathbb{E}\left [\| \boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i \|^2\right ]$ across the matrix ensemble for all $\boldsymbol{u}_i$ associated with $\lambda _i=\lambda $. We find, for example, that for sufficiently large matrix size $p$ and sample size $n> p$, the probability density of $r$ scales as $1/nr^2$. This power-law scaling implies that the eigenvector error is extremely heterogeneous: even if $r$ is very small for most eigenvectors, it can be large for others with non-negligible probability. We support this and further results with numerical experiments.
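A toy version of the numerical experiments mentioned above, with all names ours: fix a diagonal population covariance with a known spectrum, repeatedly draw $n$ samples, and Monte Carlo estimate $r = \mathbb{E}[\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i\|^2]$ for the top eigenvector, aligning signs since eigenvectors are only defined up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 50, 200, 200

# True covariance: diagonal with a slowly decaying population spectrum,
# so its eigenvectors are the standard basis vectors.
pop_eigs = np.linspace(2.0, 1.0, p)
true_vecs = np.eye(p)

i = 0                                   # track the top eigenvector
errs = []
for _ in range(trials):
    X = rng.normal(size=(n, p)) * np.sqrt(pop_eigs)  # cov = diag(pop_eigs)
    S = X.T @ X / n                                  # empirical covariance
    _, V = np.linalg.eigh(S)
    u_hat = V[:, -1 - i]                             # eigh sorts ascending
    u = true_vecs[:, i]
    u_hat = u_hat if u_hat @ u >= 0 else -u_hat      # fix the sign ambiguity
    errs.append(np.sum((u - u_hat) ** 2))

print("estimated r =", np.mean(errs))
```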
  4. We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speed-ups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time Õ(s^2 (sqrt(m) epsilon^{-10} + sqrt(n) epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m and n, and a quadratic improvement over previous quantum algorithms (when m ≈ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is Õ(sqrt(m) + poly(r))·poly(log m, log n, B, epsilon^{-1}), with B an upper bound on the trace-norm of all input matrices. In particular, the complexity depends only polylogarithmically on n and polynomially on r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt(m)·poly(log m, log n, r, epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a poly-logarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis. We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest.
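For readers unfamiliar with the classical primitive being "quantized" here, the matrix multiplicative weights method maintains a unit-trace density matrix proportional to exp(−η Σ_s M_s) of the accumulated loss matrices. A minimal numpy sketch of that update rule (ours, purely classical, not the quantum algorithm):

```python
import numpy as np

def sym_expm(A):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def mmw_iterates(losses, eta=0.1):
    # Matrix multiplicative weights: at round t, play the density matrix
    # rho_t = exp(-eta * sum_{s<t} M_s) / Tr(exp(-eta * sum_{s<t} M_s)).
    acc = np.zeros_like(losses[0])
    for M in losses:
        W = sym_expm(-eta * acc)
        yield W / np.trace(W)      # unit-trace PSD iterate
        acc = acc + M

# Toy usage: a short sequence of random symmetric loss matrices.
rng = np.random.default_rng(0)
losses = [(M + M.T) / 2 for M in rng.normal(size=(3, 4, 4))]
for rho in mmw_iterates(losses, eta=0.5):
    print(np.trace(rho).round(6))  # always 1.0
```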