Title: Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity
We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather than exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but, unlike the exact variants, are also necessary. Thus they are better suited for discussing the limitations of linear or kernel methods.
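To fix intuition, here is a schematic rendering of the approximate dimension complexity (an illustrative sketch, not the paper's verbatim definition, which is stated relative to a distribution over the domain and has an analogous norm-based margin variant):

% Schematic only: the smallest embedding dimension d for which every
% h in H is eps-approximated by some halfspace over the embedding.
\[
\mathrm{dc}_{\epsilon}(\mathcal{H}) \;=\; \min\Bigl\{\, d \;:\; \exists\,\phi:\mathcal{X}\to\mathbb{R}^{d}\ \ \forall h\in\mathcal{H}\ \exists w\in\mathbb{R}^{d},\ \Pr_{x\sim\mathcal{D}}\bigl[\operatorname{sign}(\langle w,\phi(x)\rangle)\neq h(x)\bigr]\le\epsilon \,\Bigr\},
\]
% with the margin variant measuring the smallest norm bound (rather than
% dimension) under which each h is approximated with unit margin.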
Award ID(s):
1764032
NSF-PAR ID:
10167331
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Conference on Learning Theory (COLT 2020)
Volume:
125
Page Range / eLocation ID:
2236-2262
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we introduce a new, spectral notion of approximation between directed graphs, which we call singular-value (SV) approximation. SV-approximation is stronger than previous notions of spectral approximation considered in the literature, including spectral approximation of Laplacians for undirected graphs [ST04], standard approximation for directed graphs [CKP+17], and unit-circle (UC) approximation for directed graphs [AKM+20]. Further, SV-approximation enjoys several useful properties not possessed by previous notions of approximation; e.g., it is preserved under products of random-walk matrices and bounded matrices. We provide a nearly linear-time algorithm for SV-sparsifying (and hence UC-sparsifying) Eulerian directed graphs, as well as ℓ-step random walks on such graphs, for any ℓ ≤ poly(n). Combined with the Eulerian scaling algorithms of [CKK+18], given an arbitrary (not necessarily Eulerian) directed graph and a set S of vertices, we can approximate the stationary probability mass of the (S, S^c) cut in an ℓ-step random walk to within a multiplicative error of 1/polylog(n) and an additive error of 1/poly(n) in nearly linear time. As a starting point for these results, we provide a simple black-box reduction from SV-sparsifying Eulerian directed graphs to SV-sparsifying undirected graphs; such a directed-to-undirected reduction was not known for previous notions of spectral approximation.
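For concreteness, here is a brute-force numpy reference (my own illustration, not the paper's nearly linear-time algorithm; the function name and interface are assumptions) for the quantity being approximated, the stationary probability mass of the (S, S^c) cut in an ℓ-step random walk:

import numpy as np

def cut_mass(W, pi, S, ell):
    """Stationary mass of the (S, S^c) cut in an ell-step walk:
    sum over i in S, j not in S of pi_i * (W^ell)_{ij}.
    Exact and dense: O(ell * n^3) time, for reference only."""
    n = len(pi)
    W_ell = np.linalg.matrix_power(W, ell)      # ell-step transition matrix
    S = np.asarray(sorted(S))
    Sc = np.setdiff1d(np.arange(n), S)          # complement of S
    return float(pi[S] @ W_ell[np.ix_(S, Sc)].sum(axis=1))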
  2. Characterizing the topological properties and anomalous behaviors of higher-dimensional topological spaces via notions of curvature is by now quite common in mainstream physics and mathematics, so it is natural to extend these notions in a suitable way from non-network domains to the network-science domain. In this article we discuss one such extension, namely Ollivier's discretization of Ricci curvature. We first motivate, define, and illustrate the Ollivier–Ricci curvature. In the next section we provide some "not-previously-published" bounds on the exact and approximate computation of the curvature measure. In the penultimate section we review a method, based on the linear-sketching technique, for efficient approximate computation of the Ollivier–Ricci network curvature. Finally, in the last section we provide concluding remarks with pointers for further reading.
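As an illustration of the definition being reviewed, a minimal sketch follows, assuming the standard convention that mu_v is the uniform distribution on the neighbors of v and kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y); the helper below is hypothetical, not code from the article, and solves the Wasserstein-1 distance as an exact transport linear program:

import networkx as nx
import numpy as np
from scipy.optimize import linprog

def ollivier_ricci(G, x, y):
    """Ollivier-Ricci curvature of edge (x, y) on an unweighted graph."""
    supp_x, supp_y = list(G.neighbors(x)), list(G.neighbors(y))
    n, m = len(supp_x), len(supp_y)
    mu = np.full(n, 1.0 / n)                     # uniform on N(x)
    nu = np.full(m, 1.0 / m)                     # uniform on N(y)
    # Ground costs: shortest-path distances between the two supports.
    cost = np.array([[nx.shortest_path_length(G, u, v) for v in supp_y]
                     for u in supp_x], dtype=float)
    # Marginal constraints on the transport plan (flattened row-major).
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0         # row sums equal mu
    for j in range(m):
        A_eq[n + j, j::m] = 1.0                  # column sums equal nu
    res = linprog(cost.ravel(), A_eq=A_eq,
                  b_eq=np.concatenate([mu, nu]), bounds=(0, None))
    return 1.0 - res.fun / nx.shortest_path_length(G, x, y)

print(ollivier_ricci(nx.cycle_graph(6), 0, 1))   # ~0.0: long cycles are flat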
  3. Kernel methods provide an elegant framework for developing nonlinear learning algorithms from simple linear methods. Though these methods have superior empirical performance in several real-data applications, their usefulness is inhibited by the significant computational burden incurred in large-sample situations. Various approximation schemes have been proposed in the literature to alleviate these computational issues, and the approximate kernel machines have been shown to retain the empirical performance. However, the theoretical properties of these approximate kernel machines are less well understood. In this work, we theoretically study the trade-off between computational complexity and statistical accuracy in Nyström approximate kernel principal component analysis (KPCA), showing that Nyström approximate KPCA matches the statistical performance of (non-approximate) KPCA while remaining computationally beneficial. Additionally, we show that Nyström approximate KPCA statistically outperforms another popular approximation scheme applied to KPCA, the random-feature approximation.
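To make the computational trade-off concrete, here is a minimal sketch (not the paper's estimator or analysis) contrasting exact KPCA with a Nyström-style approximation via scikit-learn's Nystroem feature map followed by linear PCA; the dataset, kernel, and parameter choices are illustrative assumptions:

import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.kernel_approximation import Nystroem

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))              # illustrative data

# Exact KPCA: forms the full n x n kernel matrix
# (O(n^2) memory, O(n^3) eigendecomposition).
exact = KernelPCA(n_components=5, kernel="rbf", gamma=0.1).fit(X)

# Nystrom KPCA: m << n landmarks give an m-dimensional feature map,
# after which linear PCA costs only O(n m^2).
feat = Nystroem(kernel="rbf", gamma=0.1, n_components=200, random_state=0)
approx = PCA(n_components=5).fit(feat.fit_transform(X))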
  4. We study fast algorithms for computing basic properties of an n × n positive semidefinite kernel matrix K corresponding to n points x_1, …, x_n in R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. These are some of the most basic problems defined over kernel matrices. We show that the sum of matrix entries can be estimated up to a multiplicative factor of 1+ε in time sublinear in n and linear in d for many popular kernel functions, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and a witnessing approximate eigenvector) can be approximated to a multiplicative factor of 1+ε in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
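As a point of reference for the first problem, here is a naive Monte Carlo baseline of my own (not the paper's KDE-based algorithm, and without its 1+ε multiplicative guarantee): sampling random pairs already estimates the sum of Gaussian-kernel entries in time independent of n.

import numpy as np

def kernel_sum_estimate(X, s=20000, gamma=1.0, seed=0):
    """Estimate sum_{i,j} exp(-gamma ||x_i - x_j||^2) from s random pairs.
    Runs in O(s d) time; the error is additive in n^2, not (1+eps)
    multiplicative, which is what the paper's algorithm achieves."""
    rng = np.random.default_rng(seed)
    n = len(X)
    i = rng.integers(n, size=s)
    j = rng.integers(n, size=s)
    vals = np.exp(-gamma * np.sum((X[i] - X[j]) ** 2, axis=1))
    return n * n * vals.mean()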
  5. We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix K ∈ R^{n×n} corresponding to n points x_1, …, x_n ∈ R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to 1+ε relative error in time sublinear in n and linear in d for many popular kernels, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and an approximate eigenvector) can be approximated to 1+ε relative error in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
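For comparison, the standard quadratic-time baseline that the subquadratic eigenvalue result improves on is power iteration with kernel rows formed on the fly; this is a sketch under a Gaussian-kernel assumption, with illustrative names, not the paper's method:

import numpy as np

def top_eigenpair(X, iters=50, gamma=1.0):
    """Power iteration for the top eigenpair of K_ij = exp(-gamma ||x_i - x_j||^2).
    Each iteration forms K v row by row: O(n^2 d) time, O(n d) memory.
    Since K is PSD, the iterates converge to the top eigenvector."""
    n = len(X)
    v = np.ones(n) / np.sqrt(n)
    for _ in range(iters):
        Kv = np.array([np.exp(-gamma * np.sum((X - x) ** 2, axis=1)) @ v
                       for x in X])
        v = Kv / np.linalg.norm(Kv)
    lam = v @ np.array([np.exp(-gamma * np.sum((X - x) ** 2, axis=1)) @ v
                        for x in X])
    return lam, v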