This content will become publicly available on July 4, 2024.
Award ID(s): 2003720
NSF-PAR ID: 10465994
Journal Name: Numerical Linear Algebra with Applications
ISSN: 1070-5325
Sponsoring Org: National Science Foundation

Kernel matrices, as well as the weighted graphs they represent, are ubiquitous objects in machine learning, statistics, and related fields. The main drawback of kernel methods (learning and inference using kernel matrices) is efficiency: given n input points, most kernel-based algorithms need to materialize the full n × n kernel matrix before performing any subsequent computation, thus incurring Ω(n^2) runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research effort. We break the quadratic barrier and obtain subquadratic-time algorithms for several fundamental linear-algebraic and graph-processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation, and counting weighted triangles. We build on the recently developed Kernel Density Estimation framework, which (after preprocessing in time subquadratic in n) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from weighted vertex and weighted edge sampling on kernel graphs, simulating random walks on kernel graphs, and importance sampling on matrices to Kernel Density Estimation, and show that we can generate samples from these distributions in time sublinear in the support of the distribution. Our reductions are the central ingredient in each of our applications, and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a 9x decrease in the number of kernel evaluations over baselines for LRA and a 41x reduction in the graph size for spectral sparsification.
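As a toy illustration of the reduction from weighted vertex sampling to density estimation described above, the sketch below estimates kernel row sums (degrees of the kernel graph) by random subsampling, standing in for a true sublinear KDE data structure, and then samples a vertex with probability proportional to its estimated degree. The Gaussian kernel, sample size `m`, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel values k(x, y) = exp(-||x - y||^2 / sigma^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma**2)

def estimate_row_sums(X, sigma=1.0, m=64):
    """Estimate each row sum of the kernel matrix by averaging kernel
    evaluations against m uniformly sampled points -- a crude stand-in
    for a sublinear KDE query, used only for illustration."""
    n = len(X)
    idx = rng.choice(n, size=m, replace=True)
    return n * gaussian_kernel(X, X[idx], sigma).mean(axis=1)

def sample_vertex(X, sigma=1.0, m=64):
    """Sample a vertex with probability proportional to its estimated
    kernel degree -- the weighted vertex sampling primitive."""
    d_hat = estimate_row_sums(X, sigma, m)
    p = d_hat / d_hat.sum()
    return rng.choice(len(X), p=p)

X = rng.normal(size=(500, 3))
v = sample_vertex(X)
```

The point of the reduction is that only the degree *estimates* are ever needed, so the full n × n matrix is never formed.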

Kernel methods offer the flexibility to learn complex relationships in modern, large data sets while enjoying strong theoretical guarantees on quality. Unfortunately, these methods typically require cubic running time in the data set size, a prohibitive cost in the large-data setting. Random feature maps (RFMs) and the Nyström method both consider low-rank approximations to the kernel matrix as a potential solution. But in order to achieve desirable theoretical guarantees, the former may require a prohibitively large number of features J+, and the latter may be prohibitively expensive for high-dimensional problems. We propose to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve the desirable theoretical approximation properties of Nyström with just O(log J+) features. Our key insight is to begin with a large set of random features, then reduce them to a small number of weighted features in a data-dependent, computationally efficient way, while preserving the statistical guarantees of using the original large set of features. We demonstrate the efficacy of our method with theory and experiments, including on a data set with over 50 million observations. In particular, we show that our method achieves small kernel matrix approximation error and better test-set accuracy with provably fewer random features than state-of-the-art methods.
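A minimal sketch of the random-feature side of this trade-off, using standard random Fourier features for the Gaussian kernel (the Rahimi-Recht construction, not the paper's data-dependent compression scheme): with enough features J, inner products of feature vectors approximate kernel values, which is exactly the regime the paper then compresses.

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, J, sigma=1.0):
    """Map X to J random Fourier features so that z(x) @ z(y) approximates
    the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, J))   # frequencies ~ N(0, 1/sigma^2)
    b = rng.uniform(0, 2 * np.pi, size=J)            # random phases
    return np.sqrt(2.0 / J) * np.cos(X @ W + b)

X = rng.normal(size=(200, 5))
Z = rff_features(X, J=2000)
K_approx = Z @ Z.T                                   # low-rank kernel approximation

# Compare against the exact Gaussian kernel matrix.
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K_exact = np.exp(-sq / 2.0)
err = np.abs(K_approx - K_exact).max()
```

The J = 2000 here is illustrative; the paper's contribution is precisely that a data-dependent reweighting can match Nyström-style guarantees with logarithmically fewer features.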

Summary This paper presents an efficient method to perform structured matrix approximation by separation and hierarchy (SMASH) when the original dense matrix is associated with a kernel function. Given the points in a domain, a tree structure is first constructed based on an adaptive partition of the computational domain to facilitate subsequent approximation procedures. In contrast to existing schemes based on either analytic or purely algebraic approximations, SMASH takes advantage of both approaches and greatly improves efficiency. The algorithm follows a bottom-up traversal of the tree and is able to perform the operations associated with each node on the same level in parallel. A strong rank-revealing factorization is applied to the initial analytic approximation in the separation regime so that a special structure is incorporated into the final nested bases. As a consequence, the storage is significantly reduced on the one hand, and a hierarchy of the original grid is constructed on the other. Due to this hierarchy, nested bases at upper levels can be computed in a similar way to the leaf-level operations but on coarser grids. The main advantages of SMASH include its simplicity of implementation, its flexibility to construct various hierarchical rank structures, and a low storage cost. The efficiency and robustness of SMASH are demonstrated through various test problems arising from integral equations, structured matrices, and related applications.
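The analytic property that hierarchical schemes like SMASH exploit can be seen in a small experiment: the kernel block coupling two well-separated point clusters has rapidly decaying singular values, so a low-rank factorization captures it almost exactly. The 1/|x - y| kernel, cluster geometry, and rank below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated 1-D point clusters and the kernel block coupling them.
x = rng.uniform(0.0, 1.0, size=200)            # source cluster
y = rng.uniform(10.0, 11.0, size=200)          # target cluster, far away
A = 1.0 / np.abs(x[:, None] - y[None, :])      # kernel block k(x, y) = 1/|x - y|

# Singular values of a well-separated (admissible) block decay geometrically,
# so a small rank suffices -- the basis of the nested low-rank structure.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 8
A_r = (U[:, :r] * s[:r]) @ Vt[:r]
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
```

In SMASH this compression is done not by a dense SVD but by an analytic approximation refined with a strong rank-revealing factorization, which is what yields the nested bases.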
Abstract The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij} = k_0( \| x_i - x_j \|^2 / \sigma^2 )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called the self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest-neighbor (kNN) distance. When the $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete, unlike with fixed-bandwidth kernels. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0\big( \| x_i - x_j \|^2 / (\epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)) \big) / ( \hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha )$, where $\hat{\rho}$ is the bandwidth function estimated by kNN and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\varDelta_p$. Specifically, we prove the pointwise convergence of $L_N f$ and the convergence of the graph Dirichlet form, with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho}$, which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via a smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or the data density is needed. The theoretical results are supported by numerical experiments on simulated data and handwritten digit image data.
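A minimal sketch of constructing the self-tuned affinity $W^{(\alpha)}$ and its graph Laplacian, using the $k$-th nearest-neighbor distance as the estimated bandwidth $\hat{\rho}$. The Gaussian profile for $k_0$, the choice $\epsilon = 1$, and the dense pairwise computation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def self_tuned_kernel(X, k=7, alpha=1.0, eps=1.0):
    """Build the kNN self-tuned affinity
    W_ij = k0(||x_i - x_j||^2 / (eps * rho_i * rho_j)) / (rho_i^alpha * rho_j^alpha)
    with rho the k-th nearest-neighbor distance and k0 a Gaussian profile."""
    sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    # k-th NN distance at each point (column 0 of the sorted rows is the
    # point itself at distance 0, so column k is the k-th neighbor).
    rho = np.sqrt(np.sort(sq, axis=1)[:, k])
    W = np.exp(-sq / (eps * np.outer(rho, rho)))
    W /= np.outer(rho**alpha, rho**alpha)
    return W

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

X = rng.normal(size=(100, 2))
W = self_tuned_kernel(X, alpha=1.0)
L = graph_laplacian(W)
```

With `alpha=1.0` this is the family whose limit the paper identifies as the weighted manifold Laplacian; note that no density estimate beyond the kNN distances is required.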

Abstract High-resolution characterization of hydraulic properties and dense nonaqueous phase liquid (DNAPL) contaminant sources is crucial to develop efficient remediation strategies. However, DNAPL characterization suffers from a limited number of borehole data in the field, resulting in a low-resolution estimation. Moreover, high-resolution DNAPL characterization requires a large number of unknowns to be estimated, presenting a computational bottleneck. In this paper, a low-cost geophysical approach, the self-potential method, is used as additional information for hydraulic property characterization. Joint inversion of hydraulic head and self-potential measurements is proposed to improve hydraulic conductivity estimation, which is then used to characterize the DNAPL saturation distribution by inverting partitioning tracer measurements. The computational barrier is overcome by (a) solving the inversion by the principal component geostatistical approach, in which the covariance matrix is replaced by a low-rank approximation, thus reducing the number of forward model runs; and (b) using temporal moments of concentrations instead of individual concentration data points for faster forward simulations. To assess the ability of the proposed approach, numerical experiments are conducted in a 3-D aquifer with 10^4 unknown hydraulic conductivities and DNAPL saturations. Results show that with realistic DNAPL sources and a limited number of hydraulic heads, traditional hydraulic/partitioning tracer tomography roughly reconstructs subsurface heterogeneity but fails to resolve the DNAPL distribution. By adding self-potential data, the error is reduced by 24% in hydraulic conductivity estimation and 68% in DNAPL saturation characterization. The proposed sequential inversion framework utilizes the complementary information from multi-source hydrogeophysical data sets, and can provide high-resolution characterizations for realistic DNAPL sources.
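The low-rank covariance replacement at the heart of the principal component geostatistical approach can be sketched with a truncated eigendecomposition of a geostatistical prior; each inversion iteration then needs only as many forward-model products as retained components. The exponential covariance model, grid, correlation length, and rank below are illustrative assumptions.

```python
import numpy as np

# Exponential covariance on a 1-D grid -- a common geostatistical prior.
n = 400
t = np.linspace(0.0, 1.0, n)
Q = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.2)   # correlation length 0.2

# PCGA-style step: replace Q by a rank-r truncated eigendecomposition
# Q ~ V_r diag(lambda_r) V_r^T, keeping the r largest eigenpairs.
r = 20
vals, vecs = np.linalg.eigh(Q)                        # ascending order
vals, vecs = vals[::-1][:r], vecs[:, ::-1][:, :r]     # top-r eigenpairs
Q_r = (vecs * vals) @ vecs.T
rel_err = np.linalg.norm(Q - Q_r) / np.linalg.norm(Q)
```

The rapid eigenvalue decay of smooth covariance models is what makes the number of forward runs scale with the rank r rather than with the 10^4 unknowns.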