NSF PAR Search | NSF Public Access Repository

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Indyk, Piotr; Kapralov, Michael; Sheth, Kshiteej; Wagner, Tal (May 2025, ICLR)

Free, publicly-accessible full text available May 1, 2026
Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

https://doi.org/10.1137/1.9781611977714.8

Schiefer, Nicholas; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep; Wagner, Tal (January 2023, SIAM Conference on Applied and Computational Discrete Algorithms)

An ε-approximate quantile sketch over a stream of n inputs approximates the rank of any query point q (that is, the number of input points less than q) up to an additive error of εn, generally with probability at least 1−1/poly(n), while consuming o(n) space. While the celebrated KLL sketch of Karnin, Lang, and Liberty achieves a provably optimal quantile approximation algorithm over worst-case streams, the approximations it achieves in practice are often far from optimal. Indeed, the most commonly used technique in practice is Dunning’s t-digest, which often achieves much better approximations than KLL on real-world data but is known to have arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem to attempt to achieve better approximations on real-world data sets than KLL while maintaining similar guarantees in the worst case.
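To make the rank definition and the εn error bound concrete, here is a minimal offline sketch in Python. It is not the paper's streaming algorithm, nor KLL or t-digest: the summary simply keeps every `step`-th sorted point (a hypothetical construction chosen for illustration) and estimates ranks by linear interpolation between stored points. The function names and the assumption of distinct input values are ours.

```python
from bisect import bisect_left

def exact_rank(sorted_points, q):
    """Number of input points strictly less than q."""
    return bisect_left(sorted_points, q)

def build_summary(sorted_points, step):
    """Keep every `step`-th point (plus the maximum) with its exact rank.

    Assumes a non-empty, sorted list of distinct values.
    """
    n = len(sorted_points)
    idx = list(range(0, n, step))
    if idx[-1] != n - 1:
        idx.append(n - 1)
    return [(sorted_points[i], i) for i in idx]

def interpolated_rank(summary, n, q):
    """Estimate rank(q) by linear interpolation between stored points."""
    values = [v for v, _ in summary]
    if q <= values[0]:
        return 0.0          # nothing is smaller than the minimum
    if q > values[-1]:
        return float(n)     # every point is smaller than q
    j = bisect_left(values, q)  # values[j-1] < q <= values[j]
    v0, r0 = summary[j - 1]
    v1, r1 = summary[j]
    return r0 + (r1 - r0) * (q - v0) / (v1 - v0)

def within_eps(estimate, sorted_points, q, eps):
    """True if the estimate is within additive error eps * n of the exact rank."""
    return abs(estimate - exact_rank(sorted_points, q)) <= eps * len(sorted_points)
```

In this toy setting, consecutive stored ranks differ by at most `step`, so the estimate is off by at most `step`. Choosing `step = εn` therefore gives an ε-approximate answer from about 1/ε stored points.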
Full Text Available
Generalization Bounds for Data-Driven Numerical Linear Algebra

Bartlett, Peter L.; Indyk, Piotr; Wagner, Tal (January 2022, Proceedings of the 35th Conference on Learning Theory (COLT2022))

Full Text Available
Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

Aamand, Anders; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Rubinfeld, Ronitt; Schiefer, Nicholas; Silwal, Sandeep; Wagner, Tal (January 2022, Conference on Neural Information Processing Systems)

Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, it shows that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the “combine” function of size polynomial or even exponential in the number of graph nodes n, as well as feature vectors of length linear in n. We present an improved simulation of the WL test on GNNs with exponentially lower complexity. In particular, the neural network implementing the combine function in each node has only polylog(n) parameters, and the feature vectors exchanged by the nodes of the GNN consist of only O(log n) bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near-)optimality of our construction.
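For readers unfamiliar with the WL test itself, the following is a minimal Python sketch of standard 1-dimensional WL color refinement. It is not the GNN simulation the paper constructs. The adjacency-list input format and the function names are assumptions made for illustration. Two graphs are compared by refining colors on their disjoint union, so that color ids are shared between them.

```python
from collections import Counter

def wl_colors(adj):
    """1-WL color refinement. adj maps each node to a list of its neighbors."""
    colors = {v: 0 for v in adj}
    for _ in range(len(adj)):
        # A node's signature is its own color plus the multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Refinement only splits classes, so an unchanged class count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            break
        colors = new_colors
    return colors

def wl_distinguishes(adj_a, adj_b):
    """True if the WL test tells the two graphs apart (they are then non-isomorphic)."""
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})
    colors = wl_colors(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a != hist_b
```

A classic limitation shows up immediately: a 6-cycle and two disjoint triangles are both 2-regular, so every node keeps the same color and the test cannot separate them.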
Full Text Available
Few-Shot Data-Driven Algorithms for Low Rank Approximation

Indyk, Piotr; Wagner, Tal; Woodruff, David P. (January 2021, Proceedings of Machine Learning Research)

Full Text Available
Triangle and Four Cycle Counting with Predictions in Graph Streams

Chen, Justin Y; Eden, Talya; Indyk, Piotr; Lin, Honghao; Narayanan, Shyam; Rubinfeld, Ronitt; Silwal, Sandeep; Wagner, Tal; Woodruff, David P.; Zhang, Michael (January 2022, International Conference on Learning Representations)

Full Text Available
Faster Kernel Matrix Algebra via Density Estimation

Backurs, Arturs; Indyk, Piotr; Musco, Cameron; Wagner, Tal (January 2021, International Conference on Machine Learning (ICML))

We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix K ∈ R^{n×n} corresponding to n points x_1, …, x_n ∈ R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to 1+ϵ relative error in time sublinear in n and linear in d for many popular kernels, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and an approximate eigenvector) can be approximated to 1+ϵ relative error in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
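The two quantities the abstract refers to can be pinned down with a naive quadratic-time baseline. The Python sketch below is only that baseline, not the paper's sublinear or subquadratic algorithms: it builds the full Gaussian kernel matrix (unit bandwidth is an assumption), sums its entries exactly, and approximates the top eigenvalue by plain power iteration. All function names are hypothetical.

```python
import math

def gaussian_kernel_matrix(points):
    """K[i][j] = exp(-||x_i - x_j||^2), with unit bandwidth assumed."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(x, y))) for y in points]
        for x in points
    ]

def kernel_sum(K):
    """Exact sum of all n^2 kernel matrix entries."""
    return sum(sum(row) for row in K)

def top_eigenvalue(K, iters=200):
    """Approximate the largest eigenvalue of a PSD matrix by power iteration."""
    n = len(K)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm start with positive entries
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient v^T K v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return lam
```

Both routines touch every entry of K, which costs on the order of n²d time. That is the cost the abstract's sublinear and subquadratic bounds are measured against.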
Full Text Available
Learning-based Support Estimation in Sublinear Time

Eden, Talya; Indyk, Piotr; Narayanan, Shyam; Rubinfeld, Ronitt; Silwal, Sandeep; Wagner, Tal (January 2021, International Conference on Learning Representations)

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to ±εn from a sample of size O(log^2(1/ε)·n/log n), where n is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to log(1/ε)·n^{1−Θ(1/log(1/ε))}. We evaluate the proposed algorithms on a collection of data sets, using the neural-network-based estimators from Hsu et al. (ICLR’19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state-of-the-art algorithm.
Full Text Available
Learning Space Partitions for Nearest Neighbor Search

Dong, Yihe; Indyk, Piotr; Razenshteyn, Ilya P; Wagner, Tal (May 2020, ICLR)

Full Text Available
Scalable nearest neighbor search for optimal transport

Backurs, Arturs; Dong, Yihe; Indyk, Piotr; Razenshteyn, Ilya; Wagner, Tal (June 2020, ICML)

Full Text Available
