
Search for: All records

Creators/Authors contains: "Indyk, Piotr"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (administrative interval).


  1. Free, publicly-accessible full text available January 1, 2023
  2. Free, publicly-accessible full text available January 1, 2023
  3. We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix K ∈ ℝ^{n×n} corresponding to n points x_1, …, x_n ∈ ℝ^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. We show that the sum of matrix entries can be estimated to 1+ε relative error in time sublinear in n and linear in d for many popular kernels, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and an approximate eigenvector) can be approximated to 1+ε relative error in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation. (A simple sampling-based sketch of the entry-sum estimation appears after this list.)
  4. We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems, and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to ±εn from a sample of size O(log²(1/ε) · n/log n), where n is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimate of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to log(1/ε) · n^{1−Θ(1/log(1/ε))}. We evaluate the proposed algorithms on a collection of data sets, using the neural-network-based estimators of Hsu et al., ICLR '19 as predictors. Our experiments demonstrate substantial (up to 3×) improvements in estimation accuracy compared to the state-of-the-art algorithm. (A sketch of a predictor-aided support estimator appears after this list.)
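
To make the result in item 3 concrete, here is a minimal Monte Carlo sketch for estimating the sum of kernel matrix entries: sample index pairs uniformly, average the kernel values, and rescale by n². This gives an unbiased estimate, but it is only a plain sampling baseline; the paper's sublinear 1+ε algorithm instead relies on kernel density estimation, which this sketch does not implement. The gaussian_kernel helper and its bandwidth parameter are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / bandwidth^2).
    # (Illustrative helper; the bandwidth choice is an assumption.)
    return np.exp(-np.sum((x - y) ** 2) / bandwidth ** 2)

def estimate_kernel_sum(X, num_samples=10_000, seed=None):
    # Unbiased Monte Carlo estimate of sum_{i,j} k(x_i, x_j):
    # sample index pairs (i, j) uniformly, average the kernel values,
    # and rescale by n^2. This is a sampling baseline, not the paper's
    # KDE-based algorithm; its accuracy is governed by variance rather
    # than a 1+eps relative-error guarantee.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    i = rng.integers(0, n, size=num_samples)
    j = rng.integers(0, n, size=num_samples)
    vals = np.array([gaussian_kernel(X[a], X[b]) for a, b in zip(i, j)])
    return n * n * vals.mean()

# Sanity check on a small instance where the exact sum is computable.
X = np.random.default_rng(0).normal(size=(500, 10))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(estimate_kernel_sum(X, seed=0), K.sum())
```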
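
For item 4, one simple way to plug a frequency predictor into support estimation is an inverse-probability (Horvitz-Thompson style) correction: weight each distinct element seen in the sample by the inverse of its chance of appearing, computed from its predicted frequency. This is only an illustration of the learning-augmented idea under an assumed sampling-with-replacement model, not the paper's algorithm or its sample-complexity bound; predict_freq is a hypothetical predictor interface.

```python
import random
from collections import Counter

def predicted_support(sample, predict_freq, data_size):
    # Horvitz-Thompson style support estimate: an element with predicted
    # frequency f_hat appears in a with-replacement sample of size s with
    # probability 1 - (1 - f_hat/n)^s; weight each distinct sampled
    # element by the inverse of that probability. Illustrative only;
    # not the paper's algorithm or its guarantee.
    s, n = len(sample), data_size
    total = 0.0
    for e in set(sample):
        f_hat = max(1.0, predict_freq(e))       # clamp predictions to >= 1
        p_seen = 1.0 - (1.0 - f_hat / n) ** s   # inclusion probability
        total += 1.0 / max(p_seen, 1e-12)       # guard against p_seen ~ 0
    return total

# Toy usage with an oracle predictor (true frequencies); a learned
# predictor, e.g. the neural estimator of Hsu et al., would replace it.
data = [i % 1000 for i in range(100_000)]       # exactly 1000 distinct
freqs = Counter(data)
sample = random.Random(0).choices(data, k=5_000)
print(predicted_support(sample, lambda e: freqs[e], len(data)))
```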