We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v1, ..., vk over a discrete domain of size n, and sampling access to a distribution p, identify the vi that is "close" to p. Our main result is the first data structure that, given a sublinear (in n) number of samples from p, identifies vi in time sublinear in k. We also give an improved version of the algorithm of (Acharya et al., 2018) that reports vi in time linear in k. An experimental evaluation of the latter algorithm shows that, compared to prior work, it achieves a significant reduction in the number of operations needed to reach a given accuracy.
Faster Sublinear Algorithms using Conditional Sampling
A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al., 2013; Canonne et al., 2014) has shown that, with access to such a conditional sampling oracle, only a polylogarithmic or even constant number of samples is needed to solve distribution testing problems such as identity and uniformity testing. This significantly improves over the standard sampling model, where polynomially many samples are necessary.
Inspired by these results, we introduce a computational model based on conditional sampling to develop sublinear algorithms with exponentially faster runtimes compared to standard sublinear algorithms. We focus on geometric optimization problems over points in high-dimensional Euclidean space. Access to these points is provided via a conditional sampling oracle that takes as input a succinct representation of a subset of the domain and outputs a uniformly random point in that subset. We study two well-studied problems: k-means clustering and estimating the weight of the minimum spanning tree. In contrast to prior algorithms for the classic model, our algorithms have time, space, and sample complexity polynomial in the dimension and polylogarithmic in the number of points.
Finally, we comment on the applicability of the model and compare it with existing models such as streaming, parallel, and distributed computation.
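The oracle interface described above can be sketched in a few lines. This is a toy stand-in for intuition only: it stores the point set explicitly and takes an arbitrary predicate as the "subset representation", whereas the model assumes a succinct representation (e.g. a box or halfspace); the function names here are assumptions of the sketch.

```python
import random

def make_conditional_oracle(points, seed=None):
    """Toy conditional-sampling oracle over a fixed point set: given a
    predicate describing a subset of the domain, return a uniformly
    random stored point lying in that subset, or None if the subset
    contains no stored point.  A real oracle would receive a succinct
    subset representation rather than an arbitrary predicate."""
    rng = random.Random(seed)

    def oracle(predicate):
        hits = [p for p in points if predicate(p)]
        return rng.choice(hits) if hits else None

    return oracle
```

A sublinear algorithm in this model would issue a small number of such queries with carefully chosen subsets, never reading the full point set directly.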
Award ID(s): 1650733
NSF-PAR ID: 10026356
Date Published:
Journal Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
ISSN: 1071-9040
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


We investigate quantum algorithms for classification, a fundamental problem in machine learning, with provable guarantees. Given n d-dimensional data points, the state-of-the-art (and optimal) classical algorithm for training classifiers with constant margin, by Clarkson et al., runs in Õ(n+d), which is also optimal in its input/output model. We design sublinear quantum algorithms for the same task running in Õ(√n + √d), a quadratic improvement in both n and d. Moreover, our algorithms use the standard quantization of the classical input and generate the same classical output, suggesting minimal overheads when used as subroutines for end-to-end applications. We also demonstrate a tight lower bound (up to polylog factors) and discuss the possibility of implementation on near-term quantum machines.

We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncated samples from a d-variate normal N(μ, Σ) means that a sample is revealed only if it falls in some subset S ⊆ ℝ^d; otherwise the sample is hidden, and even the number of hidden samples relative to the revealed ones is hidden. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to S and S has non-trivial measure under the unknown d-variate normal distribution. Additionally, we show that without oracle access to S, any non-trivial estimation is impossible.
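A one-dimensional toy illustrates why truncation makes estimation non-trivial: naively averaging the revealed samples is biased. The sketch below (an illustration only, not the paper's estimator; the function name and parameters are assumptions) draws from N(0, 1) truncated to S = [0, ∞), where the naive sample mean concentrates near √(2/π) ≈ 0.798 rather than the true mean 0.

```python
import random
import statistics

def truncated_normal_samples(mu, sigma, lo, n, seed=0):
    """Rejection-sample a one-dimensional normal N(mu, sigma^2)
    truncated to the set S = [lo, infinity): draws falling outside S
    are hidden, mirroring the truncated-sample model."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if x >= lo:
            out.append(x)
    return out

# Naive averaging over the revealed samples is biased: for N(0, 1)
# truncated to [0, inf) the naive mean lands near 0.798, not 0.
xs = truncated_normal_samples(0.0, 1.0, 0.0, 20000)
naive_mean = statistics.fmean(xs)
```

Correcting this bias in the d-variate case, with only oracle access to S, is exactly what the algorithm in the abstract achieves.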

In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs for graph problems achieve time and space complexities with very low dependence on n, the number of vertices. Nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. Instead, we consider the case where the parameter d can be moderately dependent on n, and we aim for complexities with subexponential dependence on d while maintaining polylogarithmic dependence on n. We present: a randomized LCA for computing maximal independent sets whose time and space complexities are quasi-polynomial in d and polylogarithmic in n; and, for constant ε > 0, a randomized LCA that with high probability provides a (1−ε)-approximation to maximum matching, whose time and space complexities are polynomial in d and polylogarithmic in n.
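The flavor of an LCA query for maximal independent set can be conveyed by the classic random-ranks construction (a standard textbook sketch, not the quasi-polynomial-in-d algorithm of the abstract): assign each vertex a random rank, and declare v in the MIS iff no lower-ranked neighbor is. Answering a query then explores only the chain of lower-ranked neighbors, not the whole graph.

```python
import random
from functools import lru_cache

def lca_mis_query(adj, seed=0):
    """Toy local-computation query for the greedy maximal independent
    set induced by a random vertex ordering: vertex v is in the MIS
    iff no lower-ranked neighbor is.  Each query explores only the
    part of the graph reachable through strictly lower-ranked
    neighbors; the worst-case cost still grows with the degree bound d,
    which is exactly the dependence the abstract's LCAs improve."""
    rng = random.Random(seed)
    rank = {v: rng.random() for v in adj}

    @lru_cache(maxsize=None)
    def in_mis(v):
        # Recursion terminates: ranks strictly decrease along each call.
        return all(not in_mis(u) for u in adj[v] if rank[u] < rank[v])

    return in_mis
```

Since ranks strictly decrease along the recursion, each query terminates, and the cached answers are consistent across queries, so the induced set is a valid maximal independent set.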