NSF PAR Search | NSF Public Access Repository

Statistical-Computational Trade-offs for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep; Xu, Haike (December 2024, Advances in neural information processing systems)

Full Text Available
Differentially Private Approximate Near Neighbor Counting in High Dimensions

Andoni, Alexandr; Indyk, Piotr; Mahabadi, Sepideh; Narayanan, Shyam (December 2023, Neural Information Processing Systems)

Full Text Available
The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination

Canonne, Clément L.; Hopkins, Samuel B.; Li, Jerry; Liu, Allen; Narayanan, Shyam (November 2023, 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS))

We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0, I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an $\varepsilon$-fraction of adversarial corruptions, to be $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^3}{\alpha^4}, \min\left(\frac{d^{2/3}\varepsilon^{2/3}}{\alpha^{8/3}}, \frac{d\varepsilon}{\alpha^2}\right)\right)\right)$, while the complexity against adaptive adversaries is $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, which is strictly worse for a large range of vanishing $\varepsilon, \alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
Full Text Available
Robustness Implies Privacy in Statistical Estimation

Hopkins, Samuel B.; Kamath, Gautam; Majid, Mahbod; Narayanan, Shyam (June 2023, 55th Annual ACM Symposium on Theory of Computing (STOC))

We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly-optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly-optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples.
Full Text Available
Sampling an Edge in Sublinear Time Exactly and Optimally

https://doi.org/10.1137/1.9781611977585.ch23

Eden, Talya; Narayanan, Shyam; Tetek, Jakub (January 2023, Symposium on Simplicity in Algorithms)

Sampling edges from a graph in sublinear time is a fundamental problem and a powerful subroutine for designing sublinear-time algorithms. Suppose we have access to the vertices of the graph and know a constant-factor approximation to the number of edges. An algorithm for pointwise ε-approximate edge sampling with complexity has been given by Eden and Rosenbaum [SOSA 2018]. This has been later improved by Tetek and Thorup [STOC 2022] to . At the same time, time is necessary. We close the problem, by giving an algorithm with complexity for the task of sampling an edge exactly uniformly.
Full Text Available
Data Structures for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep (January 2023, International Conference on Machine Learning)

We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v_1, ..., v_k over a discrete domain of size n, and sampling access to a distribution p, identify v_i that is “close” to p. Our main result is the first data structure that, given a sublinear (in n) number of samples from p, identifies v_i in time sublinear in k. We also give an improved version of the algorithm of (Acharya et al., 2018) that reports v_i in time linear in k. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.
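To make the problem statement concrete, here is a minimal Python sketch of a naive baseline, not the data structure from the paper: it builds the empirical distribution from the samples and scans all k candidates. The function name `closest_candidate`, the choice of ℓ1 distance as the notion of “close”, and the toy inputs are assumptions made for illustration; the scan is linear in k and is not sample-efficient.

```python
from collections import Counter

def closest_candidate(candidates, samples):
    """Return the index i of the candidate closest to the empirical distribution.

    candidates: list of k probability vectors over the domain {0, ..., n-1}.
    samples: list of draws from the unknown distribution p.
    Closeness is measured in l1 distance (an assumption for this sketch).
    """
    n = len(candidates[0])
    counts = Counter(samples)
    m = len(samples)
    empirical = [counts[x] / m for x in range(n)]

    def l1_distance(v):
        return sum(abs(a - b) for a, b in zip(v, empirical))

    # Naive linear scan over all k candidates.
    return min(range(len(candidates)), key=lambda i: l1_distance(candidates[i]))

# Hypothetical toy instance: k = 3 candidates over a domain of size n = 3.
candidates = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [1 / 3, 1 / 3, 1 / 3],
]
samples = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]
best = closest_candidate(candidates, samples)
```

The abstract's contribution is precisely to avoid both costs visible here: reading enough samples to estimate p over the whole domain, and touching every one of the k candidates.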
Full Text Available
Data Structures for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin Y.; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep (January 2023, International Conference on Machine Learning, ICML 2023)

Full Text Available
Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

https://doi.org/10.1137/1.9781611977714.8

Schiefer, Nicholas; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep; Wagner, Tal (January 2023, SIAM Conference on Applied and Computational Discrete Algorithms)

An ε-approximate quantile sketch over a stream of n inputs approximates the rank of any query point q (that is, the number of input points less than q) up to an additive error of εn, generally with probability at least 1 − 1/poly(n), while consuming o(n) space. While the celebrated KLL sketch of Karnin, Lang, and Liberty achieves a provably optimal quantile approximation algorithm over worst-case streams, the approximations it achieves in practice are often far from optimal. Indeed, the most commonly used technique in practice is Dunning’s t-digest, which often achieves much better approximations than KLL on real-world data but is known to have arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem to attempt to achieve better approximations on real-world data sets than KLL while maintaining similar guarantees in the worst case.
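The rank definition and the εn error bound above can be illustrated with a short Python sketch. It stores the whole stream, so it is not a sketch data structure and uses none of the paper's techniques; the names `exact_rank` and `is_eps_approximate` and the toy stream are assumptions made for illustration.

```python
from bisect import bisect_left

def exact_rank(sorted_points, q):
    """Rank of q: the number of input points strictly less than q."""
    return bisect_left(sorted_points, q)

def is_eps_approximate(estimated_rank, sorted_points, q, eps):
    """Check whether an estimated rank is within additive error eps * n."""
    n = len(sorted_points)
    return abs(estimated_rank - exact_rank(sorted_points, q)) <= eps * n

# Hypothetical toy stream of n = 10 inputs.
stream = [5, 1, 9, 3, 7, 3, 8, 2, 6, 4]
points = sorted(stream)

rank_of_5 = exact_rank(points, 5)                 # five inputs are below 5
ok = is_eps_approximate(6, points, 5, eps=0.1)    # off by 1, allowed error is 1
bad = is_eps_approximate(7, points, 5, eps=0.1)   # off by 2, exceeds the bound
```

A real sketch such as KLL or t-digest would answer the same rank query from o(n) stored state instead of the full sorted list.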
Full Text Available
Frequency Estimation with One-Sided Error

https://doi.org/10.1137/1.9781611977073.31

Indyk, Piotr; Narayanan, Shyam; Woodruff, David P. (January 2022, Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms)

Full Text Available
Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

Aamand, Anders; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Rubinfeld, Ronitt; Schiefer, Nicholas; Silwal, Sandeep; Wagner, Tal (January 2022, Conference on Neural Information Processing Systems)

Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the “combine” function of size polynomial or even exponential in the number of graph nodes n, as well as feature vectors of length linear in n. We present an improved simulation of the WL test on GNNs with exponentially lower complexity. In particular, the neural network implementing the combine function in each node has only polylog(n) parameters, and the feature vectors exchanged by the nodes of the GNN consist of only O(log n) bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near)-optimality of our construction.
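For readers unfamiliar with the WL test itself, the following Python sketch shows the standard color-refinement procedure that the GNN simulations reproduce. It is a plain dictionary-based version, not the paper's neural construction: the relabelling dictionary stands in for the “combine” function, and the function names and example graphs are assumptions made for illustration.

```python
from collections import Counter

def wl_distinguishes(adj_a, adj_b, rounds):
    """Run 1-WL color refinement jointly on two graphs.

    adj_a, adj_b: dict mapping each node to a list of its neighbours.
    Returns True if the color multisets of the two graphs differ after
    the given number of rounds (the graphs are then non-isomorphic).
    """
    # Disjoint union, so that both graphs share one color palette.
    adj = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    adj.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})

    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # Signature: own color plus the sorted multiset of neighbour colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        # Relabel distinct signatures with small integers (the "combine" step).
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}

    hist_a = Counter(c for (tag, _), c in colors.items() if tag == "a")
    hist_b = Counter(c for (tag, _), c in colors.items() if tag == "b")
    return hist_a != hist_b

path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

differs = wl_distinguishes(path3, triangle, rounds=3)
same = wl_distinguishes(six_cycle, two_triangles, rounds=3)
```

The second pair is a classic case where the WL test fails: every node has degree 2 in both graphs, so refinement never separates them even though the graphs are not isomorphic.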
Full Text Available
