NSF PAR Search | NSF Public Access Repository

Statistical-Computational Trade-offs for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep; Xu, Haike (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
Improved Frequency Estimation Algorithms with and without Predictions

Aamand, Anders; Chen, Justin Y.; Nguyen, Huy; Silwal, Sandeep; Vakilian, Ali (September 2023, Advances in Neural Information Processing Systems)

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle that predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches.
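For readers unfamiliar with the sketching baselines named in this abstract, the following is a minimal sketch of a classical CountMin sketch. It is not the new algorithm of the paper, and it uses no predictions. The class name, the `width` and `depth` parameters, and the use of salted BLAKE2b hashing are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Classical CountMin sketch (illustrative baseline, not the paper's algorithm)."""

    def __init__(self, width, depth):
        self.width = width    # number of counters per row (assumed parameter name)
        self.depth = depth    # number of independent hash rows (assumed parameter name)
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # One salted hash per row; BLAKE2b is an arbitrary choice for this example.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # never underestimates the true frequency.
        return min(
            self.table[row][self._bucket(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch(width=64, depth=4)
stream = ["a"] * 50 + ["b"] * 5 + ["c"]
for element in stream:
    sketch.update(element)
print(sketch.estimate("a"))
```

With non-negative updates, the estimate is always at least the true count; the error comes only from hash collisions, which is the quantity the worst-case guarantees mentioned above bound.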
Full Text Available
Data Structures for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep (January 2023, International Conference on Machine Learning)

We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v_1, ..., v_k over a discrete domain of size n, and sampling access to a distribution p, identify v_i that is “close” to p. Our main result is the first data structure that, given a sublinear (in n) number of samples from p, identifies v_i in time sublinear in k. We also give an improved version of the algorithm of (Acharya et al., 2018) that reports v_i in time linear in k. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.
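To make the problem statement concrete, here is a naive baseline, assuming closeness is measured by L1 distance (the abstract does not specify the metric). It scans all k candidates and compares each to the empirical distribution of the samples, so it is linear in both k and n and is not the data structure described in the abstract. The function name `closest_candidate` is hypothetical.

```python
from collections import Counter


def closest_candidate(candidates, samples):
    """Return the index i of the candidate v_i nearest (in L1) to the samples.

    candidates: list of k probability vectors, each of length n.
    samples: list of draws from p, each an integer in range(n).
    """
    n = len(candidates[0])
    counts = Counter(samples)
    # Empirical distribution of the observed samples.
    empirical = [counts[x] / len(samples) for x in range(n)]

    def l1_distance(v):
        return sum(abs(a - b) for a, b in zip(v, empirical))

    # Linear scan over all k candidates (the naive approach).
    return min(range(len(candidates)), key=lambda i: l1_distance(candidates[i]))


candidates = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
samples = [0, 1, 0, 1, 1, 0]
print(closest_candidate(candidates, samples))
```

The trade-off studied in the abstract is about doing better than this scan: using fewer than n samples while also answering in time sublinear in k.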
Full Text Available
Data Structures for Density Estimation

Aamand, Anders; Andoni, Alexandr; Chen, Justin Y.; Indyk, Piotr; Narayanan, Shyam; Silwal, Sandeep (January 2023, International Conference on Machine Learning, ICML 2023)

Full Text Available
(Optimal) Online Bipartite Matching with Degree Information

Aamand, Anders; Chen, Justin Y; Indyk, Piotr (January 2022, Conference on Neural Information Processing Systems)

We propose a model for online graph problems where algorithms are given access to an oracle that predicts (e.g., based on modeling assumptions or on past data) the degrees of nodes in the graph. Within this model, we study the classic problem of online bipartite matching, and a natural greedy matching algorithm called MinPredictedDegree, which uses predictions of the degrees of offline nodes. For the bipartite version of a stochastic graph model due to Chung, Lu, and Vu where the expected values of the offline degrees are known and used as predictions, we show that MinPredictedDegree stochastically dominates any other online algorithm, i.e., it is optimal for graphs drawn from this model. Since the “symmetric” version of the model, where all online nodes are identical, is a special case of the well-studied “known i.i.d. model”, it follows that the competitive ratio of MinPredictedDegree on such inputs is at least 0.7299. For the special case of graphs with power law degree distributions, we show that MinPredictedDegree frequently produces matchings almost as large as the true maximum matching on such graphs. We complement these results with an extensive empirical evaluation showing that MinPredictedDegree compares favorably to state-of-the-art online algorithms for online matching.
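The greedy rule named in this abstract can be sketched as follows, under one plausible reading: each arriving online node is matched to its unmatched offline neighbor with the smallest predicted degree. The function name, the input format, and the tie-breaking (first neighbor in list order) are assumptions made for illustration and may differ from the paper's definition.

```python
def min_predicted_degree_matching(arrivals, predicted_degree):
    """Greedy online bipartite matching guided by predicted offline degrees.

    arrivals: list of (online_node, [offline_neighbors]) in arrival order.
    predicted_degree: dict mapping each offline node to its predicted degree.
    Returns a dict mapping matched offline nodes to their online partners.
    """
    matched = {}
    for online_node, neighbors in arrivals:
        # Only offline neighbors that are still unmatched are eligible.
        free = [v for v in neighbors if v not in matched]
        if not free:
            continue  # the online node stays unmatched
        # Prefer the neighbor expected to have the fewest future chances.
        best = min(free, key=lambda v: predicted_degree[v])
        matched[best] = online_node
    return matched


# Offline node "y" has a single neighbor, so it should be used first.
arrivals = [("a", ["x", "y"]), ("b", ["x"])]
predicted_degree = {"x": 2, "y": 1}
matching = min_predicted_degree_matching(arrivals, predicted_degree)
print(matching)
```

In this small instance, matching "a" to the low-degree node "y" leaves "x" free for "b", so both online nodes are matched; a rule that matched "a" to "x" would leave "b" unmatched.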
Full Text Available
On sums of monotone random integer variables

https://doi.org/10.1214/22-ECP500

Aamand, Anders; Alon, Noga; Houen, Jakob Bæk; Thorup, Mikkel (January 2022, Electronic Communications in Probability)

Full Text Available
Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

Aamand, Anders; Chen, Justin Y; Indyk, Piotr; Narayanan, Shyam; Rubinfeld, Ronitt; Schiefer, Nicholas; Silwal, Sandeep; Wagner, Tal (January 2022, Conference on Neural Information Processing Systems)

Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, it shows that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the “combine” function of size polynomial or even exponential in the number of graph nodes n, as well as feature vectors of length linear in n. We present an improved simulation of the WL test on GNNs with exponentially lower complexity. In particular, the neural network implementing the combine function in each node has only polylog(n) parameters, and the feature vectors exchanged by the nodes of the GNN consist of only O(log n) bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near-)optimality of our construction.
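As background for this abstract, the following is a minimal sketch of the classical WL color refinement procedure on a single graph. It is the test being simulated, not the GNN construction of the paper. The function name `wl_colors` and the adjacency-dict input format are assumptions.

```python
def wl_colors(adjacency, rounds):
    """Run WL color refinement for a fixed number of rounds.

    adjacency: dict mapping each node to a list of its neighbors.
    Returns a dict mapping each node to an integer color.
    """
    colors = {v: 0 for v in adjacency}  # every node starts with the same color
    for _ in range(rounds):
        # A node's signature is its own color plus the multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
        # Relabel distinct signatures with small integers.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adjacency}
    return colors


# Path 0 - 1 - 2: the endpoints are separated from the middle node.
path = {0: [1], 1: [0, 2], 2: [1]}
path_colors = wl_colors(path, rounds=2)

# Disjoint union of a 6-cycle and two triangles: every node has degree 2,
# so refinement never splits the single color class.
union = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
union.update({6: [7, 8], 7: [6, 8], 8: [6, 7], 9: [10, 11], 10: [9, 11], 11: [9, 10]})
union_colors = wl_colors(union, rounds=3)
print(path_colors, len(set(union_colors.values())))
```

The second example is a standard case where the WL test cannot tell non-isomorphic graphs apart, which is the same limit on expressive power that the abstract attributes to GNNs.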
Full Text Available
