Improved Frequency Estimation Algorithms with and without Predictions

Aamand, Anders; Chen, Justin Y.; Nguyen, Huy; Silwal, Sandeep; Vakilian, Ali


Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle that predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without using any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms outperform prior approaches in all experiments.
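To make the setting concrete, the following is a minimal sketch of the kind of baseline the abstract describes: a CountMin sketch combined with a heavy-hitter oracle, in the spirit of Hsu et al. (2019). It is not the new algorithm of this paper. The class names, the `is_heavy` callback, and the hash construction are assumptions chosen for illustration; the oracle here is a plain function standing in for a learned model.

```python
import random


class CountMin:
    """Plain CountMin sketch: depth rows of width counters each."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.prime = (1 << 61) - 1  # Mersenne prime used for hashing
        # One (a, b) pair per row; an illustrative hash family, not a tuned one.
        self.params = [
            (rng.randrange(1, self.prime), rng.randrange(self.prime))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.params[row]
        return ((a * hash(item) + b) % self.prime) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # With non-negative updates, every row overestimates, so take the min.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )


class OracleCountMin:
    """CountMin plus exact counters for items the oracle flags as heavy.

    `is_heavy` is a hypothetical stand-in for a learned heavy-hitter oracle.
    """

    def __init__(self, is_heavy, width, depth, seed=0):
        self.is_heavy = is_heavy
        self.exact = {}  # dedicated counters for predicted heavy hitters
        self.sketch = CountMin(width, depth, seed)

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            self.sketch.update(item, count)

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact.get(item, 0)
        return self.sketch.estimate(item)


# Usage: a skewed stream over integer items 0..99, with item i appearing
# 1000 // (i + 1) times, and an assumed oracle that flags the five most
# frequent items.
true_freq = {i: 1000 // (i + 1) for i in range(100)}
estimator = OracleCountMin(lambda item: item < 5, width=32, depth=3)
for item, freq in true_freq.items():
    estimator.update(item, freq)
```

In this sketch, items flagged by the oracle are counted exactly and never collide with the rest of the stream, while all other items share the sketch and can only be overestimated. How much the oracle helps depends on how accurate its predictions are and on how skewed the stream is.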

Award ID(s): 1750716

PAR ID: 10494032

Author(s) / Creator(s): Aamand, Anders; Chen, Justin Y.; Nguyen, Huy; Silwal, Sandeep; Vakilian, Ali

Publisher / Repository: Curran Associates, Inc.

Date Published: 2023-09-21

Journal Name: Advances in Neural Information Processing Systems

Volume: 36

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript
Conference Paper: The DOI is not currently available.
