Randomized error removal for online spread estimation in data streaming

Wang, Haibo; Ma, Chaoyi; Odegbile, Olufemi O; Chen, Shigang; Peir, Jih-Kwon

doi:10.14778/3447689.3447707

Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvement for single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while reducing memory footprint. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing overhead and query overhead than the state of the art, yet achieve significant accuracy improvement in spread estimation. We formally analyze the performance of these new designs. We implement them in both hardware and software, and use real-world data traces to evaluate their performance in comparison with the state of the art. The experimental results show that our best sketch significantly improves over the best existing work in terms of estimation accuracy, data item processing throughput, and online query throughput.
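
To make the definition of spread concrete, the following minimal sketch computes exact per-flow spread by keeping one set of distinct items per flow. It is only an illustrative baseline and not the sketch design proposed in the paper: the function name `exact_spreads`, the `(flow_id, item)` tuple layout, and the sample stream are assumptions made for this example. Because memory grows with the number of distinct items, this exact approach is the kind of cost that compact estimators aim to avoid.

```python
from collections import defaultdict


def exact_spreads(stream):
    """Return {flow_id: number of distinct items} for (flow_id, item) pairs.

    Hypothetical exact baseline: stores every distinct item per flow,
    so memory is proportional to the total number of distinct items.
    """
    seen = defaultdict(set)
    for flow_id, item in stream:
        seen[flow_id].add(item)  # repeated items do not increase the spread
    return {flow_id: len(items) for flow_id, items in seen.items()}


# Illustrative stream: flow "10.0.0.1" sends item "a" twice and "b" once.
stream = [
    ("10.0.0.1", "a"),
    ("10.0.0.1", "b"),
    ("10.0.0.1", "a"),
    ("10.0.0.2", "c"),
]
spreads = exact_spreads(stream)
```

Here flow `10.0.0.1` has three data items but only two distinct ones, so its spread is 2 rather than 3.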

Award ID(s): 1909077; 1719222

PAR ID: 10297512

Author(s) / Creator(s): Wang, Haibo; Ma, Chaoyi; Odegbile, Olufemi O; Chen, Shigang; Peir, Jih-Kwon

Date Published: 2021-02-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 14

Issue: 6

ISSN: 2150-8097

Page Range / eLocation ID: 1040 to 1052

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.14778/3447689.3447707
