ZIP: Lazy Imputation during Query Processing

Lin, Yiming; Mehrotra, Sharad

doi:10.14778/3617838.3617841

This paper develops ZIP, a query-time missing-value imputation framework that makes relational operators imputation-aware in order to minimize the joint cost of imputation and query processing. The modified operators use a cost-based decision function to determine whether to invoke imputation immediately or to defer resolving the missing values to downstream operators. The modified query processing logic guarantees that results produced with deferred imputation are identical to those produced if all missing values had been imputed first. ZIP includes a novel outer-join-based approach to preserve missing values during execution and a Bloom-filter-based index to reduce space and runtime overhead. Extensive experiments on both real and synthetic data sets show a 10 to 25 times improvement when the state-of-the-art system, ImputeDB, is augmented with ZIP-based deferred imputation. ZIP also outperforms the offline approach by up to 19607 times on a real data set.
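The abstract does not spell out ZIP's operator logic or cost model, so the following is only a minimal sketch of the general idea of deferred imputation, not the paper's algorithm. It assumes a conjunctive filter over rows where None marks a missing value, a hypothetical column-mean imputer standing in for an expensive model, and a hypothetical decision rule that compares the cost of imputing now with the expected cost of carrying the tuple and imputing only if it survives. The names make_mean_imputer, eager_filter, lazy_filter, impute_cost, defer_cost, and pass_rate are illustrative assumptions. ZIP's outer-join mechanism and Bloom-filter index are not modeled. The sketch shows that deferral can skip imputations while returning the same result as impute-first.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # None marks a missing value
Predicate = Tuple[str, Callable[[float], bool]]
Imputer = Callable[[Row, str], float]


def make_mean_imputer(rows: List[Row]) -> Imputer:
    """Hypothetical stand-in for an expensive imputation model: per-column mean."""
    means: Dict[str, float] = {}
    for attr in rows[0]:
        known = [r[attr] for r in rows if r[attr] is not None]
        means[attr] = sum(known) / len(known)
    return lambda row, attr: means[attr]


def eager_filter(rows: List[Row], preds: List[Predicate], impute: Imputer):
    """Baseline: impute every missing value first, then apply the conjunctive filter."""
    calls = 0
    filled = []
    for r in rows:
        full = dict(r)
        for attr, v in r.items():
            if v is None:
                full[attr] = impute(r, attr)
                calls += 1
        filled.append(full)
    out = [r for r in filled if all(test(r[a]) for a, test in preds)]
    return out, calls


def lazy_filter(rows, preds, impute, impute_cost, defer_cost, pass_rate):
    """Defer a predicate whose attribute is missing when a later predicate on a
    known attribute may drop the tuple and the hypothetical cost rule favors waiting."""
    calls = 0
    out = []
    for r in rows:
        cur = dict(r)
        pending: List[Predicate] = []
        alive = True
        for i, (attr, test) in enumerate(preds):
            if cur[attr] is None:
                later_known = any(cur[a] is not None for a, _ in preds[i + 1:])
                # Deferring costs defer_cost now plus impute_cost if the tuple survives.
                if later_known and defer_cost + pass_rate * impute_cost < impute_cost:
                    pending.append((attr, test))
                    continue
                cur[attr] = impute(r, attr)
                calls += 1
            if not test(cur[attr]):
                alive = False
                break
        if alive:
            # Resolve deferred predicates only for tuples that passed the others.
            for attr, test in pending:
                cur[attr] = impute(r, attr)
                calls += 1
                if not test(cur[attr]):
                    alive = False
                    break
        if alive:
            # Output tuples carry no missing values, matching the impute-first result.
            for attr, v in list(cur.items()):
                if v is None:
                    cur[attr] = impute(r, attr)
                    calls += 1
            out.append(cur)
    return out, calls


rows = [
    {"a": 5.0, "b": 1.0},
    {"a": None, "b": 9.0},
    {"a": None, "b": 2.0},
    {"a": 7.0, "b": None},
    {"a": 1.0, "b": 8.0},
]
preds = [("a", lambda x: x > 3), ("b", lambda x: x > 5)]
impute = make_mean_imputer(rows)

eager_out, eager_calls = eager_filter(rows, preds, impute)
lazy_out, lazy_calls = lazy_filter(rows, preds, impute,
                                   impute_cost=10.0, defer_cost=1.0, pass_rate=0.5)
print(eager_calls, lazy_calls, lazy_out == eager_out)  # → 3 2 True
```

On this toy input both plans return the same single tuple, but the lazy plan never imputes `a` for the row whose known `b` value already fails `b > 5`, so it makes two imputation calls instead of three.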

Award ID(s): 2008993

PAR ID: 10562504

Author(s) / Creator(s): Lin, Yiming; Mehrotra, Sharad

Publisher / Repository: ACM

Date Published: 2023-09-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 17

Issue: 1

ISSN: 2150-8097

Page Range / eLocation ID: 28–40

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text (Accepted Manuscript, Journal Article): https://doi.org/10.14778/3617838.3617841