Sampling Methods for Inner Product Sketching

Daliri, Majid; Freire, Juliana; Musco, Christopher; Santos, Aécio; Zhang, Haoxiang

doi:10.14778/3665844.3665850

Recently, Bessa et al. (PODS 2023) showed that sketches based on coordinated weighted sampling theoretically and empirically outperform popular linear sketching methods like Johnson-Lindenstrauss projection and CountSketch for the ubiquitous problem of inner product estimation. We further develop this finding by introducing and analyzing two alternative sampling-based methods. In contrast to the computationally expensive algorithm in Bessa et al., our methods run in linear time (to compute the sketch) and perform better in practice, significantly beating linear sketching on a variety of tasks. For example, they provide state-of-the-art results for estimating the correlation between columns in unjoined tables, a problem that we show how to reduce to inner product estimation in a black-box way. While based on known sampling techniques (threshold and priority sampling), we introduce significant new theoretical analysis to prove approximation guarantees for our methods.
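To make the idea of a coordinated sampling sketch concrete, the following is a minimal illustration of threshold sampling for inner product estimation. It is a sketch under stated assumptions, not the authors' implementation: the function names (`uniform_hash`, `threshold_sketch`, `estimate_inner_product`), the dictionary representation of sparse vectors, and the choice of sampling probabilities proportional to squared entries are all assumptions made here for illustration. The key point is that both vectors are sampled with the same hash value per index, so an index lands in both sketches with probability equal to the smaller of its two inclusion probabilities, which is what the estimator divides by.

```python
import hashlib


def uniform_hash(index, seed=0):
    """Map an index to a pseudo-random value in [0, 1], shared by all sketches."""
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def threshold_sketch(vec, m, seed=0):
    """Sample entries of a sparse vector (dict: index -> value).

    Index i is kept when its shared hash falls below m * v_i^2 / ||v||^2,
    so the expected number of kept entries is at most m (assumed scheme).
    Returns (kept entries, squared Euclidean norm).
    """
    sq_norm = sum(v * v for v in vec.values())
    samples = {}
    if sq_norm == 0:
        return samples, sq_norm
    for i, v in vec.items():
        # Strict inequality: zero entries are never kept.
        if uniform_hash(i, seed) < m * v * v / sq_norm:
            samples[i] = v
    return samples, sq_norm


def estimate_inner_product(sketch_a, sketch_b, m):
    """Estimate <a, b> from two sketches built with the same m and seed."""
    samples_a, norm_a = sketch_a
    samples_b, norm_b = sketch_b
    total = 0.0
    for i, a_i in samples_a.items():
        if i in samples_b:
            b_i = samples_b[i]
            # Probability that index i was kept in both sketches.
            p = min(1.0, m * a_i * a_i / norm_a, m * b_i * b_i / norm_b)
            total += a_i * b_i / p
    return total
```

With these hypothetical helpers, one would build `threshold_sketch(a, m)` and `threshold_sketch(b, m)` independently (for example, for columns of two different tables) and then call `estimate_inner_product` on the pair. When `m` is large enough that every inclusion probability reaches 1, the sketch keeps every nonzero entry and the estimate equals the exact inner product.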

Award ID(s): 2106888

PAR ID: 10540021

Author(s) / Creator(s): Daliri, Majid; Freire, Juliana; Musco, Christopher; Santos, Aécio; Zhang, Haoxiang

Publisher / Repository: VLDB Endowment

Date Published: 2024-05-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 17

Issue: 9

ISSN: 2150-8097

Page Range / eLocation ID: 2185 to 2197

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.14778/3665844.3665850
