

Title: A Sketch-based Index for Correlated Dataset Search
Dataset search is emerging as a critical capability in both research and industry: it has spurred many novel applications, ranging from the enrichment of analyses of real-world phenomena to the improvement of machine learning models. Recent research in this field has explored a new class of data-driven queries: queries that take a dataset as input and retrieve related datasets from a large collection. In this paper, we study a specific type of data-driven query that supports relational data augmentation through numerical data relationships: given an input query table, find the top-k tables that are both joinable with it and contain columns that are correlated with a column in the query. We propose a novel hashing scheme that allows the construction of a sketch-based index to support efficient correlated table search. We show that our proposed approach is effective and efficient, and achieves trade-offs that significantly improve both ranking accuracy and recall compared to state-of-the-art solutions.
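The abstract does not spell out the hashing scheme, so the following is only a minimal illustrative sketch of the general idea: hash each (join key, sign of mean-centered value) pair of a table into a set of sketch keys, index tables under those keys in an inverted index, and rank candidates by key overlap. All names, the hashing rule, and the overlap score are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np
from collections import defaultdict

def sketch_keys(join_keys, values):
    """Hash each (join key, sign of mean-centered value) pair. Two tables
    share many of these keys only if they join on many key values AND the
    joined numeric columns deviate from their means in the same direction,
    which makes key overlap a cheap proxy for correlation."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return {hash((k, bool(centered[i] >= 0))) for i, k in enumerate(join_keys)}

class CorrelatedTableIndex:
    def __init__(self):
        self.postings = defaultdict(set)          # sketch key -> table ids

    def add(self, table_id, join_keys, values):
        for h in sketch_keys(join_keys, values):
            self.postings[h].add(table_id)

    def top_k(self, join_keys, values, k=5):
        scores = defaultdict(int)
        for h in sketch_keys(join_keys, values):
            for t in self.postings[h]:
                scores[t] += 1                    # key overlap as the score
        return sorted(scores, key=scores.get, reverse=True)[:k]

idx = CorrelatedTableIndex()
idx.add("t1", ["nyc", "bos", "sf"], [1.0, 2.0, 9.0])
print(idx.top_k(["nyc", "bos", "sf"], [10.0, 20.0, 90.0]))   # ['t1']
```

Because each table contributes only a set of small hashed keys, the index stays compact and query cost depends on posting-list sizes rather than on scanning full tables.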
Award ID(s):
2106888
NSF-PAR ID:
10353475
Date Published:
Journal Name:
2022 IEEE 38th International Conference on Data Engineering (ICDE)
Page Range / eLocation ID:
2928 to 2941
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The increasing reliance on robust data-driven decision-making across many domains has made it necessary for data management systems to manage many thousands to millions of versions of datasets, acquired or constructed at various stages of analysis pipelines over time. Delta encoding is an effective and widely used solution for compactly storing a large number of datasets: it simultaneously exploits redundancies across them and keeps the average cost of reconstructing any dataset low. However, supporting any kind of rich retrieval or querying functionality beyond single-dataset checkout is challenging in such storage engines. In this paper, we initiate a systematic study of this problem and present DEX, a novel stand-alone delta-oriented execution engine whose goal is to take advantage of the already-computed deltas between the datasets for efficient query processing. In this work, we study how to execute checkout, intersection, union, and t-threshold queries over record-based files; we show that processing even these basic queries leads to many new and unexplored challenges and trade-offs. Starting from a query plan that confines query execution to a small set of deltas, we introduce new transformation rules based on the algebraic properties of the deltas that allow us to explore the search space of alternative plans. For checkout, we present a dynamic programming algorithm that efficiently selects the optimal query plan under our cost model, while for the other queries we design efficient heuristics to select effective plans that vastly outperform the baseline checkout-then-query approach. A key characteristic of our query execution methods is that the computational cost depends primarily on the size and number of deltas in the expression (typically small), and not on the input dataset versions (which can be very large). We have implemented a DEX prototype on top of git, a widely used version control system. We present an extensive experimental evaluation on synthetic data with diverse characteristics, which shows that our methods perform exceedingly well compared to the baseline.
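As a rough illustration of the setting, here is a hypothetical record-level delta store in Python: each version is either materialized or described by (parent, records added, records removed), checkout replays the delta chain, and the naive checkout-then-query baseline for intersection is shown for contrast. DEX's plan transformations and cost model go far beyond this sketch; every name here is invented for illustration.

```python
class DeltaStore:
    """Toy delta-encoded store for record-based files (sets of records)."""

    def __init__(self):
        self.materialized = {}   # version id -> frozenset of records
        self.deltas = {}         # version id -> (parent id, added, removed)

    def put_base(self, vid, records):
        self.materialized[vid] = frozenset(records)

    def put_delta(self, vid, parent, added, removed):
        self.deltas[vid] = (parent, frozenset(added), frozenset(removed))

    def checkout(self, vid):
        """Reconstruct a version by walking back to a materialized ancestor
        and replaying deltas forward. Cost grows with the total size of the
        deltas on the chain, not with the size of the versions."""
        chain = []
        while vid not in self.materialized:
            parent, added, removed = self.deltas[vid]
            chain.append((added, removed))
            vid = parent
        records = set(self.materialized[vid])
        for added, removed in reversed(chain):   # replay base -> target
            records |= added
            records -= removed
        return records

    def intersect(self, v1, v2):
        # Naive checkout-then-query baseline; a delta-oriented engine would
        # instead rewrite this to work on the (usually small) deltas.
        return self.checkout(v1) & self.checkout(v2)

store = DeltaStore()
store.put_base("v1", {"r1", "r2", "r3"})
store.put_delta("v2", "v1", added={"r4"}, removed={"r2"})
print(store.intersect("v1", "v2"))   # {'r1', 'r3'}
```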
  2. A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match the description text. We propose a novel schema label generation model that generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model that considers not only the relevance between the query and dataset metadata but also the similarity between the query and the generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval tasks as well.
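A minimal sketch of the mixed-ranking idea, with all specifics assumed: simple token-overlap (Jaccard) similarity stands in for both the query/metadata relevance model and the query/schema-label similarity, combined by a hypothetical interpolation weight alpha. The paper's model is learned; this only shows the shape of the combination.

```python
def jaccard(a, b):
    """Token-overlap similarity between two short text strings."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def mixed_score(query, metadata, schema_labels, alpha=0.5):
    """Interpolate query/metadata relevance with query/schema-label
    similarity; alpha is a hypothetical mixing weight."""
    meta_rel = jaccard(query, metadata)
    label_sim = max((jaccard(query, lbl) for lbl in schema_labels), default=0.0)
    return alpha * meta_rel + (1 - alpha) * label_sim

datasets = [
    {"id": "d1", "metadata": "US census survey results",
     "labels": ["household income", "median salary"]},
    {"id": "d2", "metadata": "hourly weather observations",
     "labels": ["temperature", "humidity"]},
]
ranked = sorted(datasets, reverse=True,
                key=lambda d: mixed_score("median income", d["metadata"], d["labels"]))
print([d["id"] for d in ranked])   # d1 first: its generated labels match the query
```

The point of the second term is exactly the scenario in the abstract: "median income" never appears in d1's description, but a label generated from the table content still lets the query find it.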
  3. Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold θ. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin's well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified for θ-similarity. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.
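The following is a simplified TA-style gathering phase for cosine threshold queries, assuming nonnegative, unit-normalized vectors: one posting list per dimension sorted by coordinate value, a stopping test that upper-bounds the similarity of any fully-unseen vector by the dot product of the query with the per-list frontier values, and exact verification of the gathered candidates. The paper's tight incremental stopping condition and skew-aware traversal are more refined than this sketch.

```python
import numpy as np

def build_index(db):
    """One posting list per dimension: vector ids sorted by that coordinate,
    descending. Vectors are normalized to unit length (nonnegative assumed)."""
    db = np.asarray(db, dtype=float)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    lists = [sorted(range(len(db)), key=lambda i: -db[i, d])
             for d in range(db.shape[1])]
    return db, lists

def cosine_threshold_query(db, lists, q, theta):
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    candidates, pos = set(), [0] * len(lists)
    while True:
        # Frontier: largest coordinate not yet consumed in each list; q @ frontier
        # upper-bounds the cosine of any vector not yet seen in any list.
        frontier = np.array([db[lists[d][pos[d]], d] if pos[d] < len(db) else 0.0
                             for d in range(len(lists))])
        if q @ frontier < theta:           # simplistic (non-tight) stop test
            break
        for d in np.nonzero(q)[0]:         # advance lists of nonzero query dims
            if pos[d] < len(db):
                candidates.add(lists[d][pos[d]])
                pos[d] += 1
    return [i for i in candidates if db[i] @ q >= theta]   # exact verification

db, lists = build_index([[3, 1], [1, 3], [2, 2]])
print(cosine_threshold_query(db, lists, [1, 0], theta=0.9))   # [0]
```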
  4. Skyline queries are used to find the Pareto-optimal solutions from datasets containing multi-dimensional data points. In this paper, we propose a new type of skyline query whose evaluation is constrained by a multi-cost transportation network (MCTN) and whose answers are off the network. This type of skyline query is useful in many applications. For example, a person wants to find an apartment by considering not only the price and the surrounding area of the apartment, but also the transportation cost, time, and distance between the apartment and his/her workplace. Most existing works that evaluate skyline queries on multi-cost networks (MCNs), which are either MCTNs or road networks, find interesting objects that are located on edges of the networks. Formally, our new type of skyline query takes as input an MCTN, a query point q, and a set of objects of interest D with spatial information, where q and the objects in D are off the network. The answers to such queries are the objects in D that are not dominated by other objects in D when considering both the multiple attributes of these objects and the multiple network costs from q to the solution objects. To evaluate such queries, we propose an exact search algorithm and an improved version that exploits several properties. The space of exact skyline solutions is huge, easily reaching the order of thousands and incurring long evaluation times. We therefore design much more efficient heuristic methods to find approximate solutions. We run extensive experiments using both real and synthetic datasets to test the effectiveness and efficiency of our proposed approaches. The results show that the exact search algorithm can be dramatically improved by exploiting several properties, and that the heuristic approaches for finding approximate answers largely reduce the query time while retrieving results that are comparable to the exact solutions.
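A minimal sketch of the dominance test and a naive exact skyline over the combined vectors the abstract describes: an object's own attributes concatenated with its multi-cost network costs from the query point. All names and data are hypothetical, and the network costs are assumed precomputed; computing them efficiently over the MCTN is the hard part the paper's algorithms address.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension (all minimized)
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(objects, network_costs):
    """Naive exact skyline: keep each object whose combined vector
    (own attributes + network costs from q) is dominated by no other."""
    combined = {o: attrs + network_costs[o] for o, attrs in objects.items()}
    return [o for o, v in combined.items()
            if not any(dominates(w, v) for p, w in combined.items() if p != o)]

# Hypothetical apartments: own attribute (monthly rent) plus network costs
# from the query point q (travel cost, travel minutes); all minimized.
apartments = {"a1": (1500,), "a2": (1700,), "a3": (1600,)}
costs = {"a1": (8.0, 30.0), "a2": (5.0, 20.0), "a3": (9.0, 45.0)}
print(skyline(apartments, costs))   # ['a1', 'a2']: a1 dominates a3
```

Even this naive filter is quadratic in |D| per query, which hints at why the exact search needs pruning properties and why approximate heuristics pay off.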
  5. Deep learning has improved state-of-the-art results in many important fields and has been the subject of much research in recent years, leading to the development of several systems for facilitating deep learning. Current systems, however, mainly focus on the model building and training phases, while the issues of data management, model sharing, and lifecycle management are largely ignored. The deep learning modeling lifecycle generates a rich set of data artifacts, e.g., learned parameters and training logs, and comprises several frequently conducted tasks, e.g., understanding model behaviors and trying out new models. Dealing with such artifacts and tasks is cumbersome and largely left to the users. This paper describes our vision and implementation of a data and lifecycle management system for deep learning. First, we generalize model exploration and model enumeration queries from tasks commonly conducted by deep learning modelers, and propose a high-level domain-specific language (DSL), inspired by SQL, to raise the abstraction level and thereby accelerate the modeling process. Second, to manage the variety of data artifacts, especially the large number of checkpointed float parameters, we design a novel model versioning system (dlv) and a read-optimized parameter archival storage system (PAS) that minimizes storage footprint and accelerates query workloads with minimal loss of accuracy. PAS archives versioned models using deltas in a multi-resolution fashion by separately storing the less significant bits, and features a novel progressive query (inference) evaluation algorithm. Third, we develop efficient algorithms for archiving versioned models using deltas under co-retrieval constraints. We conduct extensive experiments over several real datasets from the computer vision domain to show the efficiency of the proposed techniques.
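As a rough illustration of multi-resolution storage of float parameters, here is a hypothetical sketch that views float32 weights as unsigned integers and stores the high-order and low-order bytes separately, so a coarse model can be read with less I/O and progressively refined to bit-exact weights. PAS's actual format, delta encoding, and co-retrieval-constrained archiving are not captured here; every function name and the byte split are assumptions.

```python
import numpy as np

def split_params(params, keep_bytes=2):
    """Split float32 weights into high-order bytes (coarse resolution) and
    low-order bytes (refinement), stored separately in this toy scheme."""
    bits = np.asarray(params, dtype=np.float32).view(np.uint32)
    shift = 8 * (4 - keep_bytes)
    high = bits >> shift             # most significant bytes
    low = bits & ((1 << shift) - 1)  # least significant bytes
    return high, low, shift

def reconstruct(high, shift, low=None):
    """Rebuild weights from the high bytes alone (lossy) or, when the low
    bytes are also read, progressively refine to the exact values."""
    bits = high << shift
    if low is not None:
        bits = bits | low
    return bits.astype(np.uint32).view(np.float32)

w = np.random.randn(5).astype(np.float32)
high, low, shift = split_params(w)
approx = reconstruct(high, shift)        # coarse, low-I/O read
exact = reconstruct(high, shift, low)    # bit-exact after refinement
assert np.array_equal(exact, w)
print(np.max(np.abs(approx - w)))        # small truncation error
```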