Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes. In EDBT 2025.

Christensen, Martin P; Leventidis, Aristotelis; Lissandrini, Matteo; Rocco, Laura Di; Miller, Renée J; Hose, Katja

doi:10.48786/edbt.2025.32

In data lakes, one of the core challenges remains finding relevant tables. We introduce the notion of semantic data lakes, i.e., repositories where datasets are linked to concepts and entities described in a knowledge graph (KG). We formalize the problem of semantic table search, i.e., retrieving tables containing information semantically related to a given set of entities, and provide the first formal definition of semantic relatedness of a dataset to tuples of entities. Our solution offers the first general framework to compute the semantic relevance of the contents of a table w.r.t. entity tuples, as well as efficient algorithms (exploiting semantic signals, such as entity types and embeddings) to scale the semantic search to repositories with hundreds of thousands of distinct tables. Our extensive experiments on both real-world and synthetic benchmarks show that our approach is able to retrieve more relevant tables (up to 5.4 times higher recall) in comparison to existing methods while ensuring fast response times (up to 17 times faster with LSH).

Award ID(s): 2325632; 2107248

PAR ID: 10614601

Author(s) / Creator(s): Christensen, Martin P; Leventidis, Aristotelis; Lissandrini, Matteo; Rocco, Laura Di; Miller, Renée J; Hose, Katja

Editor(s): EDBT

Publisher / Repository: OpenProceedings.org

Date Published: 2025-01-01

Subject(s) / Keyword(s): Data Management; Database Technology

Format(s): Medium: X

Institution: EDBT International Conference on Extending Data Base Technology

Sponsoring Org: National Science Foundation

Dataset:
https://doi.org/10.48786/edbt.2025.32
