Title: Towards a Learned Cost Model for Distributed Spatial Join: Data, Code & Models
Geospatial data comprise around 60% of all publicly available data. One of the essential and most complex operations that brings together multiple geospatial datasets is the spatial join operation. Due to its complexity, there are many partitioning techniques and parallel algorithms for the spatial join problem. This leads to a complex query optimization problem: which algorithm should be used for a given pair of input datasets that we want to join? With the rise of machine learning, there is promise in addressing this problem with learned models. However, one concern is the lack of standard, publicly available data to train and test on, as well as the lack of accessible baseline models. This resource paper helps the research community address this problem by providing synthetic and real datasets for spatial join, source code for constructing more datasets, and several baseline solutions that researchers can extend and compare against.
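To make the query-optimization framing concrete, the sketch below shows the kind of baseline a learned cost model could be: a classifier that picks a join algorithm from simple features of a dataset pair. It is not the paper's released code; the feature set, algorithm labels, and synthetic training data are hypothetical placeholders.

```python
# Hypothetical sketch of a learned cost model for spatial-join algorithm selection.
# Feature names, algorithm labels, and data are illustrative, not the paper's code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Per-pair features one might extract from the two inputs: cardinalities,
# average geometry extents, and the overlap ratio of the datasets' MBRs.
X = np.column_stack([
    rng.integers(1_000, 100_000_000, n),   # |R|
    rng.integers(1_000, 100_000_000, n),   # |S|
    rng.uniform(0, 1, n),                  # avg. geometry extent of R (normalized)
    rng.uniform(0, 1, n),                  # avg. geometry extent of S (normalized)
    rng.uniform(0, 1, n),                  # MBR overlap ratio of R and S
])

# Synthetic "ground truth" labels (purely illustrative): pretend a partition-based
# join wins on large, overlapping inputs and a nested-loop join wins otherwise.
large = (X[:, 0] > 1e6) & (X[:, 1] > 1e6)
y = np.where(large & (X[:, 4] > 0.5), "partition_based_join", "nested_loop_join")

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# At query-optimization time, predict an algorithm for a new dataset pair.
print("chosen algorithm:", model.predict([[5e6, 2e7, 0.1, 0.3, 0.7]])[0])
```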
Award ID(s):
1924694 1838222 2046236
NSF-PAR ID:
10469096
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
Page Range / eLocation ID:
4550 to 4554
Format(s):
Medium: X
Location:
Atlanta GA USA
Sponsoring Org:
National Science Foundation
More Like this
  1. This perspective paper highlights the potentials, limitations, and combinations of openly available Earth observation (EO) data and big data in the context of environmental research in urban areas. The aim is to build the resilience of informal settlements to climate change impacts. In particular, it highlights the types, categories, and spatial and temporal scales of publicly available big data. The benefits of publicly available big data become clear when looking at issues such as the development and quality of life in informal settlements within and around major African cities. Sub-Saharan African (SSA) cities are among the fastest growing urban areas in the world. However, they lack spatial information to guide urban planning towards climate-adapted cities and fair living conditions for disadvantaged residents, who mostly reside in informal settlements. Therefore, this study collected key information on freely available data such as data on land cover, land use, environmental hazards and pressures, and demographic and socio-economic indicators for urban areas. These data serve as a vital resource for the success of many other related local studies, such as the transdisciplinary research project "DREAMS—Developing REsilient African cities and their urban environMent facing the provision of essential urban SDGs". In the era of exponential growth of big data analytics, especially for geospatial data, their utility in SSA is hampered by the disparate nature of these datasets and the lack of a comprehensive overview of where and how to access them. This paper aims to provide transparency in this regard as well as a resource for accessing such datasets. Although the limitations of such big data are also discussed, their usefulness in assessing environmental hazards and human exposure, especially to climate change impacts, is emphasised.
  2. Due to developments in topographic techniques, clear satellite imagery, and various means of collecting information, geospatial datasets are growing in volume, complexity, and heterogeneity. For efficient execution of spatial computations and analytics on large spatial datasets, parallel processing is required. To exploit fine-grained parallel processing on large-scale compute clusters, skewed datasets must be partitioned in a load-balanced way. In this work, we focus on the spatial join operation, where the inputs are two layers of geospatial data. Our partitioning method for spatial join uses an Adaptive Partitioning (ADP) technique based on Quadtree partitioning. Unlike existing partitioning techniques, ADP partitions the spatial join workload instead of partitioning the individual datasets separately, which provides better load balancing. Based on our experimental evaluation, ADP partitions spatial data in a more balanced way than Quadtree partitioning and Uniform grid partitioning. ADP uses an output-sensitive duplication avoidance technique that minimizes duplication of geometries that are not part of the spatial join output. In a distributed-memory environment, this technique can reduce data communication and storage requirements compared to traditional methods. To improve the performance of ADP, an MPI+Threads based parallelization, ParADP, is presented. With ParADP, a pair of real-world datasets, one with 717 million polylines and another with 10 million polygons, is partitioned into 65,536 grid cells within 7 seconds. ParADP exhibits both good weak scaling and good strong scaling up to 4,032 CPU cores.
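The abstract describes ADP as quadtree-style splitting driven by the combined join workload of both layers rather than by each layer separately. The minimal, single-process sketch below illustrates that idea on point data with a hypothetical capacity threshold; it is not the ParADP (MPI+Threads) implementation and omits the duplication avoidance technique.

```python
# Minimal single-process sketch of workload-driven quadtree partitioning for a
# spatial join: a cell is split while the COMBINED number of objects from both
# layers exceeds a capacity threshold. Not the paper's ParADP (MPI+Threads) code.
import random

def quad_partition(layer_a, layer_b, box, capacity=64, depth=0, max_depth=12):
    """Recursively split box = (xmin, ymin, xmax, ymax) until each leaf holds
    at most `capacity` objects from layer_a and layer_b combined."""
    inside_a = [p for p in layer_a if box[0] <= p[0] < box[2] and box[1] <= p[1] < box[3]]
    inside_b = [p for p in layer_b if box[0] <= p[0] < box[2] and box[1] <= p[1] < box[3]]
    if len(inside_a) + len(inside_b) <= capacity or depth == max_depth:
        return [(box, inside_a, inside_b)]   # leaf cell = one join task
    xm, ym = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    quadrants = [(box[0], box[1], xm, ym), (xm, box[1], box[2], ym),
                 (box[0], ym, xm, box[3]), (xm, ym, box[2], box[3])]
    cells = []
    for q in quadrants:
        cells.extend(quad_partition(inside_a, inside_b, q, capacity, depth + 1, max_depth))
    return cells

# Two skewed synthetic layers (think polyline and polygon MBR centroids).
random.seed(0)
layer_a = [(random.gauss(0.3, 0.1), random.gauss(0.3, 0.1)) for _ in range(20000)]
layer_b = [(random.random(), random.random()) for _ in range(5000)]
cells = quad_partition(layer_a, layer_b, (0.0, 0.0, 1.0, 1.0))
loads = sorted(len(a) + len(b) for _, a, b in cells)
print(f"{len(cells)} cells, min/median/max load: {loads[0]}/{loads[len(loads)//2]}/{loads[-1]}")
```

Because the split criterion looks at the sum of both layers' objects in a cell, dense regions of either input produce finer cells, which is the load-balancing intuition behind partitioning the join workload rather than each dataset on its own.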
  3. Abstract

    Geospatial data conflation is the process of combining multiple datasets about a geographic phenomenon to produce a single, richer dataset. It has received increased research attention due to its many applications in map making, transportation, planning, and temporal geospatial analyses, among many others. One approach to conflation, attempted from the outset in the literature, is the use of optimization-based conflation methods. Conflation is treated as a natural optimization problem of minimizing the total number of discrepancies while finding corresponding features from two datasets. Optimization-based conflation has several advantages over traditional methods, including conciseness, the ability to find an optimal solution, and ease of implementation. However, current optimization-based conflation methods are also limited. A main shortcoming of current optimized conflation models (and other traditional methods as well) is that they are often too weak and cannot utilize the spatial context in each dataset while matching corresponding features. In particular, current optimal conflation models match a feature to targets independently of other features and therefore treat each GIS dataset as a collection of unrelated elements, reminiscent of the spaghetti GIS data model. Important contextual information, such as the connectivity between adjacent elements (such as roads), is neglected during the matching. Consequently, such models may produce topologically inconsistent results. In this article, we address this issue by introducing new optimization-based conflation models with structural constraints to preserve the connectivity and contiguity relations among features. The models are implemented using integer linear programming and compared with traditional spaghetti-style models on multiple test datasets. Experimental results show that the new element connectivity (ec-bimatching) model reduces false matches and consistently outperforms traditional models.
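For context on the optimization-based formulation this abstract improves upon, here is a minimal sketch of the spaghetti-style, context-free baseline: features from two datasets are matched by minimizing total positional discrepancy as an assignment problem, with each feature matched independently of its neighbors. The coordinates and the distance threshold are made up, and this is not the article's ec-bimatching model (which adds connectivity constraints).

```python
# Minimal sketch of "spaghetti-style" optimization-based conflation: match
# features from two datasets by minimizing total positional discrepancy, with
# no connectivity constraints (unlike the article's ec-bimatching model).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Representative points (e.g., road-segment midpoints) from two datasets; made-up data.
rng = np.random.default_rng(1)
dataset_a = rng.uniform(0, 100, size=(40, 2))
dataset_b = dataset_a + rng.normal(0, 1.5, size=(40, 2))   # a displaced copy of A

cost = cdist(dataset_a, dataset_b)        # pairwise discrepancy matrix
rows, cols = linear_sum_assignment(cost)  # assignment minimizing total discrepancy

max_discrepancy = 5.0                     # hypothetical cutoff to reject implausible matches
matches = [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_discrepancy]
print(f"matched {len(matches)} of {len(dataset_a)} features, "
      f"total discrepancy {cost[rows, cols].sum():.2f}")
```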

     
  4. Obeid, Iyad; Picone, Joseph; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio's eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. The data is organized by tissue type as shown below (a small parser for this naming convention is sketched after this item):
    Filenames:
      tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
      tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx
    Explanation:
      tudp: root directory of the corpus
      v1.0.0: version number of the release
      svs: the image data type
      gastro: the type of tissue
      000001: six-digit sequence number used to control directory complexity
      00123456: 8-digit patient MRN
      2015_03_05: the date the specimen was captured
      0s15_12345: the clinical case name
      0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000)
      0s15_12345_00123456.docx: the filename for the corresponding case report
    We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio's ".svs" format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity. Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes.
This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the past 6 months from May to November, we have tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process; it takes a couple of hours to sort, clean, and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks. To bridge the gap between hospital operations and research, we are using Aperio's eSM software. Our goal is to provide pathologists access to high quality digital images of their patients' slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue.
There will be approximately 1,000 images per class in this corpus. Based on the number of features annotated, we can train on a two-class problem of DCIS vs. benign, or increase the difficulty by expanding the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath.
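Because the directory and filename convention is spelled out above, the documented fields can be recovered from a path with a few lines of code. The sketch below is based only on the convention as described in this abstract and is not an official NEDC/TUDP tool.

```python
# Parse the TUDP path convention described above into its documented fields.
# Illustrative sketch based on the convention in this abstract, not an official tool.
from pathlib import PurePosixPath

def parse_tudp_path(path):
    parts = PurePosixPath(path).parts
    # tudp / v1.0.0 / svs / gastro / 000001 / 00123456 / 2015_03_05 / 0s15_12345 / <file>
    root, version, data_type, tissue, seq, mrn, date, case, filename = parts
    record = {
        "root": root, "version": version, "data_type": data_type,
        "tissue_type": tissue, "sequence_number": seq, "patient_mrn": mrn,
        "capture_date": date, "case_name": case, "filename": filename,
    }
    if filename.endswith(".svs"):
        # The case name itself contains one underscore, so the image filename
        # splits into: [case1, case2, site_code, patient_mrn, cut_level, token].
        fields = PurePosixPath(filename).stem.split("_")
        record.update(site_code=fields[2], cut_level=fields[4], token=fields[5])
    return record

example = ("tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/"
           "0s15_12345_0a001_00123456_lvl0001_s000.svs")
print(parse_tudp_path(example))
```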
  5. As educators and researchers, we often enjoy enlivening classroom discussions by including examples of cutting-edge high-throughput (HT) technologies that propelled scientific discovery and created repositories of new information. We also call for the use of evidence-based teaching practices to engage students in ways that promote equity and learning. The complex datasets produced by HT approaches can open the doors to discovery of novel genes, drugs, and regulatory networks, so students need experience with the effective design, implementation, and analysis of HT research. Nevertheless, we miss opportunities to contextualize, define, and explain the potential and limitations of HT methods. One evidence-based approach is to engage students in realistic HT case studies. HT cases immerse students in messy data, asking them to critically consider data analysis, experimental design, ethical implications, and HT technologies. The NSF HITS (High-throughput Discovery Science and Inquiry-based Case Studies for Today's Students) Research Coordination Network in Undergraduate Biology Education seeks to improve student quantitative skills and participation in HT discovery. Researchers and instructors in the network learn about case pedagogy, HT technologies, publicly available datasets, and computational tools. Leveraging this training and interdisciplinary teamwork, HITS participants then create and implement HT cases. Our initial case collection has been used in >15 different courses at a variety of institutions, engaging >600 students in HT discovery. We share here our rationale for engaging students in HT science, our HT cases, and our network model to encourage other life science educators to join us and further develop and integrate HT complex datasets into curricula.