Distributed numerical and machine learning computations via two-phase execution of aggregated join trees

Jankov, Dimitrije; Yuan, Binhang; Luo, Shangyu; Jermaine, Chris

doi:10.14778/3450980.3450991

Citation Details

When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.
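The two-phase idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' system or planner. It assumes a small matrix multiply expressed as a join on the shared index followed by a SUM aggregation, and it swaps in a simple greedy load-balancing heuristic where the paper describes an optimal record-level plan. All names (`pilot_run`, `plan`, `execute`, the relations `A` and `B`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy relations: matrices stored as (row, col) -> value tuples.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
B = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}


def pilot_run(a_keys, b_keys):
    """Phase 1: join on keys only and record lineage; no values are touched."""
    lineage = defaultdict(list)  # aggregation group -> list of (a_key, b_key)
    for (i, k) in a_keys:
        for (k2, j) in b_keys:
            if k == k2:
                lineage[(i, j)].append(((i, k), (k2, j)))
    return lineage


def plan(lineage, num_workers):
    """Assign each aggregation group to the least-loaded worker.

    Assumption: a greedy heuristic stands in for the paper's optimal planner.
    """
    load = [0] * num_workers
    assignment = {}
    for group in sorted(lineage, key=lambda g: -len(lineage[g])):
        worker = load.index(min(load))
        assignment[group] = worker
        load[worker] += len(lineage[group])
    return assignment


def execute(a, b, lineage, assignment, num_workers):
    """Phase 2: each simulated worker computes only its assigned join pairs."""
    partials = [defaultdict(float) for _ in range(num_workers)]
    for group, pairs in lineage.items():
        worker = assignment[group]
        for a_key, b_key in pairs:
            partials[worker][group] += a[a_key] * b[b_key]
    result = {}
    for partial in partials:
        result.update(partial)  # groups are disjoint across workers
    return result


lineage = pilot_run(A.keys(), B.keys())
assignment = plan(lineage, num_workers=2)
C = execute(A, B, lineage, assignment, num_workers=2)
```

Here the pilot run sees only keys, so the plan is fixed before any numerical work is done, and each aggregation group is computed entirely on one worker. A real system would also have to account for the cost of moving input records between workers, which this sketch ignores.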

Award ID(s): 1910803, 2008240, 1918651

PAR ID: 10280662

Author(s) / Creator(s): Jankov, Dimitrije; Yuan, Binhang; Luo, Shangyu; Jermaine, Chris

Date Published: 2021-03-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 14

Issue: 7

ISSN: 2150-8097

Page Range / eLocation ID: 1228 to 1240

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.14778/3450980.3450991
