A Comparison of End-to-End Decision Forest Inference Pipelines

Guan, Hong; Masood, Saif; Dwarampudi, Mahidhar; Gunda, Venkatesh; Min, Hong; Yu, Lei; Nag, Soham; Zou, Jia

doi:10.1145/3620678.3624656

Decision forests, including RandomForest, XGBoost, and LightGBM, dominate machine learning tasks over tabular data. Recently, several frameworks have been developed for decision forest inference, such as ONNX, TreeLite from Amazon, TensorFlow Decision Forests from Google, HummingBird from Microsoft, Nvidia FIL, and lleaves. While these frameworks are fully optimized for inference computation, they are all decoupled from databases and general data management frameworks, which leads to cross-system performance overheads. We first introduced a DICT model to understand the performance gaps between decoupled and in-database inference. We further identified that, for in-database inference, in addition to the popular UDF-centric representation that encapsulates the ML model in a single user-defined function (UDF), there also exists a relation-centric representation that breaks decision forest inference down into several fine-grained SQL operations. The relation-centric representation can achieve significantly better performance for large models. We optimized both implementations and conducted a comprehensive benchmark comparing them against the aforementioned decoupled inference pipelines and existing in-database inference pipelines such as Spark SQL and PostgresML. The evaluation results validated the DICT model and demonstrated the superior performance of our in-database inference design compared to the baselines.
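To make the difference between the two in-database representations concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the authors' implementation: the toy two-tree forest, the table and view names (nodes, data, data_long), and the split convention (go left when value < threshold) are all hypothetical. The UDF-centric version hides the whole forest inside one scalar function; the relation-centric version, in the spirit of the description above, stores the trees as a table, advances every (row, tree) pair one level per recursive join, and finishes with a GROUP BY sum.

```python
import sqlite3

# Hypothetical toy forest of two regression trees, one node per tuple.
# Internal nodes: go to left_child if feature value < threshold, else right_child.
# Leaves: feature is NULL and leaf_value holds the tree's output.
NODES = [
    # (tree_id, node_id, feature, threshold, left_child, right_child, leaf_value)
    (0, 0, "x0", 0.5, 1, 2, None),
    (0, 1, None, None, None, None, 1.0),
    (0, 2, None, None, None, None, -1.0),
    (1, 0, "x1", 2.0, 1, 2, None),
    (1, 1, "x0", 0.2, 3, 4, None),
    (1, 2, None, None, None, None, 0.5),
    (1, 3, None, None, None, None, 0.25),
    (1, 4, None, None, None, None, -0.25),
]
ROWS = [(0, 0.1, 1.0), (1, 0.9, 3.0)]  # (row_id, x0, x1)


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (tree_id INTEGER, node_id INTEGER, feature TEXT,"
        " threshold REAL, left_child INTEGER, right_child INTEGER, leaf_value REAL)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?, ?)", NODES)
    conn.execute("CREATE TABLE data (row_id INTEGER, x0 REAL, x1 REAL)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
    # Unpivot features into (row_id, feature, value) so a join can fetch
    # whichever feature a node splits on.
    conn.execute(
        "CREATE VIEW data_long AS"
        " SELECT row_id, 'x0' AS feature, x0 AS value FROM data"
        " UNION ALL SELECT row_id, 'x1', x1 FROM data"
    )
    return conn


def udf_centric(conn):
    """UDF-centric: the whole forest is one opaque scalar function per row."""
    index = {(t, n): (f, th, l, r, v) for t, n, f, th, l, r, v in NODES}
    trees = sorted({t for t, *_ in NODES})

    def predict(x0, x1):
        row = {"x0": x0, "x1": x1}
        total = 0.0
        for t in trees:
            feature, thr, left, right, value = index[(t, 0)]  # start at the root
            while feature is not None:  # descend until a leaf is reached
                node = left if row[feature] < thr else right
                feature, thr, left, right, value = index[(t, node)]
            total += value  # boosted-style sum; a random forest would average
        return total

    conn.create_function("predict", 2, predict)
    return conn.execute(
        "SELECT row_id, predict(x0, x1) FROM data ORDER BY row_id"
    ).fetchall()


# Relation-centric: traversal expressed as joins plus a final aggregate.
RELATION_CENTRIC_SQL = """
WITH RECURSIVE walk(row_id, tree_id, node_id) AS (
    -- every row starts at the root (node 0) of every tree
    SELECT d.row_id, t.tree_id, 0
    FROM data AS d CROSS JOIN (SELECT DISTINCT tree_id FROM nodes) AS t
    UNION ALL
    -- each step joins the frontier with the node table and the row's
    -- split feature; leaves (feature IS NULL) find no match and stop
    SELECT w.row_id, w.tree_id,
           CASE WHEN x.value < n.threshold THEN n.left_child
                ELSE n.right_child END
    FROM walk AS w
    JOIN nodes AS n ON n.tree_id = w.tree_id AND n.node_id = w.node_id
    JOIN data_long AS x ON x.row_id = w.row_id AND x.feature = n.feature
)
SELECT w.row_id, SUM(n.leaf_value)
FROM walk AS w
JOIN nodes AS n ON n.tree_id = w.tree_id AND n.node_id = w.node_id
WHERE n.feature IS NULL
GROUP BY w.row_id
ORDER BY w.row_id
"""


def relation_centric(conn):
    return conn.execute(RELATION_CENTRIC_SQL).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(udf_centric(conn))       # [(0, 1.25), (1, -0.5)]
    print(relation_centric(conn))  # [(0, 1.25), (1, -0.5)]
```

Both queries return the same predictions; what differs is what the engine can see. In the UDF version the traversal is a black box, while in the relation-centric version it is a sequence of joins and an aggregate that the engine can plan like any other query. A toy example of this size says nothing about performance; the performance claims above rest on the paper's own benchmark.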

Award ID(s): 2144923

PAR ID: 10514881

Publisher / Repository: ACM

Date Published: 2023-10-30

Journal Name: Proceedings of 2023 ACM Symposium on Cloud Computing (SoCC'23)

ISBN: 9798400703874

Page Range / eLocation ID: 200 to 215

Location: Santa Cruz, CA, USA

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.1145/3620678.3624656
