SageDB: An Instance-Optimized Data Analytics System

Ding, Jialin; Marcus, Ryan; Kipf, Andreas; Nathan, Vikram; Nrusimha, Aniruddha; Vaidya, Kapil; van Renen, Alexander; Kraska, Tim

doi:10.14778/3565838.3565857

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are designed to handle diverse use cases, and therefore often do not achieve the best possible performance for any specific use case. A recent trend aims to tackle these pitfalls: instance-optimized systems are designed to automatically self-adjust in order to achieve the best performance for a specific use case, i.e., a dataset and query workload. Thus far, the research community has focused on creating instance-optimized database components, such as learned indexes and learned cardinality estimators, which are evaluated in isolation. However, to the best of our knowledge, there is no complete data system built with instance-optimization as a foundational design principle. In this paper, we present a progress report on SageDB, our effort towards building the first instance-optimized data system. SageDB synthesizes various instance-optimization techniques to automatically specialize for a given use case, while simultaneously exposing a simple user interface that places minimal technical burden on the user. Our prototype outperforms a commercial cloud-based analytics system by up to 3X on end-to-end query workloads and up to 250X on individual queries. SageDB is an ongoing research effort, and we highlight our lessons learned and key directions for future work.

More Like this