Democratizing Data Science through Interactive Curation of ML Pipelines

Shang, Zeyuan; Zgraggen, Emanuel; Buratti, Benedetto; Kossmann, Ferdinand; Eichmann, Philipp; Chung, Yeounoh; Binnig, Carsten; Upfal, Eli; Kraska, Tim

doi:10.1145/3299869.3319863

Statistical knowledge and domain expertise are both key to extracting actionable insights from data, yet the two skills rarely coexist. In machine learning, high-quality results are attainable only through careful data preprocessing, hyperparameter tuning, and model selection. Domain experts are often overwhelmed by this complexity, which de facto inhibits wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem still require well-trained practitioners. Those frameworks involve heavy data preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems. In this paper we present Alpine Meadow, a first Interactive Automated Machine Learning tool. What makes our system unique is not only the focus on interactivity but also the combined systemic and algorithmic design approach: on the one hand we leverage ideas from query optimization; on the other we devise novel selection and pruning strategies combining cost-based Multi-Armed Bandits and Bayesian Optimization. We evaluate our system on over 300 datasets and compare against other AutoML tools, including the current NIPS winner, as well as expert solutions. Alpine Meadow not only significantly outperforms the other AutoML systems while, in contrast to them, providing interactive latencies, but also outperforms expert solutions in 80% of cases on datasets we have never seen before.

Award ID(s): 1813444

PAR ID: 10183329

Date Published: 2019-01-01

Journal Name: SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Page Range / eLocation ID: 1171 to 1188

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.1145/3299869.3319863