The Right Tool for the Job: Data-Centric Workflows in Vizier

Oliver Kennedy, Boris Glavic

Data scientists use a wide variety of systems with a wide variety of user interfaces, such as spreadsheets and notebooks, for their data exploration, discovery, preprocessing, and analysis tasks. While this wide selection of tools offers data scientists the freedom to pick the right tool for each task, each of these tools has limitations (e.g., the lack of reproducibility of notebooks), data needs to be translated between tool-specific formats, and common functionality such as versioning, provenance, and dealing with data errors often has to be implemented separately for each system. We argue that rather than alternating between task-specific tools, a superior approach is to build multiple user interfaces on top of a single incremental workflow / dataflow platform with built-in support for versioning, provenance, error tracking, and data cleaning. We discuss Vizier, a notebook system that implements this approach, introduce the challenges that arose in building such a system, and highlight how our work on Vizier led to novel research in uncertain data management and incremental execution of workflows.

Award ID(s): 1640864, 1750460, 1956149, 1956123, 2107107

PAR ID: 10400895

Author(s) / Creator(s): Oliver Kennedy, Boris Glavic

Editor(s): Sudeepa Roy and Jun Yang

Date Published: 2022-09-01

Journal Name: Bulletin of the Technical Committee on Data Engineering

Volume: 45

Issue: 3

Page Range / eLocation ID: 129-144

Sponsoring Org: National Science Foundation

The DOI is not currently available.