Sudeepa Roy and Jun Yang (Eds.)
Data scientists use a wide variety of systems with diverse user interfaces, such as spreadsheets
and notebooks, for their data exploration, discovery, preprocessing, and analysis tasks. While this wide
selection of tools offers data scientists the freedom to pick the right tool for each task, each of these
tools has limitations (e.g., the lack of reproducibility of notebooks), data needs to be translated between
tool-specific formats, and common functionality such as versioning, provenance, and dealing with data
errors often has to be implemented for each system. We argue that rather than alternating between
task-specific tools, a superior approach is to build multiple user interfaces on top of a single incremental
workflow / dataflow platform with built-in support for versioning, provenance, error tracking, and data
cleaning. We discuss Vizier, a notebook system that implements this approach, introduce the challenges
that arose in building such a system, and highlight how our work on Vizier led to novel research in
uncertain data management and incremental execution of workflows.