skip to main content


Title: B2: Bridging Code and Interactive Visualization in Computational Notebooks
Data scientists have embraced computational notebooks to author analysis code and accompanying visualizations within a single document. Currently, although these media may be interleaved, they remain siloed: interactive visualizations must be manually specified as they are divorced from the analysis provenance expressed via dataframes, while code cells have no access to users' interactions with visualizations, and hence no way to operate on the results of interaction. To bridge this divide, we present B2, a set of techniques grounded in treating data queries as a shared representation between the code and interactive visualizations. B2 instruments data frames to track the queries expressed in code and synthesize corresponding visualizations. These visualizations are displayed in a dashboard to facilitate interactive analysis. When an interaction occurs, B2 reifies it as a data query and generates a history log in a new code cell. Subsequent cells can use this log to further analyze interaction results and, when marked as reactive, to ensure that code is automatically recomputed when new interaction occurs. In an evaluative study with data scientists, we find that B2 promotes a tighter feedback loop between coding and interacting with visualizations. All participants frequently moved from code to visualization and vice-versa, which facilitated their exploratory data analysis in the notebook.  more » « less
Award ID(s):
1942659 1900991 1730628
NSF-PAR ID:
10213186
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology
Page Range / eLocation ID:
152 to 165
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Computational notebooks, such as Jupyter, support rich data visualization. However, even when visualizations in notebooks are interactive, they are a dead end: Interactive data manipulations, such as selections, applying labels, filters, categorizations, or fixes to column or cell values, could be efficiently applied in interactive visual components, but interactive components typically cannot manipulate Python data structures. Furthermore, actions performed in interactive plots are lost as soon as the cell is re‐run, prohibiting reusability and reproducibility. To remedy this problem, we introduce Persist, a family of techniques to (a) capture interaction provenance, enabling the persistence of interactions, and (b) map interactions to data manipulations that can be applied to dataframes. We implement our approach as a JupyterLab extension that supports tracking interactions in Vega‐Altair plots and in a data table view. Persist can re‐execute interaction provenance when a notebook or a cell is re‐executed, enabling reproducibility and re‐use. We evaluate Persist in a user study targeting data manipulations with 11 participants skilled in Python and Pandas, comparing it to traditional code‐based approaches. Participants were consistently faster and were able to correctly complete more tasks with Persist. 
    more » « less
  2. We present Animated Vega-Lite, a set of extensions to Vega-Lite that model animated visualizations as time-varying data queries. In contrast to alternate approaches for specifying animated visualizations, which prize a highly expressive design space, Animated Vega-Lite prioritizes unifying animation with the language's existing abstractions for static and interactive visualizations to enable authors to smoothly move between or combine these modalities. Thus, to compose animation with static visualizations, we represent time as an encoding channel. Time encodings map a data field to animation keyframes, providing a lightweight specification for animations without interaction. To compose animation and interaction, we also represent time as an event stream; Vega-Lite selections, which provide dynamic data queries, are now driven not only by input events but by timer ticks as well. We evaluate the expressiveness of our approach through a gallery of diverse examples that demonstrate coverage over taxonomies of both interaction and animation. We also critically reflect on the conceptual affordances and limitations of our contribution by interviewing five expert developers of existing animation grammars. These reflections highlight the key motivating role of in-the-wild examples, and identify three central tradeoffs: the language design process, the types of animated transitions supported, and how the systems model keyframes. 
    more » « less
  3. null (Ed.)
    In this paper, we present a new benchmark to validate the suitability of database systems for interactive visualization workloads. While there exist proposals for evaluating database systems on interactive data exploration workloads, none rely on real user traces for database benchmarking. To this end, our long term goal is to collect user traces that represent workloads with different exploration characteristics. In this paper, we present an initial benchmark that focuses on "crossfilter"-style applications, which are a popular interaction type for data exploration and a particularly demanding scenario for testing database system performance. We make our benchmark materials, including input datasets, interaction sequences, corresponding SQL queries, and analysis code, freely available as a community resource, to foster further research in this area: https://osf.io/9xerb/?view_only=81de1a3f99d04529b6b173a3bd5b4d23. 
    more » « less
  4. Abstract

    Researchers collect large amounts of user interaction data with the goal of mapping user's workflows and behaviors to their high‐level motivations, intuitions, and goals. Although the visual analytics community has proposed numerous taxonomies to facilitate this mapping process, no formal methods exist for systematically applying these existing theories to user interaction logs. This paper seeks to bridge the gap between visualization task taxonomies and interaction log data by making the taxonomies more actionable for interaction log analysis. To achieve this, we leverage structural parallels between how people express themselves through interactions and language by reformulating existing theories asregular grammars.We represent interactions asterminalswithin a regular grammar, similar to the role of individual words in a language, and patterns of interactions ornon‐terminalsasregular expressionsover these terminals to capture common language patterns. To demonstrate our approach, we generate regular grammars for seven existing visualization taxonomies and develop code to apply them to three public interaction log datasets. In analyzing these regular grammars, we find that the taxonomies at the low‐level (i.e., terminals) show mixed results in expressing multiple interaction log datasets, and taxonomies at the high‐level (i.e., regular expressions) have limited expressiveness, due to primarily two challenges: inconsistencies in interaction log dataset granularity and structure, and under‐expressiveness of certain terminals. Based on our findings, we suggest new research directions for the visualization community to augment existing taxonomies, develop new ones, and build better interaction log recording processes to facilitate the data‐driven development of user behavior taxonomies.

     
    more » « less
  5. Data scientists reportedly spend 60 to 80 percent of their time in their daily routines on data wrangling, i.e. cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code, which result not in crashes but reduce model quality. To support data scientists with data wrangling, we present a technique to generate interactive documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension with our technique can provide documentation for many cells in popular notebooks and find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code. 
    more » « less