skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


This content will become publicly available on June 1, 2025

Title: Persist: Persistent and Reusable Interactions in Computational Notebooks
Computational notebooks, such as Jupyter, support rich data visualization. However, even when visualizations in notebooks are interactive, they are a dead end: Interactive data manipulations, such as selections, applying labels, filters, categorizations, or fixes to column or cell values, could be efficiently applied in interactive visual components, but interactive components typically cannot manipulate Python data structures. Furthermore, actions performed in interactive plots are lost as soon as the cell is re‐run, prohibiting reusability and reproducibility. To remedy this problem, we introduce Persist, a family of techniques to (a) capture interaction provenance, enabling the persistence of interactions, and (b) map interactions to data manipulations that can be applied to dataframes. We implement our approach as a JupyterLab extension that supports tracking interactions in Vega‐Altair plots and in a data table view. Persist can re‐execute interaction provenance when a notebook or a cell is re‐executed, enabling reproducibility and re‐use. We evaluate Persist in a user study targeting data manipulations with 11 participants skilled in Python and Pandas, comparing it to traditional code‐based approaches. Participants were consistently faster and were able to correctly complete more tasks with Persist.  more » « less
Award ID(s):
2213756 1751238
NSF-PAR ID:
10517907
Author(s) / Creator(s):
; ;
Publisher / Repository:
EuroGraphics
Date Published:
Journal Name:
Computer Graphics Forum
Volume:
43
Issue:
3
ISSN:
0167-7055
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Data scientists have embraced computational notebooks to author analysis code and accompanying visualizations within a single document. Currently, although these media may be interleaved, they remain siloed: interactive visualizations must be manually specified as they are divorced from the analysis provenance expressed via dataframes, while code cells have no access to users' interactions with visualizations, and hence no way to operate on the results of interaction. To bridge this divide, we present B2, a set of techniques grounded in treating data queries as a shared representation between the code and interactive visualizations. B2 instruments data frames to track the queries expressed in code and synthesize corresponding visualizations. These visualizations are displayed in a dashboard to facilitate interactive analysis. When an interaction occurs, B2 reifies it as a data query and generates a history log in a new code cell. Subsequent cells can use this log to further analyze interaction results and, when marked as reactive, to ensure that code is automatically recomputed when new interaction occurs. In an evaluative study with data scientists, we find that B2 promotes a tighter feedback loop between coding and interacting with visualizations. All participants frequently moved from code to visualization and vice-versa, which facilitated their exploratory data analysis in the notebook. 
    more » « less
  2. Abstract

    Interactive visual analysis has many advantages, but an important disadvantage is that analysis processes and workflows cannot be easily stored and reused. This is in contrast to code‐based analysis workflows, which can simply be run on updated datasets, and adapted when necessary. In this paper, we introduce methods to capture workflows in interactive visualization systems for different interactions such as selections, filters, categorizing/grouping, labeling, and aggregation. These workflows can then be applied to updated datasets, making interactive visualization sessions reusable. We demonstrate this specification using an interactive visualization system that tracks interaction provenance, and allows generating workflows from the recorded actions. The system can then be used to compare different versions of datasets and apply workflows to them. Finally, we introduce a Python library that can load workflows and apply it to updated datasets directly in a computational notebook, providing a seamless bridge between computational workflows and interactive visualization tools.

     
    more » « less
  3. CompuCell3D (CC3D) is an open-source software framework for building and executing multi-cell biological virtual-tissue models. It represents cells using the Glazier–Graner–Hogeweg model, also known as Cellular Potts model. The primary CC3D application consists of two separate tools, a smart model editor (Twedit++) and a tool for model execution, visualization and steering (Player). The CompuCell3D version 4.x release introduces support for Jupyter Notebooks, an interactive computational environment, which brings the benefits of reproducibility, portability, and self-documentation. Since model specifications in CC3D are written in Python and CC3DML and Jupyter supports Python and other languages, Jupyter can naturally act as an integrated development environment (IDE) for CC3D users as well as a live document with embedded text and simulations. This update follows the trend in software to move away from monolithic freestanding applications to the distribution of methodologies in the form of libraries that can be used in conjunction with other libraries and packages. With these benefits, CC3D deployed inJupyter Notebook is a more natural and efficient platform for scientific publishing and education using CC3D. 
    more » « less
  4. Marschall, Tobias (Ed.)
    Abstract Motivation

    JBrowse Jupyter is a package that aims to close the gap between Python programming and genomic visualization. Web-based genome browsers are routinely used for publishing and inspecting genome annotations. Historically they have been deployed at the end of bioinformatics pipelines, typically decoupled from the analysis itself. However, emerging technologies such as Jupyter notebooks enable a more rapid iterative cycle of development, analysis and visualization.

    Results

    We have developed a package that provides a Python interface to JBrowse 2’s suite of embeddable components, including the primary Linear Genome View. The package enables users to quickly set up, launch and customize JBrowse views from Jupyter notebooks. In addition, users can share their data via Google’s Colab notebooks, providing reproducible interactive views.

    Availability and implementation

    JBrowse Jupyter is released under the Apache License and is available for download on PyPI. Source code and demos are available on GitHub at https://github.com/GMOD/jbrowse-jupyter.

     
    more » « less
  5. Sudeepa Roy and Jun Yang (Ed.)
    Data scientists use a wide variety of systems with a wide variety of user interfaces such as spreadsheets and notebooks for their data exploration, discovery, preprocessing, and analysis tasks. While this wide selection of tools offers data scientists the freedom to pick the right tool for each task, each of these tools has limitations (e.g., the lack of reproducibility of notebooks), data needs to be translated between tool-specific formats, and common functionality such as versioning, provenance, and dealing with data errors often has to be implemented for each system. We argue that rather than alternating between task-specific tools, a superior approach is to build multiple user-interfaces on top of a single incremental workflow / dataflow platform with built-in support for versioning, provenance, error & tracking, and data cleaning. We discuss Vizier, a notebook system that implements this approach, introduce the challenges that arose in building such a system, and highlight how our work on Vizier lead to novel research in uncertain data management and incremental execution of workflows. 
    more » « less