Title: Visual (dis)Confirmation: Validating Models and Hypotheses with Visualizations
Data visualization provides a powerful way for analysts to explore and make data-driven discoveries. However, current visual analytic tools provide only limited support for hypothesis-driven inquiry, as their built-in interactions and workflows are primarily intended for exploratory analysis. Visualization tools notably lack capabilities that would allow users to visually and incrementally test the fit of their conceptual models and provisional hypotheses against the data. This imbalance could bias users to overly rely on exploratory analysis as the principal mode of inquiry, which can be detrimental to discovery. In this paper, we introduce Visual (dis)Confirmation, a tool for conducting confirmatory, hypothesis-driven analyses with visualizations. Users interact by framing hypotheses and data expectations in natural language. The system then selects conceptually relevant data features and automatically generates visualizations to validate the underlying expectations. Distinctively, the resulting visualizations also highlight places where one's mental model disagrees with the data, so as to stimulate reflection. The proposed tool represents a new class of interactive data systems capable of supporting confirmatory visual analysis, and responding more intelligently by spotlighting gaps between one's knowledge and the data. We describe the algorithmic techniques behind this workflow. We also demonstrate the utility of the tool through a case study.
Award ID(s):
1755611
NSF-PAR ID:
10130799
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
23rd International Conference in Information Visualization – Part II
Page Range / eLocation ID:
116 to 121
Sponsoring Org:
National Science Foundation
More Like this
  1. Visualizations of data provide a proven method for analysts to explore and make data-driven discoveries. However, current visualization tools provide only limited support for hypothesis-driven analyses, and often lack capabilities that would allow users to visually test the fit of their conceptual models against the data. This imbalance could bias users to overly rely on exploratory visual analysis as the principal mode of inquiry, which can be detrimental to discovery. To address this gap, we propose a new paradigm for ‘concept-driven’ visual analysis. In this style of analysis, analysts share their conceptual models and hypotheses with the system. The system then uses those inputs to drive the generation of visualizations, while providing plots and interactions to explore places where models and data disagree. We discuss key characteristics and design considerations for concept-driven visualizations, and report preliminary findings from a formative study. 
  2. Visualization tools facilitate exploratory data analysis, but fall short at supporting hypothesis-based reasoning. We conducted an exploratory study to investigate how visualizations might support a concept-driven analysis style, where users can optionally share their hypotheses and conceptual models in natural language, and receive customized plots depicting the fit of their models to the data. We report on how participants leveraged these unique affordances for visual analysis. We found that a majority of participants articulated meaningful models and predictions, utilizing them as entry points to sensemaking. We contribute an abstract typology representing the types of models participants held and externalized as data expectations. Our findings suggest ways for rearchitecting visual analytics tools to better support hypothesis- and model-based reasoning, in addition to their traditional role in exploratory analysis. We discuss the design implications and reflect on the potential benefits and challenges involved. 
  3. Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. When users print a dataframe in their notebooks, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggests promising analysis directions. Lux features a high-level language for generating visualizations on demand to encourage rapid visual experimentation with data. We demonstrate that through the use of a careful design and three system optimizations, Lux adds no more than two seconds of overhead on top of pandas for over 98% of datasets in the UCI repository. We evaluate Lux in terms of usability via interviews with early adopters, finding that Lux helps fulfill the needs of data scientists for visualization support within their dataframe workflows. Lux has already been embraced by data science practitioners, with over 3.1k stars on GitHub. 
  4. With an ever-growing amount of collected data, the importance of visualization as an analysis component is growing in concert. The creation of good visualizations often doesn't happen in one step but is rather an iterative and exploratory process. However, this process is currently not well supported in most of the available visualization tools and systems. Visualization authors are forced to commit prematurely to particular design aspects of their creations, and when exploring potential variant visualizations, they are forced to adopt ad hoc techniques such as copying code snippets or keeping a collection of separate files. We propose variational visualizations as a model supporting open-ended exploration of the design space of information visualization. Together with that model, we present a prototype implementation in the form of a domain-specific language embedded in PureScript. 
  5. Creating data visualizations requires diverse skills including computer programming, statistics, and graphic design. Visualization practitioners, often formally trained in one but not all of these areas, increasingly face the challenge of reconciling, integrating and prioritizing competing disciplinary values, norms and priorities. To inform multidisciplinary visualization pedagogy, we analyze the negotiation of values in the rhetoric and affordances of two common tools for creating visual representations of data: R and Adobe Illustrator. Features of, and discourse around, these standard visualization tools illustrate both a convergence of values and priorities (clear, attractive, and communicative data-driven graphics) side-by-side with a retention of rhetorical divisions between disciplinary communities (statistical analysis in contrast to creative expression). We discuss implications for data-driven work and data science curricula within the current environment where data visualization practice is converging while values in rhetoric remain divided. 