This content will become publicly available on April 1, 2025

Title: An empirical study of counterfactual visualization to support visual causal inference
Counterfactuals – expressing what might have been true under different circumstances – have been widely applied in statistics and machine learning to help understand causal relationships. More recently, counterfactuals have begun to emerge as a technique being applied within visualization research. However, it remains unclear to what extent counterfactuals can aid with visual data communication. In this paper, we primarily focus on assessing the quality of users’ understanding of data when provided with counterfactual visualizations. We propose a preliminary model of causality comprehension by connecting theories from causal inference and visual data communication. Leveraging this model, we conducted an empirical study to explore how counterfactuals can improve users’ understanding of data in static visualizations. Our results indicate that visualizing counterfactuals had a positive impact on participants’ interpretations of causal relations within datasets. These results motivate a discussion of how to more effectively incorporate counterfactuals into data visualizations.
Award ID(s):
2211845
PAR ID:
10534193
Author(s) / Creator(s):
; ;
Publisher / Repository:
Sage
Date Published:
Journal Name:
Information Visualization
Volume:
23
Issue:
2
ISSN:
1473-8716
Page Range / eLocation ID:
197 to 214
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Exploratory data analysis of high-dimensional datasets is a crucial task for which visual analytics can be especially useful. However, the ad hoc nature of exploratory analysis can also lead users to draw incorrect causal inferences. Previous studies have demonstrated this risk and shown that integrating counterfactual concepts within visual analytics systems can improve users’ understanding of visualized data. However, effectively leveraging counterfactual concepts can be challenging, with only bespoke implementations found in prior work. Moreover, it can require expertise in both counterfactual subset analysis and visualization to implement the functionalities practically. This paper aims to help address these challenges in two ways. First, we propose an operator-based conceptual model for the use of counterfactuals that is informed by prior work in visualization research. Second, we contribute the Co-op library, an open and extensible reference implementation of this model that can support the integration of counterfactual-based subset computation with visualization systems. To evaluate the effectiveness and generalizability of Co-op, the library was used to construct two different visual analytics systems each supporting a distinct user workflow. In addition, expert interviews were conducted with professional visual analytics researchers and engineers to gain more insights regarding how Co-op could be leveraged. Finally, informed in part by these evaluation results, we distil a set of key design implications for effectively leveraging counterfactuals in future visualization systems. 
  2. Analysts often make visual causal inferences about possible data-generating models. However, visual analytics (VA) software tends to leave these models implicit in the mind of the analyst, which casts doubt on the statistical validity of informal visual “insights”. We formally evaluate the quality of causal inferences from visualizations by adopting causal support—a Bayesian cognition model that learns the probability of alternative causal explanations given some data—as a normative benchmark for causal inferences. We contribute two experiments assessing how well crowdworkers can detect (1) a treatment effect and (2) a confounding relationship. We find that chart users’ causal inferences tend to be insensitive to sample size such that they deviate from our normative benchmark. While interactively cross-filtering data in visualizations can improve sensitivity, on average users do not perform reliably better with common visualizations than they do with textual contingency tables. These experiments demonstrate the utility of causal support as an evaluation framework for inferences in VA and point to opportunities to make analysts’ mental models more explicit in VA software. 
  3. Answering counterfactual queries has important applications such as explainability, robustness, and fairness but is challenging when the causal variables are unobserved and the observations are non-linear mixtures of these latent variables, such as pixels in images. One approach is to recover the latent Structural Causal Model (SCM), which may be infeasible in practice due to requiring strong assumptions, e.g., linearity of the causal mechanisms or perfect atomic interventions. Meanwhile, more practical ML-based approaches using naive domain translation models to generate counterfactual samples lack theoretical grounding and may construct invalid counterfactuals. In this work, we strive to strike a balance between practicality and theoretical guarantees by analyzing a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain (or environment). We show that recovering the latent SCM is unnecessary for estimating domain counterfactuals, thereby sidestepping some of the theoretic challenges. By assuming invertibility and sparsity of intervention, we prove domain counterfactual estimation error can be bounded by a data fit term and intervention sparsity term. Building upon our theoretical results, we develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation under autoregressive and shared parameter constraints that enforce intervention sparsity. Finally, we show an improvement in counterfactual estimation over baseline methods through extensive simulated and image-based experiments. 
  4. Attempting to make sense of a phenomenon or crisis, social media users often share data visualizations and interpretations that can be erroneous or misleading. Prior work has studied how data visualizations can mislead, but do misleading visualizations reach a broad social media audience? And if so, do users amplify or challenge misleading interpretations? To answer these questions, we conducted a mixed-methods analysis of the public's engagement with data visualization posts about COVID-19 on Twitter. Compared to posts with accurate visual insights, our results show that posts with misleading visualizations garner more replies in which the audiences point out nuanced fallacies and caveats in data interpretations. Based on the results of our thematic analysis of engagement, we identify and discuss important opportunities and limitations to effectively leveraging crowdsourced assessments to address data-driven misinformation. 
  5. Abstract

    Background

    Increased use of visualizations as wildfire communication tools with public and professional audiences—particularly 3D videos and virtual or augmented reality—invites discussion of their ethical use in varied social and temporal contexts. Existing studies focus on the use of such visualizations prior to fire events and commonly use hypothetical scenarios intended to motivate proactive mitigation or explore decision-making, overlooking the insights that those who have already experienced fire events can provide to improve user engagement and understanding of wildfire visualizations more broadly. We conducted semi-structured interviews with 101 residents and professionals affected by Colorado’s 2020 East Troublesome and 2021 Marshall Fires, using 3D model visualizations of fire events on tablets as a discussion tool to understand how fire behavior influenced evacuation experiences and decision-making. We provide empirically gathered insights that can inform the ethical use of wildfire visualizations by scientists, managers, and communicators working at the intersection of fire management and public safety.

    Results

    Study design, interview discussions, and field observations from both case studies reveal the importance of nuanced and responsive approaches for the use of 3D visualizations, with an emphasis on the implementation of protocols that ensure the risk of harm to the intended audience is minimal. We share five considerations for use of visualizations as communication tools with public and professional audiences, expanding existing research into post-fire spaces: (1) determine whether the use of visualizations will truly benefit users; (2) connect users to visualizations by incorporating local values; (3) provide context around model uncertainty; (4) design and share visualizations in ways that meet the needs of the user; (5) be cognizant of the emotional impacts that sharing wildfire visualizations can have.

    Conclusions

    This research demonstrates the importance of study design and planning that considers the emotional and psychological well-being of users. For users that do wish to engage with visualizations, this technical note provides guidance for ensuring meaningful understandings that can generate new discussion and knowledge. We advocate for communication with visualizations that consider local context and provide opportunities for users to engage to a level that suits them, suggesting that visualizations should serve as catalysts for meaningful dialogue rather than conclusive information sources.

     