skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications
Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization . In a formative content analysis of 50 research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation and discuss implications for future tools.  more » « less
Award ID(s):
1901386
PAR ID:
10355052
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Computer-Human Interaction
Volume:
29
Issue:
1
ISSN:
1073-0516
Page Range / eLocation ID:
1 to 28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Visualizations of data provide a proven method for analysts to explore and make data-driven discoveries. However, current visualization tools provide only limited support for hypothesis-driven analyses, and often lack capabilities that would allow users to visually test the fit of their conceptual models against the data. This imbalance could bias users to overly rely on exploratory visual analysis as the principal mode of inquiry, which can be detrimental to discovery. To address this gap, we propose a new paradigm for ‘concept-driven’ visual analysis. In this style of analysis, analysts share their conceptual models and hypotheses with the system. The system then uses those inputs to drive the generation of visualizations, while providing plots and interactions to explore places where models and data disagree. We discuss key characteristics and design considerations for concept-driven visualizations, and report preliminary findings from a formative study. 
    more » « less
  2. Data visualization provides a powerful way for analysts to explore and make data-driven discoveries. However, current visual analytic tools provide only limited support for hypothesis-driven inquiry, as their built-in interactions and workflows are primarily intended for exploratory analysis. Visualization tools notably lack capabilities that would allow users to visually and incrementally test the fit of their conceptual models and provisional hypotheses against the data. This imbalance could bias users to overly rely on exploratory analysis as the principal mode of inquiry, which can be detrimental to discovery. In this paper, we introduce Visual (dis) Confirmation, a tool for conducting confirmatory, hypothesis-driven analyses with visualizations. Users interact by framing hypotheses and data expectations in natural language. The system then selects conceptually relevant data features and automatically generates visualizations to validate the underlying expectations. Distinctively, the resulting visualizations also highlight places where one's mental model disagrees with the data, so as to stimulate reflection. The proposed tool represents a new class of interactive data systems capable of supporting confirmatory visual analysis, and responding more intelligently by spotlighting gaps between one's knowledge and the data. We describe the algorithmic techniques behind this workflow. We also demonstrate the utility of the tool through a case study. 
    more » « less
  3. Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests. 
    more » « less
  4. Visualization tools facilitate exploratory data analysis, but fall short at supporting hypothesis-based reasoning. We conducted an exploratory study to investigate how visualizations might support a concept-driven analysis style, where users can optionally share their hypotheses and conceptual models in natural language, and receive customized plots depicting the fit of their models to the data. We report on how participants leveraged these unique affordances for visual analysis. We found that a majority of participants articulated meaningful models and predictions, utilizing them as entry points to sensemaking. We contribute an abstract typology representing the types of models participants held and externalized as data expectations. Our findings suggest ways for rearchitecting visual analytics tools to better support hypothesis- and model-based reasoning, in addition to their traditional role in exploratory analysis. We discuss the design implications and reflect on the potential benefits and challenges involved. 
    more » « less
  5. Narrative sensemaking is a fundamental process to understand sequential information. Narrative maps are a visual representation framework that can aid analysts in this process. They allow analysts to understand the big picture of a narrative, uncover new relationships between events, and model connections between storylines. As a sensemaking tool, narrative maps have applications in intelligence analysis, misinformation modeling, and computational journalism. In this work, we seek to understand how analysts construct narrative maps in order to improve narrative map representation and extraction methods. We perform an experiment with a data set of news articles. Our main contribution is an analysis of how analysts construct narrative maps. The insights extracted from our study can be used to design narrative map visualizations, extraction algorithms, and visual analytics tools to support the sensemaking process. 
    more » « less