Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of 50 research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process balancing conceptual and statistical considerations constrained by data and computation, and discuss implications for future tools.
Application of Statistics Training to Real-World Contexts: Issues Related to Working as Data Analysts Outside Academe
Through I-Corps™ customer discovery interviews (NSF Award 1925391), the authors determined that early- and mid-career data analysts would benefit from the development and commercialization of an interactive software tool designed to assist in the selection of statistical tests for their real-world applications. The primary advantage of this innovation is the concomitant reduction in time data analysts spend training and/or researching which statistical method to employ for a specific application. This paper details the development of the Stat Tree™ software prototype to accomplish those goals.
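The kind of branching decision logic such a test-selection tool encodes can be sketched in a few lines. This is a minimal illustration of the general idea, not Stat Tree™'s actual decision tree: the design questions, function name, and test recommendations below are assumptions chosen for the example.

```python
def suggest_test(outcome, groups, paired=False):
    """Suggest a statistical test from a few study-design questions.

    Illustrative decision tree only; real test selection also weighs
    sample size, normality, variance assumptions, and more.
    """
    if outcome == "continuous":
        if groups == 1:
            return "one-sample t-test"
        if groups == 2:
            return "paired t-test" if paired else "independent-samples t-test"
        return "one-way ANOVA"
    if outcome == "categorical":
        if groups == 2 and paired:
            return "McNemar's test"
        return "chi-squared test of independence"
    raise ValueError("unrecognized outcome type")

# e.g. two independent groups with a continuous outcome:
suggestion = suggest_test("continuous", 2)
```

An interactive tool would pose these questions one at a time and walk the same tree, which is where the claimed time savings over manual lookup come from.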
- Award ID(s): 1925391
- PAR ID: 10617257
- Publisher / Repository: North American Business Press
- Date Published:
- Journal Name: Journal of Strategic Innovation and Sustainability
- Volume: 16
- Issue: 2
- ISSN: 1718-2077
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Analysts often make visual causal inferences about possible data-generating models. However, visual analytics (VA) software tends to leave these models implicit in the mind of the analyst, which casts doubt on the statistical validity of informal visual “insights”. We formally evaluate the quality of causal inferences from visualizations by adopting causal support—a Bayesian cognition model that learns the probability of alternative causal explanations given some data—as a normative benchmark for causal inferences. We contribute two experiments assessing how well crowdworkers can detect (1) a treatment effect and (2) a confounding relationship. We find that chart users’ causal inferences tend to be insensitive to sample size such that they deviate from our normative benchmark. While interactively cross-filtering data in visualizations can improve sensitivity, on average users do not perform reliably better with common visualizations than they do with textual contingency tables. These experiments demonstrate the utility of causal support as an evaluation framework for inferences in VA and point to opportunities to make analysts’ mental models more explicit in VA software.
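The idea of causal support — scoring how strongly observed data favor one causal explanation over another — can be illustrated with a toy Bayesian model comparison. The sketch below is an assumption about the general technique, not the paper's exact formulation: it compares a "treatment changes the outcome rate" model against a "single shared rate" model using Beta-Binomial marginal likelihoods with uniform priors, on made-up counts.

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(successes, trials):
    # Beta(1,1)-prior marginal likelihood of a binomial count,
    # up to the binomial coefficient (which cancels in the comparison)
    return log_beta(1 + successes, 1 + trials - successes) - log_beta(1, 1)

def causal_support(k_treat, n_treat, k_ctrl, n_ctrl):
    """Log posterior odds of 'different rates' vs 'one shared rate'.

    Positive values favor a treatment effect; the normative property
    is that the same observed proportions yield stronger support as
    sample size grows.
    """
    m_effect = log_marginal(k_treat, n_treat) + log_marginal(k_ctrl, n_ctrl)
    m_null = log_marginal(k_treat + k_ctrl, n_treat + n_ctrl)
    return m_effect - m_null

# Same 80%-vs-20% contrast at two sample sizes:
small = causal_support(4, 5, 1, 5)
large = causal_support(8, 10, 2, 10)
```

A normatively sensitive observer's confidence should behave like `large > small`; the paper's finding is that chart users' judgments largely fail to show this sample-size sensitivity.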
-
High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24-, 96-, 384-, or 1,536-well plates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.
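Mapping a reagent screen onto a well plate is the core data-structure problem such a tool handles. The sketch below is a hypothetical illustration of that idea, not phactor™'s actual scheme: the reagent names, function, and row/column layout rule are all assumptions made for the example.

```python
from string import ascii_uppercase

# Illustrative factor lists for a cross-product screen (made up)
catalysts = ["Pd(OAc)2", "Pd2(dba)3", "NiCl2"]
bases = ["K2CO3", "Cs2CO3", "Et3N", "DBU"]

def plate_layout(row_factors, col_factors, n_rows=8, n_cols=12):
    """Map the cross product of two factor lists onto 96-well coordinates.

    Rows get letters (A, B, ...), columns get numbers (1, 2, ...),
    so each well ID like "C4" identifies one reagent combination.
    """
    if len(row_factors) > n_rows or len(col_factors) > n_cols:
        raise ValueError("screen does not fit on the plate")
    layout = {}
    for i, row_reagent in enumerate(row_factors):
        for j, col_reagent in enumerate(col_factors):
            well = f"{ascii_uppercase[i]}{j + 1}"
            layout[well] = (row_reagent, col_reagent)
    return layout

wells = plate_layout(catalysts, bases)  # 3 x 4 = 12 wells used of 96
```

Keeping the layout as a well-ID-to-reagents mapping is what makes the later steps — generating dosing instructions for a liquid handler and joining uploaded analytical results back to conditions — straightforward and machine-readable.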
-
[Context and motivation] Trace matrices are linchpins for the development of mission- and safety-critical software systems and are useful for all software systems, yet automated methods for recovering trace links are far from perfect. This limitation makes the job of human analysts who must vet recovered trace links more difficult. [Question/Problem] Earlier studies suggested that certain analyst behaviors when performing trace recovery tasks lead to decreased accuracy of recovered trace relationships. We propose a three-step experimental study to: (a) determine if there really are behaviors that lead to errors of judgment for analysts, (b) enhance the requirements tracing software to curtail such behaviors, and (c) determine if curtailing such behaviors results in increased accuracy. [Principal ideas/results] We report on a preliminary study we undertook in which we modified the user interface of RETRO.NET to curtail two behaviors indicated by the earlier work. We report on observed results. [Contributions] We describe and discuss a major study of potentially unwanted analyst behaviors and present results of a preliminary study toward determining whether curbing these behaviors with enhancements to tracing software leads to fewer human errors.
-
Every year, tens of thousands of refugees are resettled to dozens of host countries. Although there is growing evidence that the initial placement of refugee families profoundly affects their lifetime outcomes, there have been few attempts to optimize resettlement decisions. We integrate machine learning and integer optimization into an innovative software tool, Annie™ Matching and Outcome Optimization for Refugee Empowerment (Annie™ Moore), that assists a U.S. resettlement agency with matching refugees to their initial placements. Our software suggests optimal placements while giving substantial autonomy to the resettlement staff to fine-tune recommended matches, thereby streamlining their resettlement operations. Initial back testing indicates that Annie™ can improve short-run employment outcomes by 22%–38%. We conclude by discussing several directions for future work.
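The matching problem at the heart of such a system can be sketched at toy scale. The real system uses machine-learned outcome predictions and integer programming over many families, affiliates, and capacity constraints; the version below is an assumption-laden illustration that brute-forces a tiny one-family-per-slot assignment over a made-up score table.

```python
from itertools import permutations

# Hypothetical predicted employment probabilities, scores[f][a] for
# family f placed at affiliate a (numbers invented for illustration)
scores = [
    [0.30, 0.55, 0.20],  # family 0
    [0.60, 0.40, 0.35],  # family 1
    [0.25, 0.50, 0.45],  # family 2
]

def best_matching(scores):
    """Exhaustively find the assignment maximizing total predicted employment.

    perm[f] gives the affiliate assigned to family f; enumeration is
    fine at this size, while real deployments use integer optimization.
    """
    n = len(scores)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(scores[f][perm[f]] for f in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

total, assignment = best_matching(scores)
# greedy per-family picks can conflict; the optimizer resolves them jointly
```

Note that family 0's best individual option (affiliate 1) and family 2's (also affiliate 1) conflict; the joint optimum routes family 0 to affiliate 1, family 1 to affiliate 0, and family 2 to affiliate 2, which is exactly the kind of trade-off an assignment optimizer resolves and a manual process can miss.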