Title: One data set, many analysts: Implications for practicing scientists
Researchers routinely face choices throughout the data analysis process. It is often opaque to readers how these choices are made, how they affect the findings, and whether or not data analysis results are unduly influenced by subjective decisions. This concern is spurring numerous investigations into the variability of data analysis results. The findings demonstrate that different teams analyzing the same data may reach different conclusions. This is the “many-analysts” problem. Previous research on the many-analysts problem focused on demonstrating its existence, without identifying specific practices for solving it. We address this gap by identifying three pitfalls that have contributed to the variability observed in many-analysts publications and providing suggestions on how to avoid them.
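To make the concern concrete, the following minimal sketch (invented data and analysis choices, not drawn from the paper) shows how a few defensible decisions about outlier handling and transformation can shift the effect size estimated from the same dataset:

```python
# A minimal sketch (invented data and choices, not from the paper) of how
# defensible analysis decisions can shift the estimated effect in one dataset.
import itertools
import math
import random
import statistics

random.seed(1)

# Hypothetical scores for a control and a treatment group, with two extreme
# values in the treatment group.
control = [random.gauss(50, 10) for _ in range(100)]
treatment = [random.gauss(53, 10) for _ in range(98)] + [95.0, 102.0]

def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled

def drop_outliers(xs, z=2.5):
    """One of many defensible outlier rules: drop points more than z SDs from the mean."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= z * s]

outlier_rules = {"keep all": lambda xs: xs, "drop |z|>2.5": drop_outliers}
transforms = {"raw": lambda xs: list(xs), "log": lambda xs: [math.log(x) for x in xs]}

# Treat each (outlier rule, transform) pair as one plausible analysis team.
for (rname, rule), (tname, tf) in itertools.product(outlier_rules.items(), transforms.items()):
    d = cohens_d(tf(rule(control)), tf(rule(treatment)))
    print(f"{rname:>12} / {tname:<3} -> d = {d:+.2f}")
```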
Award ID(s):
2152746
PAR ID:
10511558
Author(s) / Creator(s):
Publisher / Repository:
Frontiers in Psychology
Date Published:
Journal Name:
Frontiers in Psychology
Volume:
14
ISSN:
1664-1078
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study. 
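As a rough illustration of the idea above, the snippet below (the item names and 1–7 scale are invented for illustration, not the published SEES items) pairs each team's single effect size with subjective ratings, giving a richer summary than the range of point estimates alone:

```python
# Minimal sketch (hypothetical item names and scale, not the published SEES):
# pair each team's effect size with subjective ratings so the summary captures
# more than the range of point estimates.
from statistics import mean

teams = [
    {"team": "A", "effect": 0.21, "design_appropriateness": 6, "evidence_strength": 5},
    {"team": "B", "effect": 0.05, "design_appropriateness": 3, "evidence_strength": 2},
    {"team": "C", "effect": 0.34, "design_appropriateness": 5, "evidence_strength": 6},
]

effects = [t["effect"] for t in teams]
print(f"range of effects: {min(effects):.2f} to {max(effects):.2f}")

# A richer view: report mean ratings per item alongside the effect sizes,
# rather than collapsing each team's contribution to a single number.
for item in ("design_appropriateness", "evidence_strength"):
    print(f"mean {item}: {mean(t[item] for t in teams):.1f} / 7")
```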
  2. Analyzing large and complex datasets for critical decision making can benefit from a collective effort involving a team of analysts. However, insights and findings from different analysts are often incomplete, disconnected, or even conflicting. Most existing analysis tools lack proper support for examining and resolving the conflicts among the findings in order to consolidate the results of collaborative data analysis. In this paper, we present CoVA, a visual analytics system incorporating conflict detection and resolution for supporting asynchronous collaborative data analysis. By using a declarative visualization language and graph representation for managing insights and insight provenance, CoVA effectively leverages distributed revision control workflow from software engineering to automatically detect and properly resolve conflicts in collaborative analysis results. In addition, CoVA provides an effective visual interface for resolving conflicts as well as combining the analysis results. We conduct a user study to evaluate CoVA for collaborative data analysis. The results show that CoVA allows better understanding and use of the findings from different analysts.
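The abstract does not spell out CoVA's graph or provenance model, so the sketch below is only a loose analogy to the revision-control workflow it borrows (the findings are invented): a three-way merge that flags a conflict only when both analysts changed the same finding in different ways.

```python
# A loose analogy (invented findings, not CoVA's implementation): a three-way,
# revision-control-style merge of two analysts' findings against a shared base.
base      = {"churn_driver": "price", "Q3_trend": "flat"}
analyst_a = {"churn_driver": "onboarding friction", "Q3_trend": "rising", "top_segment": "enterprise"}
analyst_b = {"churn_driver": "support quality", "Q3_trend": "flat"}

merged, conflicts = {}, {}
for key in sorted(set(base) | set(analyst_a) | set(analyst_b)):
    a, b, o = analyst_a.get(key), analyst_b.get(key), base.get(key)
    if a == b:
        merged[key] = a                    # both analysts agree
    elif b == o:
        merged[key] = a                    # only A changed it -> take A's finding
    elif a == o:
        merged[key] = b                    # only B changed it -> take B's finding
    else:
        conflicts[key] = {"A": a, "B": b}  # both changed it differently -> resolve manually

print("merged findings:", merged)
print("conflicts to resolve:", conflicts)
```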
  3. Drawing reliable inferences from data involves many, sometimes arbitrary, decisions across phases of data collection, wrangling, and modeling. As different choices can lead to diverging conclusions, understanding how researchers make analytic decisions is important for supporting robust and replicable analysis. In this study, we pore over nine published research studies and conduct semi-structured interviews with their authors. We observe that researchers often base their decisions on methodological or theoretical concerns, but subject to constraints arising from the data, expertise, or perceived interpretability. We confirm that researchers may experiment with choices in search of desirable results, but also identify other reasons why researchers explore alternatives yet omit findings. In concert with our interviews, we also contribute visualizations for communicating decision processes throughout an analysis. Based on our results, we identify design opportunities for strengthening end-to-end analysis, for instance via tracking and meta-analysis of multiple decision paths. 
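One lightweight way to support the tracking and meta-analysis of decision paths mentioned above is to log each choice together with the alternatives considered and the stated rationale. The structure and example decisions below are hypothetical, not the authors' tool:

```python
# A minimal sketch (hypothetical structure and example decisions, not the
# authors' tool) for recording decision paths during an analysis.
import json
from dataclasses import dataclass, asdict

@dataclass
class Decision:
    phase: str          # e.g. "collection", "wrangling", "modeling"
    choice: str         # the option actually taken
    alternatives: list  # options considered but not taken
    rationale: str      # methodological, theoretical, or practical reason

log = []

def decide(phase, choice, alternatives, rationale):
    """Record a decision and return the chosen option so it can be used inline."""
    log.append(Decision(phase, choice, alternatives, rationale))
    return choice

# Example usage inside an analysis script.
missing_rule = decide("wrangling", "listwise deletion",
                      ["mean imputation", "multiple imputation"],
                      "under 2% missing; simplicity preferred")
model = decide("modeling", "mixed-effects model",
               ["OLS with clustered standard errors"],
               "repeated measures per participant")

print(json.dumps([asdict(d) for d in log], indent=2))
```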
  4. A critical step in data analysis for many different types of experiments is the identification of features with theoretically defined shapes in N-dimensional datasets; examples of this process include finding peaks in multi-dimensional molecular spectra or emitters in fluorescence microscopy images. Identifying such features involves determining if the overall shape of the data is consistent with an expected shape; however, it is generally unclear how to quantitatively make this determination. In practice, many analysis methods employ subjective, heuristic approaches, which complicates the validation of any ensuing results—especially as the amount and dimensionality of the data increase. Here, we present a probabilistic solution to this problem by using Bayes’ rule to calculate the probability that the data have any one of several potential shapes. This probabilistic approach may be used to objectively compare how well different theories describe a dataset, identify changes between datasets and detect features within data using a corollary method called Bayesian Inference-based Template Search; several proof-of-principle examples are provided. Altogether, this mathematical framework serves as an automated ‘engine’ capable of computationally executing analysis decisions currently made by visual inspection across the sciences.
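The core calculation can be illustrated with a simplified sketch: synthetic data, a fixed set of candidate templates, equal priors, and a known Gaussian noise level (the published method goes further, e.g. by marginalizing over template parameters):

```python
# A simplified sketch (synthetic data, fixed template parameters, equal priors,
# known Gaussian noise) of using Bayes' rule to score candidate shapes.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 200)
sigma = 0.3  # assumed, known noise level

# Synthetic data: a Gaussian peak plus noise.
data = np.exp(-x**2 / 2) + rng.normal(0, sigma, x.size)

templates = {
    "flat background": np.zeros_like(x),
    "narrow Gaussian peak": np.exp(-x**2 / 2),
    "broad Gaussian peak": np.exp(-x**2 / 8),
}

# Log likelihood of the data under each fixed template, assuming iid Gaussian noise.
log_like = {name: -0.5 * np.sum(((data - t) / sigma) ** 2) for name, t in templates.items()}

# With equal priors the posterior is proportional to the likelihood; subtract the
# maximum log likelihood before exponentiating for numerical stability.
m = max(log_like.values())
weights = {name: np.exp(ll - m) for name, ll in log_like.items()}
total = sum(weights.values())
for name, w in weights.items():
    print(f"P({name} | data) = {w / total:.3f}")
```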
  5. The complexities of many environmental problems make the task of identifying potential solutions daunting. We present a diagnostic framework to help guide environmental policy analysts and practitioners to think more systematically about the major types of environmental problems and their possible policy responses. Our framework helps the user classify a problem into 1 of the 3 main problem categories, and then for each of the problem types think about contextual factors that will influence the choice of policy responses. The main problem types are (1) common-pool resource (CPR) problems (e.g., overfishing, groundwater depletion, and forest degradation); (2) pollution problems (e.g., greenhouse gas emissions, eutrophication, acid rain, and smog); and (3) hazards (natural and human-made hazards, including hurricanes, wildfires, and levee collapse). For each of these problems, the framework asks users to consider several contextual factors that are known to influence the likely effectiveness of different policy responses, particularly fast-thinking behavior. The framework is a heuristic tool that will help novice analysts develop a deeper understanding of the problems at hand and an appreciation for the complexities involved in coming up with workable solutions to environmental challenges. The proposed framework is not prescriptive but analytical in that it asks users guiding questions to assess multiple aspects of a problem. The resulting problem assessment helps to narrow down the number of viable options for environmental policy responses, each of which may, in turn, be assessed with an eye toward their legal, political, and social viability.
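As a very rough sketch of how such a diagnostic flow might be encoded (the guiding questions below are illustrative, not the authors' framework), one could map each problem type to contextual questions that bear on the choice of policy response:

```python
# Illustrative only: a toy encoding of the diagnostic flow, mapping each of the
# three problem types to example contextual questions (not the authors' list).
GUIDING_QUESTIONS = {
    "common-pool resource": [
        "Can resource users be excluded at reasonable cost?",
        "Is the resource boundary well defined and monitorable?",
    ],
    "pollution": [
        "Are the emission sources point or nonpoint?",
        "Can damages be traced to identifiable polluters?",
    ],
    "hazard": [
        "Is the hazard natural or human-made?",
        "How much warning time do affected communities have?",
    ],
}

def diagnose(problem_type):
    """Return the contextual questions for a recognized problem type."""
    try:
        return GUIDING_QUESTIONS[problem_type]
    except KeyError:
        raise ValueError(f"unknown problem type: {problem_type!r}") from None

for question in diagnose("common-pool resource"):
    print("-", question)
```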