Title: The Statistics of Replication
Abstract. The concept of replication is fundamental to the logic and rhetoric of science, including the argument that science is self-correcting. Yet there is very little literature on the methodology of replication. In this article, I argue that the definition of replication should not require underlying effects to be identical, but should permit some variation in true effects. I note that different possible analyses could be used to determine whether studies replicate. Finally, I argue that a single replication study is almost never adequate to determine whether a result replicates. Thus, methodological work on the design of replication studies would be useful.
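One way to operationalize "replication without identical true effects" is a prediction-interval criterion: ask whether the replication estimate falls where it plausibly could, given sampling error in both studies plus some allowed between-study variation. The sketch below is illustrative only, not the article's method; the function names, the 2·tau² variance term (assuming each study's true effect varies independently around a common mean with standard deviation tau), and the numbers in the usage note are my assumptions.

```python
from math import sqrt

def prediction_interval(orig, se_orig, se_rep, tau=0.0, z=1.96):
    """Approximate 95% interval for where a replication estimate should fall
    if each study's true effect may vary around a common mean with sd `tau`.
    With tau = 0 this recovers the strict identical-effects criterion."""
    half = z * sqrt(se_orig**2 + se_rep**2 + 2 * tau**2)
    return orig - half, orig + half

def replicates(orig, se_orig, rep, se_rep, tau=0.0):
    """True if the replication estimate is consistent with the original."""
    lo, hi = prediction_interval(orig, se_orig, se_rep, tau)
    return lo <= rep <= hi
```

For example, with an original estimate of 0.5 (SE 0.1) and a replication estimate of 0.2 (SE 0.1), the strict criterion (tau = 0) judges the result a failure to replicate, while allowing modest variation in true effects (tau = 0.1) judges it consistent — illustrating how the definition of replication changes the verdict.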
Award ID(s):
1841075
PAR ID:
10173463
Author(s) / Creator(s):
Date Published:
Journal Name:
Methodology
Volume:
15
Issue:
Supplement 1
ISSN:
1614-1881
Page Range / eLocation ID:
3 to 14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of assessing whether experimental results can be replicated is becoming increasingly important in many areas of science. It is often assumed that assessing replication is straightforward: All one needs to do is repeat the study and see whether the results of the original and replication studies agree. This article shows that the power of the statistical test for whether two studies obtain the same effect is smaller than the power of either study to detect an effect in the first place. Thus, unless the original study and the replication study have unusually high power (e.g., power of 98%), a single replication study will not have adequate sensitivity to provide an unambiguous evaluation of replication.
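The point above can be seen in rough numbers with a normal-approximation sketch: the difference between two studies' estimates has a larger standard error than either estimate alone, so a test of agreement has lower power than a test for the effect itself. The sample size, effect size, and the sqrt(2/n) standard-error approximation for a standardized mean difference below are all illustrative assumptions, not figures from the article.

```python
from math import sqrt, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sided_power(delta, se, z=1.959964):
    """Normal-approximation power of a two-sided z-test at alpha = .05
    when the true quantity being tested equals `delta`."""
    return norm_cdf(delta / se - z) + norm_cdf(-delta / se - z)

# Illustrative numbers: n = 50 per arm, true standardized effect d = 0.5.
n, d = 50, 0.5
se_single = sqrt(2.0 / n)        # approximate SE of one study's estimate
se_diff = sqrt(2.0) * se_single  # SE of the difference between two such studies

p_detect = two_sided_power(d, se_single)    # ~0.71: either study detects the effect
p_same_effect = two_sided_power(d, se_diff)  # ~0.42: test that the two studies agree,
                                             # even against a difference as large as d
```

Here each individual study has about 71% power to detect the effect, yet the test comparing the two studies has only about 42% power to detect a between-study difference as large as the effect itself — a single replication study is a blunter instrument than it appears.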
  2. A series of failed replications and frauds have raised questions regarding self-correction in science. Metascientific activists have advocated policies that incentivize replications and make them more diagnostically potent. We argue that current debates, as well as research in science and technology studies, have paid little heed to a key dimension of replication practice. Although it sometimes serves a diagnostic function, replication is commonly motivated by a practical desire to extend research interests. The resulting replication, which we label 'integrative', is characterized by a pragmatic flexibility toward protocols. The goal is to appropriate what is useful, not test for truth. Within many experimental cultures, however, integrative replications can produce results of ambiguous diagnostic power. Based on interviews with 60 members of the Board of Reviewing Editors for the journal Science, we show how the interplay between the diagnostic and integrative motives for replication differs between fields and produces different cultures of replication. We offer six theses that aim to put science and technology studies and science activism into dialog to show why effective reforms will need to confront issues of disciplinary difference.
  3. Practicing reproducible scientific research requires access to appropriate reproducibility methodology and software, as well as open data. Strict reproducibility in complex scientific domains such as environmental science, ecology and medicine, however, is difficult if not impossible. Here, we consider replication as a relaxed but bona fide substitute for strict reproducibility and propose using 3D terrain visualization for replication in environmental science studies that propose causal relationships between one or more driver variables and one or more response variables across complex ecosystem landscapes. We base our contention of the usefulness of visualization for replication on more than ten years observing environmental science modelers who use our 3D terrain visualization software to develop, calibrate, validate, and integrate predictive models. To establish the link between replication and model validation and corroboration, we consider replication as proposed by Munafò, i.e., triangulation. We enumerate features of visualization systems that would enable such triangulation and argue that such systems would render feasible domain-specific, open visualization software for use in replicating environmental science studies.
  4. Abstract If contextual values can play necessary and beneficial roles in scientific research, to what extent should science communicators be transparent about such values? This question is particularly pressing in contexts where there appears to be significant resistance among some non-experts to accept certain scientific claims or adopt science-based policies or recommendations. This paper examines whether value transparency can help promote non-experts’ warranted epistemic trust of experts. I argue that there is a prima facie case in favor of transparency because it can promote four conditions that are thought to be required for epistemic trustworthiness. I then consider three main arguments that transparency about values is likely to be ineffective in promoting such trust (and may undermine it). This analysis shows that while these arguments show that value transparency is not sufficient for promoting epistemic trust, they fail to show that rejecting value transparency as a norm for science communicators is more likely to promote warranted epistemic trust than a qualified norm of value transparency (along with other strategies). Finally, I endorse a tempered understanding of value transparency and consider what this might require in practice. 
  5. Abstract Empirical evaluations of replication have become increasingly common, but there has been no unified approach to doing so. Some evaluations conduct only a single replication study while others run several, usually across multiple laboratories. Discussions of how to design such programs have largely contended with difficult questions about which experimental components are necessary for a set of studies to be considered replications. However, another important consideration is that replication studies be designed to support sufficiently sensitive analyses. For instance, if hypothesis tests are to be conducted about replication, studies should be designed to ensure these tests are well-powered; if not, it can be difficult to determine conclusively if replication attempts succeeded or failed. This paper describes methods for designing ensembles of replication studies to ensure that they are both adequately sensitive and cost-efficient. It describes two potential analyses of replication studies — hypothesis tests and variance component estimation — and approaches to obtaining optimal designs for them. Using these results, it assesses the statistical power, precision of point estimators, and optimality of the design used by the Many Labs Project and finds that while it may have been sufficiently powered to detect some larger differences between studies, other designs would have been less costly and/or produced more precise estimates or higher-powered hypothesis tests.
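The kind of power analysis this abstract describes for ensembles of studies can be sketched with a Monte Carlo estimate of a homogeneity (Q) test's power across k replication studies. This is a generic illustration, not the paper's method: the Q statistic is Cochran's standard heterogeneity test, and the study count, standard error, and between-study sd below are made-up numbers.

```python
import random
from math import sqrt

def q_statistic(effects, ses):
    """Cochran's Q: precision-weighted squared deviations from the pooled mean."""
    w = [1.0 / s**2 for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - pooled)**2 for wi, yi in zip(w, effects))

def simulate_power(k, se, tau, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo power of the Q test to detect between-study sd `tau`
    across k equally precise replication studies with standard error `se`."""
    rng = random.Random(seed)
    ses = [se] * k
    # Critical value estimated under homogeneity (tau = 0).
    null_q = sorted(
        q_statistic([rng.gauss(0.0, se) for _ in range(k)], ses)
        for _ in range(reps)
    )
    crit = null_q[int((1 - alpha) * reps)]
    # Under heterogeneity, each lab's true effect varies around 0 with sd tau.
    hits = sum(
        q_statistic([rng.gauss(rng.gauss(0.0, tau), se) for _ in range(k)], ses) > crit
        for _ in range(reps)
    )
    return hits / reps
```

With 16 labs whose estimates each have SE 0.1, the simulated test holds its nominal 5% rate when true effects are identical but has only partial power against between-study variation of the same magnitude as the SE — the sort of calculation one would run when deciding how many labs, and how large each study, an ensemble needs.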