Title: The Design of Replication Studies
Abstract: Empirical evaluations of replication have become increasingly common, but there has been no unified approach to conducting them. Some evaluations conduct only a single replication study while others run several, usually across multiple laboratories. The design of such programs has largely contended with difficult questions about which experimental components are necessary for a set of studies to be considered replications. However, another important consideration is that replication studies be designed to support sufficiently sensitive analyses. For instance, if hypothesis tests about replication are to be conducted, studies should be designed to ensure these tests are well powered; if not, it can be difficult to determine conclusively whether replication attempts succeeded or failed. This paper describes methods for designing ensembles of replication studies to ensure that they are both adequately sensitive and cost-efficient. It describes two potential analyses of replication studies (hypothesis tests and variance component estimation) and approaches to obtaining optimal designs for them. Using these results, it assesses the statistical power, precision of point estimators, and optimality of the design used by the Many Labs Project, finding that while that design may have been sufficiently powered to detect some larger differences between studies, other designs would have been less costly and/or produced more precise estimates or higher-powered hypothesis tests.
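
To make the power consideration concrete, the following is a minimal Monte Carlo sketch (not the paper's method) that estimates the power of Cochran's Q test to detect between-laboratory heterogeneity in an ensemble of replication studies. The quantities here are illustrative assumptions: tau is the between-lab standard deviation of the true effect, sigma the within-lab outcome standard deviation, and each lab is assumed to report an estimate with standard error sigma / sqrt(n_per_lab).

```python
# Monte Carlo power of Cochran's Q test for between-lab heterogeneity.
import numpy as np
from scipy import stats

def power_q_test(K, n_per_lab, sigma=1.0, tau=0.1, alpha=0.05, reps=5000, seed=0):
    """Approximate power of the Q test against between-lab variance tau^2 > 0."""
    rng = np.random.default_rng(seed)
    se = sigma / np.sqrt(n_per_lab)               # SE of each lab's estimate
    crit = stats.chi2.ppf(1 - alpha, df=K - 1)    # null distribution: tau^2 = 0
    rejections = 0
    for _ in range(reps):
        theta = rng.normal(0.0, tau, size=K)      # lab-specific true effects
        y = rng.normal(theta, se)                 # observed lab estimates
        w = np.full(K, 1.0 / se**2)               # inverse-variance weights
        y_bar = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - y_bar) ** 2)          # Cochran's Q statistic
        rejections += q > crit
    return rejections / reps

# Designs with the same total sample size K * n can differ sharply in power.
for K, n in [(36, 100), (12, 300), (6, 600)]:
    print(f"K={K:2d} labs, n={n:3d} per lab: power = {power_q_test(K, n):.2f}")
```

Comparing designs with a fixed total sample size, as in the loop above, illustrates the kind of cost-versus-sensitivity trade-off that the paper's optimal-design results formalize.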
Award ID(s):
1841075
PAR ID:
10400110
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series A: Statistics in Society
Volume:
184
Issue:
3
ISSN:
0964-1998
Page Range / eLocation ID:
p. 868-886
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Experimenter bias and expectancy effects have been well studied in the social sciences and even in human-computer interaction. They refer to non-ideal study-design choices made by experimenters that can unfairly influence the outcomes of their studies. While these biases need to be considered when designing any empirical study, they can be particularly significant in the context of replication studies, which may stray from the studies being replicated in only a few admissible ways. Although there are general guidelines for making valid, unbiased choices in each of the several steps of experimental design, making such choices when conducting replication studies has not been well explored. We reviewed 16 replication studies in information visualization published in four top venues from 2008 to the present to characterize how their study designs differed from those of the studies they replicated. We present our characterization categories, which include the prevalence of crowdsourcing and the commonly found replication types and study-design differences. From these categories we derive guidelines to help researchers make meaningful and unbiased decisions when designing replication studies. Our paper presents a first step toward a larger understanding of this topic and contributes to ongoing efforts to encourage researchers to conduct and publish more replication studies in information visualization.
  2. Abstract: Over the past decade, there has been a rapid increase in the development of predictive models at the intersection of molecular ecology, genomics, and global change. The common goal of these ‘genomic forecasting’ models is to integrate genomic data with environmental and ecological data to make quantitative predictions about the vulnerability of populations to climate change.
    Despite rapid methodological development and the growing number of systems in which genomic forecasts are made, the forecasts themselves are rarely evaluated in a rigorous manner with ground-truth experiments. This study reviews the evaluation experiments that have been done, introduces important terminology regarding the evaluation of genomic forecasting models, and discusses important elements in the design and reporting of ground-truth experiments.
    To date, experimental evaluations of genomic forecasts have found high variation in the accuracy of forecasts, but it is difficult to compare studies on common ground due to different approaches and experimental designs. Additionally, some evaluations may be biased toward higher performance because the training and testing data are not independent. Beyond such independence, important elements in the design of an evaluation experiment include the construction and parameterization of the forecasting model, the choice of fitness proxies to measure for test data, the construction of the evaluation model, the choice of evaluation metric(s), the degree of extrapolation to novel environments or genotypes, and the sensitivity, uncertainty, and reproducibility of forecasts.
    Although genomic forecasting methods are becoming more accessible, evaluating their limitations in a particular study system requires careful planning and experimentation. Meticulously designed evaluation experiments can clarify the robustness of the forecasts for application in management. Clear reporting of the basic elements of experimental design will improve the rigour of evaluations, and in turn our understanding of why models work in some cases and not others.
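
    To make the independence point concrete, here is a small synthetic sketch (illustrative only, not a method from this review) in which a forecast model is trained and evaluated on disjoint sets of populations. All data, variable names, and the ridge-regression model are assumptions for the example.

```python
# Evaluate a synthetic "genomic forecast" on held-out populations.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_pops, n_loci = 60, 200
genotypes = rng.binomial(2, 0.3, size=(n_pops, n_loci)).astype(float)
env = rng.normal(size=(n_pops, 1))                 # one environmental axis

# Synthetic "true" fitness: a few causal loci whose effects depend on env.
beta = np.zeros(n_loci)
beta[:10] = rng.normal(size=10)
fitness = (genotypes @ beta) * env.ravel() + rng.normal(size=n_pops)

# Hold out whole populations so training and testing data are independent.
idx = rng.permutation(n_pops)
train, test = idx[:40], idx[40:]
X = np.hstack([genotypes, env, genotypes * env])   # genotype-by-environment terms
model = RidgeCV(alphas=np.logspace(-2, 3, 20)).fit(X[train], fitness[train])

r_train, _ = pearsonr(model.predict(X[train]), fitness[train])
r_test, _ = pearsonr(model.predict(X[test]), fitness[test])
print(f"in-sample r = {r_train:.2f}, held-out r = {r_test:.2f}")
```

    The gap between the in-sample and held-out correlations is exactly the optimism that arises when training and testing data are not independent.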
  3.
    In many scenarios, information must be disseminated over intermittently-connected environments when network infrastructure becomes unavailable. Example scenarios include disasters in which first responders need to send updates about their critical tasks. If such updates pertain to a shared data set (e.g., pins on a map), their consistent dissemination is important. We can achieve this through causal ordering and consensus. Popular consensus algorithms, such as Paxos and Raft, are best suited to connected environments with reliable links. While some work has been done on designing consensus algorithms for intermittently-connected environments, such as the One-Third Rule (OTR) algorithm, there is a need to improve their efficiency and timely completion. We propose CoNICE, a framework to ensure consistent dissemination of updates among users in intermittently-connected, infrastructure-less environments. It achieves efficiency by exploiting hierarchical namespaces for faster convergence and lower communication overhead. CoNICE provides three levels of consistency for users' views, namely replication, causality, and agreement. It uses epidemic propagation to provide adequate replication ratios, and optimizes and extends Vector Clocks to provide causality (a simplified sketch of this layer follows below). To ensure agreement, CoNICE extends basic OTR to support long-term fragmentation and critical decision-invalidation scenarios. We integrate the multilevel consistency schema of CoNICE with a naming schema that follows a topic hierarchy-based dissemination framework, to improve functionality and performance. Performing city-scale simulation experiments, we demonstrate that CoNICE is effective in achieving its consistency goals, and is efficient and scalable in time to convergence and network resources used.
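
    As a rough illustration of the causality layer, here is a minimal textbook vector-clock sketch; CoNICE's optimized and extended clocks are more involved, and the class and method names here are assumptions for illustration only.

```python
# Textbook vector clocks: enough to causally order updates between nodes.
class VectorClock:
    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.clock = {n: 0 for n in nodes}

    def tick(self):
        """Advance the local component for a local event or send."""
        self.clock[self.node_id] += 1

    def send(self):
        """Timestamp an outgoing update with a copy of the clock."""
        self.tick()
        return dict(self.clock)

    def receive(self, stamp):
        """Merge a received timestamp: component-wise max, then tick."""
        for n, t in stamp.items():
            self.clock[n] = max(self.clock.get(n, 0), t)
        self.tick()

    def happened_before(self, stamp):
        """True if this node's clock causally precedes the given timestamp."""
        nodes = set(self.clock) | set(stamp)
        le = all(self.clock.get(n, 0) <= stamp.get(n, 0) for n in nodes)
        lt = any(self.clock.get(n, 0) < stamp.get(n, 0) for n in nodes)
        return le and lt

# Example: a's update is causally ordered before b's subsequent update.
a, b = VectorClock('a', 'ab'), VectorClock('b', 'ab')
stamp = a.send()                     # {'a': 1, 'b': 0}
b.receive(stamp)                     # b's clock becomes {'a': 1, 'b': 1}
print(a.happened_before(b.send()))   # True
```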
  4. In many scenarios, information must be disseminated over intermittently-connected environments when the network infrastructure becomes unavailable, e.g., during disasters where first responders need to send updates about critical tasks. If such updates pertain to a shared data set, dissemination consistency is important. This can be achieved through causal ordering and consensus. Popular consensus algorithms, e.g., Paxos, are best suited to connected environments. While some work has been done on designing consensus algorithms for intermittently-connected environments, such as the One-Third Rule (OTR) algorithm, there is still a need to improve their efficiency and timely completion. We propose CoNICE, a framework to ensure consistent dissemination of updates among users in intermittently-connected, infrastructure-less environments. It achieves efficiency by exploiting hierarchical namespaces for faster convergence and lower communication overhead. CoNICE provides three levels of consistency to users, namely replication, causality, and agreement. It uses epidemic propagation to provide adequate replication ratios, and optimizes and extends Vector Clocks to provide causality. To ensure agreement, CoNICE extends OTR to also support long-term network fragmentation and decision-invalidation scenarios; we define local and global consensus as pertaining to consensus within and across fragments, respectively (a sketch of a basic OTR round follows below). We integrate CoNICE's consistency preservation with a naming schema that follows a topic hierarchy-based dissemination framework, to improve functionality and performance. Using the Heard-Of model formalism, we prove CoNICE's consensus correct; our technique extends previously established proof methods for consensus in asynchronous environments. Performing city-scale simulations, we demonstrate CoNICE's scalability in achieving consistency, in terms of convergence time, network resource utilization, and energy consumption.
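
    For context, the following is a simplified sketch of a single round of the basic One-Third Rule that CoNICE builds on; the fragmentation, decision-invalidation, and local/global consensus extensions are omitted, and the function name and message format are assumptions for illustration.

```python
# One round of a basic OTR-style consensus step at a single process.
from collections import Counter

def otr_round(received, n):
    """received: dict of sender id -> proposed value (this round's heard-of set)
    n: total number of processes
    Returns (new_estimate, decision); (None, None) means keep the old estimate."""
    if len(received) <= 2 * n / 3:     # heard from too few processes this round
        return None, None
    counts = Counter(received.values())
    top = max(counts.values())
    # New estimate: the smallest among the most frequently received values.
    estimate = min(v for v, c in counts.items() if c == top)
    # Decide once some value is reported by more than two-thirds of all processes.
    decision = estimate if counts[estimate] > 2 * n / 3 else None
    return estimate, decision

# Example: n = 4 processes; a process hears three proposals this round.
print(otr_round({1: 'a', 2: 'a', 3: 'b'}, n=4))   # -> ('a', None): no decision yet
```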
  5.
    Informal learning institutions, such as museums, science centers, and community-based organizations, play a critical role in providing opportunities for students to engage in science, technology, engineering, and mathematics (STEM) activities during out-of-school hours. In recent years, thousands of studies, evaluations, and conference proceedings have been published measuring the impact these programs have had on their participants. However, because studies of informal science education (ISE) programs vary considerably in how they are designed and in the quality of their designs, it is often quite difficult to assess their impact on participants. Knowing whether the outcomes reported by these studies are supported by sufficient evidence is important not only for maximizing participant impact, but also because considerable economic and human resources are invested to support informal learning initiatives. To address this problem, I used the theories of impact analysis and triangulation as a framework for developing user-friendly rubrics for assessing the quality of research designs and evidence of impact. I used two main sources, research-based recommendations from STEM governing bodies and feedback from a focus group, to identify criteria indicative of high-quality STEM research and study design. Accordingly, I developed three STEM Research Design Rubrics, one each for quantitative, qualitative, and mixed-methods studies, that ISE researchers, practitioners, and evaluators can use to assess research design quality. Likewise, I developed three STEM Impact Rubrics, again one each for quantitative, qualitative, and mixed-methods studies, for assessing evidence of outcomes. These rubrics are practical tools for improving the field of informal science learning by increasing the quality of study design and for discerning whether studies or program evaluations provide sufficient evidence of impact.