Title: Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context
Continuous integration (CI) is a well-established technique in commercial and open-source software projects, but it is not routinely used in scientific publishing. In the scientific software context, CI can serve two functions that increase the reproducibility of scientific results: providing an established platform for testing the reproducibility of those results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles, and we examine and discuss their existing tests. We introduce the notion of a scientific test as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time- or resource-intensive to adapt easily to a CI context, we recommend including results from reduced versions of their work (e.g., runs at lower resolution, over shorter time scales, or with smaller data sets) alongside the primary results in the article. While these smaller versions may be less interesting scientifically, they can serve to verify that the published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
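A minimal sketch of what such a reduction test could look like in practice (this is not code from the paper; the solver module, its parameters, and the reference file are hypothetical):

```python
# Hypothetical reduction test: re-run a scaled-down version of a published
# computation and check the result against an archived reference value.
# The module name (simulation), function (run_model), parameters, and
# tolerance are illustrative assumptions, not taken from the paper.
import json
import pathlib

import pytest

from simulation import run_model  # hypothetical solver from the article's code


def test_reduced_resolution_reproduces_reference():
    """A CI-friendly 'scientific test': lower resolution, shorter run."""
    # Full article run might use nx=1024 and 100_000 steps; this reduced
    # configuration is sized to finish within typical CI time limits.
    result = run_model(nx=64, steps=500)

    # Reference produced once by the authors with the same reduced settings
    # and committed alongside the paper's code and data.
    reference = json.loads(
        pathlib.Path("tests/reduced_reference.json").read_text()
    )

    assert result["energy"] == pytest.approx(reference["energy"], rel=1e-6)
```

A CI service would then invoke pytest on every push, so a broken link between code, data, and published results surfaces immediately.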
Award ID(s):
1839010 1763788
PAR ID:
10097170
Author(s) / Creator(s):
Date Published:
Journal Name:
2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS’19)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author’s raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals. 
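A minimal sketch, using Docker's standard CLI, of the containerized workflow the authors describe; the image tag, data mount, and script path are hypothetical:

```python
# Sketch of the container-based approach: build one image per submission,
# then run the analysis inside it so the computing environment is identical
# everywhere. Image name and paths are illustrative assumptions.
import subprocess

IMAGE = "ffc-submission-07:v1"  # hypothetical tag for one submission

# Build the image from the submission's Dockerfile, which pins the OS,
# language runtime, and package versions.
subprocess.run(["docker", "build", "-t", IMAGE, "./submission-07"], check=True)

# Run the analysis; mount the raw data read-only and collect outputs
# on the host.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/data/fragile-families:/data:ro",
        "-v", "./results:/results",
        IMAGE,
        "python", "/code/reproduce.py",
    ],
    check=True,
)
```

Pinning the entire environment inside the image is what lets the same command run identically on a laptop today and on a cloud instance years from now.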
  2. Tirthankar Ghosal, Sergi Blanco-Cuaresma (Eds.)
    Reproducibility is an important feature of science: experiments are retested, analyses are repeated, and trust in the findings increases when consistent results are achieved. Despite its importance, reproducing published findings often involves significant work, and some findings cannot be reproduced due to oversights or errors. In this paper, we examine a wide range of features of scholarly articles published in computer science conferences and journals and test how those features correlate with reproducibility. We collected data from three different sources that label publications as either reproducible or irreproducible, and we employed statistical significance tests to identify features of those publications that hold clues about reproducibility. We found the readability of a scholarly article and the accessibility of its software artifacts through hyperlinks to be strong signals of reproducibility.
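The abstract does not name the specific tests used, but a minimal sketch of this kind of feature analysis, with made-up data and two common significance tests, might look like this:

```python
# Sketch of significance testing on article features: compare a feature
# (e.g., a readability score) between reproducible and irreproducible
# papers. The data values are fabricated placeholders; the paper's actual
# features and tests may differ.
from scipy import stats

# Hypothetical readability scores for two labeled groups of papers.
reproducible = [62.1, 58.4, 65.0, 71.3, 60.2, 66.8]
irreproducible = [48.9, 55.2, 50.1, 44.7, 53.3, 47.0]

# Nonparametric test: does readability differ between the groups?
u_stat, p_value = stats.mannwhitneyu(
    reproducible, irreproducible, alternative="two-sided"
)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")

# Categorical feature: does a working artifact hyperlink correlate with
# reproducibility? Rows: link present / absent; columns: repro / irrepro.
table = [[34, 12],
         [9, 27]]
chi2, p, dof, _expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```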
    Reproducibility of results is a cornerstone of the scientific method. Scientific computing encounters two challenges when aiming for this goal. First, reproducibility should not depend on details of the runtime environment, such as the compiler version or the computing platform, so that results are verifiable by third parties. Second, different versions of the software code executed in the same runtime environment should produce consistent numerical results for physical quantities. In this manuscript, we test the feasibility of reproducing scientific results obtained using the IllinoisGRMHD code, part of the Einstein Toolkit, an open-source community software suite for simulations in relativistic astrophysics. We verify that the numerical results of simulating a single isolated neutron star with IllinoisGRMHD can be reproduced, and we compare them to the results reported by the code authors in 2015. We use two different supercomputers: Expanse at SDSC and Stampede2 at TACC. By compiling the source code archived along with the paper on both Expanse and Stampede2, we find that IllinoisGRMHD reproduces the results published in its announcement paper up to errors comparable to those induced by round-off-level changes in the initial data parameters. We also verify that a current version of IllinoisGRMHD reproduces these results once we account for bug fixes made since the original publication.
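A minimal sketch, not from the paper, of what the comparison step could look like; the file names, the compared quantity, and the tolerance are assumptions rather than the Einstein Toolkit's actual output conventions:

```python
# Sketch: check that a quantity from a new run matches the archived 2015
# result to within round-off-level differences. File names, the quantity
# (say, central rest-mass density over time), and the tolerance are
# illustrative assumptions.
import numpy as np

# Each file: column 0 = time, column 1 = central rest-mass density.
archived = np.loadtxt("tov_rho_central_2015.dat")
current = np.loadtxt("tov_rho_central_expanse.dat")

# Interpolate the new run onto the archived output times so the series align.
rho_new = np.interp(archived[:, 0], current[:, 0], current[:, 1])

# Relative difference, compared against a tolerance chosen to be comparable
# to the scatter produced by round-off-level perturbations of initial data.
rel_diff = np.abs(rho_new - archived[:, 1]) / np.abs(archived[:, 1])
tol = 1e-10  # assumed round-off-level threshold for this quantity
print(f"max relative difference: {rel_diff.max():.3e}")
assert rel_diff.max() < tol, "run does not reproduce archived results"
```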
  4. Why are some research studies easy to reproduce while others are difficult? Casting doubt on the accuracy of scientific work is not fruitful, especially when an individual researcher cannot reproduce the claims made in a paper, and there can be many subjective reasons behind the inability to reproduce a study. The field of Machine Learning (ML) faces a reproducibility crisis, and surveys of published articles have led to a shared realization that, although sharing code repositories is valuable, a code base alone does not determine whether an article is reproducible. Various parties involved in the publication process have come forward to address the crisis, and solutions such as reproducibility badges for articles, reproducibility checklists at conferences (NeurIPS, ICML, ICLR, etc.), and artifact sharing on OpenReview are promising responses to the core problem. The literature on reproducibility focuses largely on measures required to avoid irreproducibility; there is little research into the effort required to reproduce published articles. In this paper, we investigate the factors that make previously published studies easy or difficult to reproduce, and we report a foundational framework for quantifying the effort of reproducibility.
  5. The last few years have seen a substantial push toward “Open Data” by policy makers, researchers, archivists, and even the public. This article postulates that the value of data is not intrinsic but instead derives from its ability to produce knowledge, and that the extraction of knowledge from data is not deterministic. The value of data is realized through a focus on the reproducibility of the findings derived from the data, which acknowledges the complexity of the leap from data to knowledge and the inextricable interrelationships among data, software, computational environments and cyberinfrastructure, and knowledge. Modern information archiving practices have a long history and were shaped in a pre-digital world of physical objects such as books, monographs, film, paper, and other physical artifacts. This article argues that “data,” the modern collection of digital bits representing empirical measurements, is a wholly new entity and not a digital analog of any physical object. It further argues that a focus on the interrelationships between digital artifacts and their unique properties, rather than on Open Data alone, will produce an augmented and more useful understanding of knowledge derived from digital data. Data-derived knowledge, represented by claims in the scholarly record, must persistently link to immutable versions of the digital artifacts from which it was derived, including 1) the data themselves, 2) the software that provides access to the data and regenerates the claims that depend on that version of the data, and 3) information about the computational environment, including input parameters, function invocation sequences, and resource details. In this way the epistemological gap between data and extracted knowledge can be closed. Datasets and software are often subject to change and revision, sometimes at high velocity, and such changes imply new versions with new unique identifiers. We therefore propose considering knowledge, rather than data in isolation, with a schematic model representing the interconnectedness of the datasets, software, and computational information upon which its derivation depends. Capturing the interconnectedness of these digital artifacts, and their relationship to the knowledge they generate, is essential for supporting the reproducibility, transparency, and cognitive tractability of scientific claims derived from digital data.
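A minimal sketch of the schematic model this article proposes, assuming a simple representation in which a claim links to immutable versions of data, software, and computational environment; all type names, field names, and identifiers below are illustrative placeholders:

```python
# Sketch of a knowledge claim linked persistently to immutable, versioned
# digital artifacts. Field names and identifier formats are assumptions.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    identifier: str  # persistent ID, e.g. a DOI or content hash
    version: str     # immutable version; any revision mints a new one


@dataclass(frozen=True)
class ComputationalEnvironment:
    identifier: str
    version: str
    input_parameters: dict = field(default_factory=dict)
    resources: str = ""  # e.g. hardware or scheduler details


@dataclass(frozen=True)
class KnowledgeClaim:
    statement: str
    data: tuple[Artifact, ...]              # 1) the data
    software: tuple[Artifact, ...]          # 2) code that regenerates the claim
    environment: ComputationalEnvironment   # 3) parameters, invocations, resources


# Example: a claim whose derivation is pinned to specific artifact versions.
claim = KnowledgeClaim(
    statement="Variable X predicts outcome Y (Table 2).",
    data=(Artifact("doi:10.0000/placeholder-dataset", "v2.1"),),
    software=(Artifact("swh:1:rev:0000abcd", "commit 0000abcd"),),
    environment=ComputationalEnvironment(
        "urn:env:placeholder", "py3.11", {"seed": 42}, "1 node, 16 cores"
    ),
)
```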