

Search: All records where Award ID contains 2138776

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be freely available during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. A cycle that traces ways to define the landscape of data science. 
  2. The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results, such as those from simulations, are "reproducible": the same results are obtained when one reuses the same input data, methods, software, and analysis conditions. For almost a decade the community has regularly published and advocated for advances in this area. In this article we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Sciences, Engineering, and Medicine report "Reproducibility and Replicability in Science". To this end, we consider high-performance computing (HPC) workflows that combine traditional simulations (e.g., molecular dynamics) with in situ analytics; a minimal sketch of such a workflow appears after this list. We leverage an analysis of these workflows to (a) contextualize the report's recommendations in the HPC setting and (b) envision a path forward in the tradition of community-driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and the computational infrastructure that supports scientific discovery.
  3. The last few years have seen a substantial push toward "Open Data" by policy makers, researchers, archivists, and even the public. This article postulates that the value of data is not intrinsic but derives from its ability to produce knowledge, and that extracting knowledge from data is not deterministic. The value of data is realized through a focus on the reproducibility of findings from the data, which acknowledges the complexity of the leap from data to knowledge and the inextricable interrelationships among data, software, computational environments and cyberinfrastructure, and knowledge. Modern information-archiving practices have a long history and were shaped in a pre-digital world of physical objects such as books, monographs, film, and paper. This article argues that "data," the modern collection of digital bits representing empirical measurements, is a wholly new entity and not a digital analog of any physical object. It further argues that a focus on the interrelationships between digital artifacts and their unique properties, rather than on Open Data alone, will produce an augmented and more useful understanding of knowledge derived from digital data. Data-derived knowledge, represented by claims in the scholarly record, must persistently link to immutable versions of the digital artifacts from which it was derived, including (1) any data, (2) software that allows access to the data and regeneration of the claims that rely on that version of the data, and (3) computational-environment information, including input parameters, function-invocation sequences, and resource details. In this sense the epistemological gap between data and extracted knowledge can be closed. Datasets and software are often subject to change and revision, sometimes at high velocity, and such changes imply new versions with new unique identifiers. We propose considering knowledge, rather than data in isolation, with a schematic model representing the interconnectedness of the datasets, software, and computational information upon which its derivation depends; a hypothetical linkage record in that spirit is sketched after this list. Capturing the interconnectedness of these digital artifacts, and their relationship to the knowledge they generate, is essential for supporting the reproducibility, transparency, and cognitive tractability of scientific claims derived from digital data.
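
As a concrete illustration of item 2 above, the following toy Python script sketches a workflow that pairs a simulation loop with an in situ analysis step and records the inputs and a result digest needed to check reproducibility. This is a minimal sketch under our own assumptions; none of the names or parameters come from the article, and a real molecular dynamics workflow would replace the toy update step.

    import hashlib
    import json
    import random

    # Illustrative only: a toy "simulation" whose state is summarized
    # in situ (while it runs) rather than written out for later analysis.
    params = {"seed": 42, "steps": 5, "particles": 100}

    rng = random.Random(params["seed"])  # fixed seed: same inputs, same trajectory
    positions = [rng.random() for _ in range(params["particles"])]

    in_situ_results = []
    for step in range(params["steps"]):
        # Simulation step: perturb every particle position.
        positions = [x + rng.gauss(0.0, 0.01) for x in positions]
        # In situ analytics: summarize the state as it is produced.
        mean_pos = sum(positions) / len(positions)
        in_situ_results.append({"step": step, "mean_position": mean_pos})

    # Provenance stub: the parameters plus a digest of the analytics, so a
    # re-run with the same inputs can be checked for agreement.
    digest = hashlib.sha256(
        json.dumps(in_situ_results, sort_keys=True).encode()
    ).hexdigest()
    print(json.dumps({"params": params, "result_digest": digest}, indent=2))

Re-running the script with the same parameters yields the same digest, while changing any input changes it; that is the property the reproducibility discussion above turns on.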
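Item 3 above proposes linking each scholarly claim to immutable versions of the data, software, and computational environment behind it. The article defines no concrete format, so the Python sketch below is a hypothetical schema of our own devising: every identifier, version string, and checksum shown is a placeholder.

    from dataclasses import dataclass, asdict
    import hashlib
    import json

    @dataclass(frozen=True)
    class Artifact:
        kind: str        # "data", "software", or "environment"
        identifier: str  # e.g. a DOI or repository URL (placeholders below)
        version: str     # an immutable version tag or commit hash
        checksum: str    # content digest pinning the exact bytes

    @dataclass(frozen=True)
    class Claim:
        statement: str
        artifacts: tuple  # every Artifact the claim's derivation depends on

        def fingerprint(self) -> str:
            """Digest over the claim and its pinned artifacts."""
            payload = json.dumps(asdict(self), sort_keys=True)
            return hashlib.sha256(payload.encode()).hexdigest()

    claim = Claim(
        statement="Mean particle position drifts upward over five steps.",
        artifacts=(
            Artifact("data", "doi:10.0000/example-dataset", "v1.2", "sha256:..."),
            Artifact("software", "https://example.org/toy-md", "rel-0.3", "sha256:..."),
            Artifact("environment", "container:example/toy-env", "2024.1", "sha256:..."),
        ),
    )
    print(claim.fingerprint())

Because a new version of any artifact yields a new checksum and hence a new fingerprint, the record stays tied to exactly the bytes the claim was derived from, which is the immutability property the abstract calls for.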