Title: Building a Vision for Reproducibility in the Cyberinfrastructure Ecosystem: Leveraging Community Efforts
The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results - such as those from simulations - are "reproducible", that is, the same results are obtained when one re-uses the same input data, methods, software and analysis conditions. Starting almost a decade ago, the community has regularly published and advocated for advances in this area. In this article we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Sciences, Engineering, and Medicine report on "Reproducibility and Replicability in Science". To this end, this work considers high performance computing workflows that combine traditional simulations (e.g. Molecular Dynamics simulations) with in situ analytics. We leverage an analysis of such workflows to (a) contextualize the report's recommendations in the HPC setting and (b) envision a path forward in the tradition of community driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and computational infrastructure that supports scientific discovery.
Award ID(s):
1941443 1839010 2138776
PAR ID:
10157271
Journal Name:
Supercomputing Frontiers and Innovations
Volume:
7
Issue:
1
ISSN:
2313-8734
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some fear that it may be a symptom of a lack of rigor in science, while others argue that such an observed inconsistency can be an important precursor to new discovery. Concerns about reproducibility and replicability have been expressed in both scientific and popular media. As these concerns came to light, Congress requested that the National Academies of Sciences, Engineering, and Medicine conduct a study to assess the extent of issues related to reproducibility and replicability and to offer recommendations for improving rigor and transparency in scientific research. Reproducibility and Replicability in Science defines reproducibility and replicability and examines the factors that may lead to non-reproducibility and non-replicability in research. Unlike the typical expectation of reproducibility between two computations, expectations about replicability are more nuanced, and in some cases a lack of replicability can aid the process of scientific discovery. This report provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.
  2. There is a growing need to train a diverse range of students in engineering disciplines and a growing demand for a skilled workforce with graduate degrees (Pearson et al., 2022; National Academies of Sciences, Engineering, and Medicine, 2019; National Science Foundation, 1996). A team of specialists in engineering and organizational systems worked together on a grant sponsored by the National Science Foundation’s (NSF) Scholarships in Science, Technology, Engineering, and Mathematics (S-STEM) program to explore how evidence-based strategies used successfully at the undergraduate level might improve the recruitment, retention, and outcomes of graduate programs. In this study, we interviewed a sample of the stakeholders who support low-income, first-generation, and/or rural graduate engineering students, to gain insight into the barriers they face in their efforts. We used a thematic analysis of transcribed interviews to draw conclusions. We found seven themes describing the facilitators and seven themes describing the barriers that stakeholders face in supporting these students. Our findings have implications for researchers who would investigate and implement future organizational support systems as well as for the leaders who would design and implement an array of interventions as part of an organizational support system. 
  3. When a scientific dataset evolves or is reused in workflows that create derived datasets, the integrity of the dataset and its metadata, including provenance, needs to be securely preserved, with assurances that neither is accidentally or maliciously altered during the process. A secure method to efficiently share and verify the data as well as the metadata is essential for the reuse of scientific data. The National Science Foundation (NSF)-funded Open Science Chain (OSC) uses a consortium blockchain to provide a cyberinfrastructure solution that maintains the integrity of the provenance metadata for published datasets and provides a way to independently verify a dataset while promoting reuse and reproducibility. The NSF- and National Institutes of Health (NIH)-funded Neuroscience Gateway (NSG) provides a freely available web portal that allows neuroscience researchers to execute computational data analysis pipelines on high performance computing resources. Combined, the OSC and NSG platforms form an efficient, integrated framework to automatically and securely preserve and verify the integrity of the artifacts used in research workflows while using the NSG platform. This paper presents the results of the first study that integrates the OSC and NSG frameworks to track the provenance of neurophysiological signal data analysis to study brain network dynamics using the Neuro-Integrative Connectivity tool, which is deployed in the NSG platform. Database URL: https://www.opensciencechain.org.
  4. Biomedical research data sets are becoming larger and more complex, and computing capabilities are expanding to enable transformative scientific results. The National Institutes of Health's (NIH's) National Library of Medicine (NLM) has the unique role of ensuring that biomedical research data are findable, accessible, interoperable, and reusable in an ethical manner. Tools that forecast the costs of long-term data preservation could be useful as the cost to curate and manage these data in meaningful ways continues to increase, as could stewardship to assess and maintain data that have future value. The National Academies of Sciences, Engineering, and Medicine convened a workshop on July 11-12, 2019 to gather insight and information in order to develop and demonstrate a framework for forecasting long-term costs for preserving, archiving, and accessing biomedical data. Presenters and attendees discussed tools and practices that NLM could use to help researchers and funders better integrate risk management practices and considerations into data preservation, archiving, and accessing decisions; methods to encourage NIH-funded researchers to consider, update, and track lifetime data; and burdens on the academic researchers and industry staff to implement these tools, methods, and practices. This publication summarizes the presentations and discussion of the workshop.
  5. Scientific data, along with its analysis, accuracy, completeness, and reproducibility, plays a vital role in advancing science and engineering. Open Science Chain (OSC) provides a cyberinfrastructure platform, built using distributed ledger technologies, where verification information about scientific datasets is stored and managed in a consortium blockchain. Researchers can independently verify the authenticity of scientific results using the information stored with OSC. Researchers can also build research workflows by linking data entries in the ledger and external repositories such as GitHub, which allows for detailed provenance tracking. OSC enables answers to questions such as: how can we ensure research integrity when different research groups share and work on the same datasets across the world? Is it possible to enable quick verification of the exact data sets that were used for particular published research? Can we check the provenance of the data used in the research? In this poster, we highlight our work in building a secure, scalable architecture for OSC, including developing a security module for storing identities that can be used by researchers in science gateway communities to increase confidence in their scientific results.