Title: SciLedger: A Blockchain-based Scientific Workflow Provenance and Data Sharing Platform
Researchers collaborating from different locations need a method to capture and store scientific workflow provenance that guarantees provenance integrity and reproducibility. As modern science moves toward greater data accessibility, researchers also need a platform for open-access data sharing. We propose SciLedger, a blockchain-based platform that provides secure, trustworthy storage for scientific workflow provenance to reduce research fabrication and falsification. SciLedger uses a novel invalidation mechanism that invalidates only the necessary provenance records. SciLedger also allows workflows with complex structures to be stored on a single blockchain, so researchers can reuse existing data in their scientific workflows by branching from and merging existing workflows. Our experimental results show that SciLedger provides a viable solution for maintaining academic integrity and research flexibility within scientific workflows.
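The abstract does not spell out how selective invalidation or branch-and-merge storage works, so the Python sketch below is only one plausible reading; every name in it (ProvenanceRecord, Ledger, invalidate) is invented for illustration. Records form an append-only DAG, a merge is simply a record with more than one parent, and invalidating a record cascades only to its descendants, so upstream data and sibling branches stay valid, matching the abstract's claim of invalidating only the necessary records.

```python
# All names here are invented for illustration; the paper's actual record
# format and invalidation algorithm are not given in this abstract.
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    record_id: str
    parents: list = field(default_factory=list)  # >1 parent models a merge
    invalid: bool = False

class Ledger:
    """Append-only DAG of provenance records on one logical chain."""
    def __init__(self):
        self.records = {}

    def append(self, record_id, parents=()):
        # Branching or merging is just appending a record with 1+ parents.
        self.records[record_id] = ProvenanceRecord(record_id, list(parents))

    def invalidate(self, record_id):
        """Mark a record and only its descendants invalid; ancestors and
        unrelated branches remain usable."""
        stack = [record_id]
        while stack:
            rid = stack.pop()
            if self.records[rid].invalid:
                continue
            self.records[rid].invalid = True
            stack.extend(r.record_id for r in self.records.values()
                         if rid in r.parents)

ledger = Ledger()
ledger.append("raw-data")
ledger.append("cleaned", parents=["raw-data"])   # branch from existing data
ledger.append("analysis", parents=["cleaned"])
ledger.invalidate("analysis")                    # "raw-data" and "cleaned" stay valid
```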
Award ID(s):
2051127
PAR ID:
10404755
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE International Conference on Collaboration and Internet Computing (CIC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Scientific data, along with its analysis, accuracy, completeness, and reproducibility, plays a vital role in advancing science and engineering. Open Science Chain (OSC) is a cyberinfrastructure platform built using the Hyperledger Fabric (HLF) blockchain technology to address issues related to data reproducibility and accountability in scientific research. OSC preserves the integrity of research datasets and enables different research groups to share datasets along with their integrity information. Additionally, it enables quick verification of the exact datasets that were used for a particular published research result and tracks their provenance. In this paper, we describe OSC's command-line utility, which preserves the integrity of research datasets from within the researchers' environment or from remote systems such as HPC resources or campus clusters used for research. The Python-based command-line utility can be seamlessly integrated within research workflows and provides an easy way to preserve the integrity of research data in the OSC blockchain platform.
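    The abstract does not show the utility's actual interface, so the sketch below is illustrative only: it performs the core step such a tool needs, computing a stable fingerprint of a dataset so it can be registered on, and later checked against, the blockchain. The directory path is hypothetical, and the real utility's commands and API differ.

```python
# Illustrative only: fingerprint a dataset directory with the standard library.
import hashlib
from pathlib import Path

def dataset_fingerprint(root: str) -> str:
    """SHA-256 over relative paths and file contents in sorted order,
    so the same dataset always produces the same digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(root)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

# A real utility would now submit this digest plus metadata to the ledger;
# the path below is hypothetical.
print(dataset_fingerprint("results/run-01"))
```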
  2.
    Scientific data, along with its analysis, accuracy, completeness, and reproducibility, plays a vital role in advancing science and engineering. Open Science Chain (OSC) provides a cyberinfrastructure platform, built using distributed ledger technologies, where verification information about scientific datasets is stored and managed in a consortium blockchain. Researchers can independently verify the authenticity of scientific results using the information stored with OSC. Researchers can also build research workflows by linking data entries in the ledger with external repositories such as GitHub, allowing for detailed provenance tracking. OSC enables answers to questions such as: How can we ensure research integrity when different research groups share and work on the same datasets across the world? Is it possible to quickly verify the exact datasets that were used for a particular published result? Can we check the provenance of the data used in the research? In this poster, we highlight our work in building a secure, scalable architecture for OSC, including a security module for storing identities that researchers in science gateway communities can use to increase confidence in their scientific results.
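    Independent verification, as the abstract describes it, boils down to recomputing a dataset's digest locally and comparing it with the value recorded in the consortium ledger. A minimal sketch follows; the filename is hypothetical, and in practice the expected digest would be fetched from the OSC ledger entry rather than hard-coded.

```python
# Illustrative only: compare a local dataset against its ledger record.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):   # stream so large files fit in memory
            h.update(block)
    return h.hexdigest()

ledger_digest = "..."  # in practice, retrieved from the OSC ledger entry
if sha256_of("spikes.h5") == ledger_digest:
    print("dataset matches the published ledger record")
else:
    print("mismatch: the data differs from what was registered")
```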
  3. When a scientific dataset evolves or is reused in workflows creating derived datasets, the integrity of the dataset and its metadata, including provenance, needs to be securely preserved, with assurance that neither is accidentally or maliciously altered in the process. A secure method to efficiently share and verify both data and metadata is essential for the reuse of scientific data. The National Science Foundation (NSF)-funded Open Science Chain (OSC) utilizes a consortium blockchain to provide a cyberinfrastructure solution that maintains the integrity of provenance metadata for published datasets and allows independent verification of the dataset while promoting reuse and reproducibility. The NSF- and National Institutes of Health (NIH)-funded Neuroscience Gateway (NSG) provides a freely available web portal that allows neuroscience researchers to execute computational data analysis pipelines on high performance computing resources. Combined, the OSC and NSG platforms form an efficient, integrated framework to automatically and securely preserve and verify the integrity of the artifacts used in research workflows while using the NSG platform. This paper presents the results of the first study that integrates the OSC and NSG frameworks to track the provenance of neurophysiological signal data analysis to study brain network dynamics using the Neuro-Integrative Connectivity tool, which is deployed in the NSG platform. Database URL: https://www.opensciencechain.org.
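    The abstract's central idea, preserving provenance metadata as datasets evolve into derived datasets, can be pictured as hash-chained ledger entries. The sketch below is a guess at the shape of such records, not OSC's actual schema; every field name and value is invented.

```python
# Invented record shapes: each derived dataset's entry embeds a digest of
# its parent's entry, so altering any upstream record breaks verification
# of everything derived from it.
import hashlib, json

def entry_digest(entry: dict) -> str:
    return hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()

raw = {"dataset": "eeg-session-01", "data_digest": "a1b2...", "parent": None}
derived = {
    "dataset": "eeg-session-01-filtered",
    "data_digest": "c3d4...",          # digest of the derived data itself
    "parent": entry_digest(raw),       # binds this entry to its ancestor
}

# Verification replays the chain from the original dataset downward.
assert derived["parent"] == entry_digest(raw)
```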
  4. AI (artificial intelligence)-based analysis of geospatial data has gained considerable attention. Geospatial datasets are multi-dimensional, have spatiotemporal context, exist in disparate formats, and require sophisticated AI workflows that include not only AI algorithm training and testing but also data preprocessing and result post-processing. This complexity poses a major challenge for full-stack AI workflow management, as researchers often rely on an assortment of time-intensive manual operations to manage their projects. However, no existing workflow management software offers a satisfactory solution for hybrid resources, full file access, data flow, code control, and provenance. This paper introduces a new system named Geoweaver to improve the efficiency of full-stack AI workflow management. It supports linking all the preprocessing, AI training and testing, and post-processing steps into a single automated workflow. To demonstrate its utility, we present a use case in which Geoweaver manages end-to-end deep learning for in-time crop mapping using Landsat data. We show how Geoweaver effectively removes the tedium of managing various scripts, code, libraries, Jupyter Notebooks, datasets, servers, and platforms, greatly reducing the time, cost, and effort researchers must spend on such AI-based workflows. The concepts demonstrated through Geoweaver serve as an important building block in the future of cyberinfrastructure for AI research.
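    Geoweaver itself is a full system with a web interface; the toy Python below only illustrates the organizing idea the abstract describes, chaining preprocessing, model training/testing, and post-processing into one re-runnable workflow instead of scattered manual steps. Every function here is a stand-in, not Geoweaver's API.

```python
# Stand-in functions only; real Geoweaver workflows wire together scripts,
# notebooks, and remote servers. The point is the single linked pipeline.
def preprocess(scenes):                  # e.g., mask clouds in Landsat tiles
    return [s + ":clean" for s in scenes]

def train_and_predict(tiles):            # stand-in for the deep-learning step
    return {t: "crop" for t in tiles}

def postprocess(labels):                 # e.g., mosaic per-tile maps
    return sorted(labels.items())

def full_stack_workflow(scenes):
    """One entry point re-runs every stage in order, so inputs, code, and
    outputs stay traceable end to end."""
    return postprocess(train_and_predict(preprocess(scenes)))

print(full_stack_workflow(["LC08_tile_A", "LC08_tile_B"]))
```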
  5. Scientific workflows have become ubiquitous across scientific fields, and their execution methods and systems continue to be the subject of research and development. Most experimental evaluations of these workflows rely on workflow instances, which can be either real-world or synthetic, to ensure relevance to current application domains or to explore hypothetical/future scenarios. The WfCommons project addresses this need by providing data and tools to support such evaluations. In this paper, we present an overview of WfCommons and describe two recent developments. First, we introduce a workflow execution "tracer" for Nextflow, which significantly enhances the set of real-world instances available in WfCommons. Second, we describe a workflow instance "translator" that enables the execution of any real-world or synthetic WfCommons workflow instance using Dask. Our contributions aim to provide researchers and practitioners with more comprehensive resources for evaluating scientific workflows.
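    The abstract names Dask as the execution target for translated instances. The sketch below shows how such a translation could look for a simplified instance description: the task-list layout here is a stand-in, not the real WfCommons JSON schema, and the translator's actual code is not shown in this abstract.

```python
# Simplified stand-in for a workflow instance: a task list with explicit
# dependencies. The real WfCommons format and translator API differ.
import dask

instance = {
    "tasks": [
        {"name": "fetch",   "parents": []},
        {"name": "clean",   "parents": ["fetch"]},
        {"name": "analyze", "parents": ["clean"]},
    ]
}

@dask.delayed
def run_task(name, *parent_results):
    # A real translator would launch the task's command; we just report it.
    return f"{name} done"

nodes = {}
for task in instance["tasks"]:           # assumes parents are listed first
    deps = [nodes[p] for p in task["parents"]]
    nodes[task["name"]] = run_task(task["name"], *deps)

# Computing the delayed nodes lets Dask's scheduler run the whole graph,
# respecting the dependency edges.
print(dask.compute(*nodes.values()))
```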