The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing, and analyzing time-series data. Geoscience researchers use time-series data stores such as Hydroserver, the Virtual Observatory and Ecological Informatics System (VOEIS), and the Cloud-Hosted Real-time Data Service (CHORDS). Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis framework, an NSF-funded project, provides science-as-a-service APIs that allow researchers to achieve scientific results faster by eliminating the need to set up a complex infrastructure stack. The University of Hawai'i (UH) and the Texas Advanced Computing Center (TACC) have collaborated to develop an open-source Tapis Streams API that builds on the concepts of the CHORDS time-series data service to support research. This new hosted service allows storing, processing, annotating, archiving, and querying time-series data in the Tapis multi-user and multi-tenant collaborative platform. The Streams API provides a hosted, production-level middleware service that enables new data-driven event workflow capabilities that can be leveraged by researchers and Tapis-powered science gateways for handling spatially indexed time-series datasets.
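To make the data model concrete, the sketch below shows how a researcher might register a site and instrument and then write and query a measurement through the hosted Streams API using the tapipy Python client. The base URL, credentials, identifiers, and the exact operation and parameter names (create_site, create_measurement, and so on) are assumptions for illustration; the Streams API reference should be consulted for the actual signatures.

```python
# Minimal sketch of pushing time-series data into the Tapis Streams API via the
# tapipy client. Operation and parameter names are illustrative assumptions;
# check the Streams API reference for the exact signatures.
from tapipy.tapis import Tapis

# Authenticate against a Tapis tenant (base URL and credentials are placeholders).
t = Tapis(base_url="https://tacc.tapis.io", username="researcher", password="********")
t.get_tokens()

# Register a spatially indexed site and an instrument under an existing project.
t.streams.create_site(project_id="uh_hydro_demo", site_id="waimea_01",
                      site_name="Waimea Stream Gauge",
                      latitude=21.64, longitude=-158.06, elevation=12)
t.streams.create_instrument(project_id="uh_hydro_demo", site_id="waimea_01",
                            inst_id="gauge_a", inst_name="Pressure transducer")

# Write one measurement, then query measurements back for a time window.
t.streams.create_measurement(inst_id="gauge_a",
                             vars=[{"var_id": "stage_height", "value": 0.42}],
                             datetime="2023-05-01T10:00:00Z")
recent = t.streams.list_measurements(inst_id="gauge_a",
                                     start_date="2023-05-01T00:00:00Z",
                                     end_date="2023-05-02T00:00:00Z")
print(recent)
```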
Detailed Functional Overview of an API and Workflow Engine for Scientific Research Computing
Constructing and executing reproducible workflows is fundamental to performing research in a variety of scientific domains. Many of the current commercial and open-source solutions for workflow engineering impose constraints, either technical or budgetary, upon researchers, requiring them to spend their limited funding on expensive cloud platforms or spend valuable time acquiring knowledge of software systems and processes outside of their domain expertise. Even though many commercial solutions offer free-tier services, they often do not meet the resource and architectural requirements (memory, data storage, compute time, networking, etc.) for researchers to run their workflows effectively at scale. Tapis Workflows abstracts away the complexities of workflow creation and execution behind a web-based API with a simplified workflow model composed of only pipelines and tasks. This paper details how Tapis Workflows approaches workflow management by exploring its domain model, the technologies used, the application architecture, its design patterns, how organizations are leveraging Tapis Workflows to solve unique problems in their scientific workflows, and the project's vision for a simple, open source, extensible, and easily deployable workflow engine.
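The pipeline-and-task model can be illustrated with a small sketch: a pipeline is a named collection of tasks linked by dependencies and submitted to the Workflows API over HTTP. The endpoint path, group identifier, task types, and field names below are assumptions for illustration rather than the documented Tapis Workflows schema.

```python
# Illustrative sketch of the pipeline/task model: a pipeline is a named set of
# tasks with explicit dependencies, submitted to the Workflows API over HTTP.
# The endpoint path, task types, and field names are assumptions, not the
# documented Tapis Workflows schema.
import requests

pipeline = {
    "id": "sequence-qc-pipeline",
    "tasks": [
        {
            # A task with no dependencies runs first (e.g., building an image).
            "id": "build-image",
            "type": "image_build",
            "depends_on": [],
        },
        {
            # A second task that waits for the first to finish.
            "id": "run-qc",
            "type": "function",
            "depends_on": [{"id": "build-image"}],
        },
    ],
}

# Hypothetical submission to a Tapis Workflows deployment; the group id,
# base URL, and token are placeholders.
resp = requests.post(
    "https://tacc.tapis.io/v3/workflows/groups/my-group/pipelines",
    json=pipeline,
    headers={"X-Tapis-Token": "<JWT>"},
)
print(resp.status_code, resp.text)
```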
- Award ID(s): 2229702
- PAR ID: 10451183
- Date Published:
- Journal Name: ACM
- ISSN: 2486-5400
- Page Range / eLocation ID: 8
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The Tapis Pods service is a novel open-source API within the Tapis platform that enables researchers to seamlessly manage Kubernetes containers, volumes, networking, and security at the Texas Advanced Computing Center (TACC). This paper explores the underlying operations, technologies, and workflows of the Tapis Pods service, showcasing its implementation and effectiveness. Additionally, we discuss current and potential use cases, highlighting the service's unique features, such as management capabilities, persistent storage, sharing, and automatically encrypted networking. Initial performance measurements against local Docker containers and alternative cloud solutions demonstrate the Tapis Pods service's competitive performance, emphasizing its value as a general interface for deploying user-defined containers.
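As a rough illustration of the user-facing workflow, the sketch below requests a pod from a template image through tapipy and then checks its status and access URL. The operation names (create_pod, get_pod), template identifier, and parameters are assumptions based on the service description above, not a verified excerpt of the Pods API.

```python
# Rough sketch of launching a long-lived container through the Tapis Pods
# service with tapipy. Operation and parameter names (create_pod, pod_template)
# are assumptions based on the service description; check the Pods API docs.
from tapipy.tapis import Tapis

t = Tapis(base_url="https://tacc.tapis.io", username="researcher", password="********")
t.get_tokens()

# Request a pod from a template image; the service handles the Kubernetes
# deployment, persistent volume, encrypted networking, and URL routing.
pod = t.pods.create_pod(
    pod_id="demo-db",
    pod_template="postgres",   # placeholder template/image name
    description="Postgres instance backing a shared project dataset",
)

# Check the pod's status and, once running, its externally reachable URL.
info = t.pods.get_pod(pod_id="demo-db")
print(info.status, getattr(info, "url", None))
```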
-
Researchers collaborating from different locations need a method to capture and store scientific workflow provenance that guarantees provenance integrity and reproducibility. As modern science moves toward greater data accessibility, researchers also need a platform for open-access data sharing. We propose SciLedger, a blockchain-based platform that provides secure, trustworthy storage for scientific workflow provenance to reduce research fabrication and falsification. SciLedger utilizes a novel invalidation mechanism that invalidates only the necessary provenance records. SciLedger also allows workflows with complex structures to be stored on a single blockchain so that researchers can utilize existing data in their scientific workflows by branching from and merging existing workflows. Our experimental results show that SciLedger provides a viable solution for maintaining academic integrity and research flexibility within scientific workflows.
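A purely conceptual sketch, not SciLedger's actual implementation, of the ideas described above: provenance records are hash-chained, can branch from or merge existing records via parent references, and an invalidation routine marks only the records downstream of a bad step while leaving unrelated branches valid.

```python
# Conceptual sketch (not SciLedger's implementation) of hash-chained provenance
# records that can branch from or merge existing records, plus an invalidation
# routine that touches only the records downstream of a bad step.
import hashlib
import json
from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    record_id: str
    parents: list        # parent record ids; more than one parent models a merge
    payload: dict        # task inputs/outputs being recorded
    prev_hash: str       # hash of the chain up to this record
    invalid: bool = False

    def digest(self) -> str:
        body = json.dumps([self.record_id, self.parents, self.payload, self.prev_hash],
                          sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

def invalidate(records: dict, bad_id: str) -> None:
    """Mark a record and everything downstream of it invalid, leaving
    unrelated branches of the workflow untouched."""
    records[bad_id].invalid = True
    for rec in records.values():
        if bad_id in rec.parents and not rec.invalid:
            invalidate(records, rec.record_id)

# Two analysis branches share the raw-data record "r0"; invalidating branch
# "r1" leaves branch "r2" valid.
records = {
    "r0": ProvenanceRecord("r0", [], {"step": "ingest"}, prev_hash=""),
    "r1": ProvenanceRecord("r1", ["r0"], {"step": "model-A"}, prev_hash=""),
    "r2": ProvenanceRecord("r2", ["r0"], {"step": "model-B"}, prev_hash=""),
}
invalidate(records, "r1")
print({rid: rec.invalid for rid, rec in records.items()})  # only r1 is invalid
```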
-
Workflow management systems (WMSs) are commonly used to organize/automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich local environments for fast prototyping and refinement before they consider using more powerful computing resources. However, existing WMSs do not simultaneously support local interactive workflow environments and HPC resources. In this paper, we present an on-demand access mechanism to remote HPC resources from desktop/laptop-based workflow management software to compose, monitor, and analyze scientific workflows in the CyberWater project. CyberWater is an open-data and open-modeling software framework for environmental and water communities. In this work, we extend the open-model, open-data design of CyberWater with on-demand HPC access capability. In particular, we design and implement the LaunchAgent library, which can be integrated into the local desktop environment to allow on-demand usage of remote resources for hydrology-related workflows. LaunchAgent manages authentication to remote resources, prepares computationally intensive or data-intensive tasks as batch jobs, submits jobs to remote resources, and monitors the quality of service for users. LaunchAgent interacts seamlessly with other existing components in CyberWater, which is now able to provide the advantages of both a feature-rich desktop software experience and increased computational power through on-demand HPC/Cloud usage. In our evaluations, we demonstrate how a hydrology workflow that consists of both local and remote tasks can be constructed, and we show that the added on-demand HPC/Cloud usage helps speed up hydrology workflows while allowing intuitive workflow configuration and execution using a desktop graphical user interface.
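The sketch below illustrates, in simplified form, the on-demand submission flow that LaunchAgent is described as automating: authenticate to a remote HPC resource, stage a batch script, submit it, and poll the scheduler so the desktop workflow can track progress. It is not the LaunchAgent API; the host name, SSH/Slurm usage, and file names are assumptions.

```python
# Simplified sketch of the on-demand HPC flow that LaunchAgent is described as
# automating: authenticate, stage a batch script, submit, and poll. Not the
# LaunchAgent API; host, credentials, and Slurm usage are assumptions.
import os
import time
import paramiko

BATCH_SCRIPT = """#!/bin/bash
#SBATCH -J hydrology-model
#SBATCH -t 00:30:00
#SBATCH -N 1
srun ./run_hydrology_model --input basin.nc
"""

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("hpc.example.edu", username="researcher",
            key_filename=os.path.expanduser("~/.ssh/id_rsa"))

# Stage the batch script and submit it to the remote scheduler.
sftp = ssh.open_sftp()
with sftp.open("job.slurm", "w") as remote_file:
    remote_file.write(BATCH_SCRIPT)
_, out, _ = ssh.exec_command("sbatch job.slurm")
job_id = out.read().decode().strip().split()[-1]   # "Submitted batch job <id>"

# Poll the scheduler so the desktop workflow can report job progress.
while True:
    _, out, _ = ssh.exec_command(f"squeue -h -j {job_id}")
    if not out.read().decode().strip():
        break                                      # job left the queue
    time.sleep(30)
ssh.close()
```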
-
Scientific workflows have become ubiquitous across scientific fields, and their execution methods and systems continue to be the subject of research and development. Most experimental evaluations of these workflows rely on workflow instances, which can be either real-world or synthetic, to ensure relevance to current application domains or explore hypothetical/future scenarios. The WfCommons project addresses this need by providing data and tools to support such evaluations. In this paper, we present an overview of WfCommons and describe two recent developments. Firstly, we introduce a workflow execution "tracer" for NextFlow, which significantly enhances the set of real-world instances available in WfCommons. Secondly, we describe a workflow instance "translator" that enables the execution of any real-world or synthetic WfCommons workflow instance using Dask. Our contributions aim to provide researchers and practitioners with more comprehensive resources for evaluating scientific workflows.
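As a simplified illustration of what the described instance translator accomplishes, the sketch below reads a WfCommons (WfFormat) JSON instance and executes its task graph with Dask delayed objects. The field names ("workflow", "tasks", "name", "parents") and the parents-first task ordering are assumptions about the instance layout; the actual translator handles scheduling detail, data, and commands that are omitted here.

```python
# Simplified illustration of translating a WfCommons (WfFormat) JSON instance
# into a Dask task graph. Field names ("workflow", "tasks", "name", "parents")
# and parents-first task ordering are assumptions about the instance layout.
import json

import dask
from dask import delayed

def run_task(name, *upstream_results):
    # Placeholder for invoking the task's actual command.
    print(f"running {name}")
    return name

with open("workflow-instance.json") as f:
    instance = json.load(f)

tasks = instance["workflow"]["tasks"]
nodes = {}
for task in tasks:
    deps = [nodes[p] for p in task.get("parents", []) if p in nodes]
    nodes[task["name"]] = delayed(run_task)(task["name"], *deps)

# Compute the sink tasks (those no other task depends on) to run the graph.
sinks = [nodes[t["name"]] for t in tasks
         if not any(t["name"] in other.get("parents", []) for other in tasks)]
dask.compute(*sinks)
```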