As research projects grow more complex and researchers use a mix of tools - command-line scripts, science gateways, and Jupyter notebooks - it becomes increasingly difficult to track exactly how a final result was produced. Each tool often keeps its own logs, making it hard to reconstruct the full sequence of computational steps. This lack of end-to-end visibility poses a serious challenge for scientific reproducibility. Yet advanced computing remains a critical part of nearly every field of academic research, and researchers continue to rely on a wide range of interfaces to run their scientific software. To address this challenge, the Advanced Computing Interfaces group at the Texas Advanced Computing Center (TACC) created a system that collates logs from multiple sources - science gateways, Jupyter notebooks, and the Tapis platform - into one unified “audit trail.” The TACC Research Audit and Integration of Logs (TRAIL) system allows researchers and staff to follow the complete path a dataset or file took: from the moment it was first uploaded to TACC, through every step of computation, to the final result. This kind of tracking helps ensure scientific results can be reproduced and gives advanced computing services better insight into how data and resources are being used.
Extending Tapis Workflow Management Framework with Elastic Google Cloud Distributed System using CloudyCluster by Omnibond
The goal of a robust cyberinfrastructure (CI) ecosystem is to catalyse discovery and innovation. Tapis does this by offering a sustainable, production-quality set of API services to support modern science and engineering research, which increasingly spans geographically distributed data centers, instruments, experimental facilities, and a network of national and regional CI. Leveraging frameworks such as Tapis enables researchers to accomplish computational and data-intensive research in a secure, scalable, and reproducible way, and allows them to focus on their research instead of the technology needed to accomplish it. This project aims to integrate Google Cloud Platform (GCP) and CloudyCluster resources into Tapis-supported science gateways to provide the on-demand scaling needed by computational workflows. The new functionality uses Tapis event-driven Abaco Actors and CloudyCluster to create an elastic distributed cloud computing system on demand. This integration allows researchers and science gateways to add cloud resources on top of existing local and national computing resources.
- Award ID(s): 1931575
- PAR ID: 10462242
- Date Published:
- Journal Name: Science Gateways 2022
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary: The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing, and analyzing time‐series data. Geoscience researchers use time‐series data stores such as Hydroserver, Virtual Observatory and Ecological Informatics System (VOEIS), and Cloud‐Hosted Real‐time Data Service (CHORDS). Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis framework, an NSF-funded project, provides science-as-a-service APIs that allow researchers to achieve faster scientific results by eliminating the need to set up a complex infrastructure stack. The University of Hawai'i (UH) and Texas Advanced Computing Center (TACC) have collaborated to develop an open source Tapis Streams API that builds on the concepts of the CHORDS time‐series data service to support research. This new hosted service allows storing, processing, annotating, archiving, and querying time‐series data in the Tapis multi‐user and multi‐tenant collaborative platform. The Streams API provides a hosted, production-level middleware service that enables new data‐driven event workflow capabilities, which researchers and Tapis-powered science gateways may leverage for handling spatially indexed time‐series datasets.
-
The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing, and analyzing time-series data. Geoscience researchers use time-series data stores such as Hydroserver, VOEIS, and CHORDS. Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis (formerly known as Agave) platform as a service supports researchers by relieving them of responsibility for the infrastructure so that they can focus on the science. The University of Hawaii (UH) and Texas Advanced Computing Center (TACC) have collaborated to develop a new API integration that combines Tapis with the CHORDS time-series data service to support projects at both institutions for storing, annotating, and querying time-series data. This new Streams API leverages the strengths of both the Tapis platform and the CHORDS service to enable capabilities for supporting time-series data streams that are not available in either tool alone. These new capabilities may be leveraged by Tapis-powered science gateways that need to handle spatially indexed time-series datasets for their researchers, as they have been at UH and TACC.
-
Neuroscientists are increasingly relying on high performance/throughput computing resources for experimentation on voluminous data, analysis, and visualization at multiple neural levels. Though current science gateways provide access to computing resources, datasets, and tools specific to the disciplines, neuroscientists require guided knowledge discovery at various levels to accomplish their research and education tasks. This guidance can help them navigate relevant publications, tools, topic associations, and cloud platform options as they carry out important research and education activities. To address this need and to spur research productivity and rapid learning platform development, we present “OnTimeRecommend”, a novel recommender system that comprises several recommender modules integrated through RESTful web services. We detail a neuroscience use case in a CyNeuro science gateway, and show how the OnTimeRecommend design can enable novice/expert user interfaces, as well as template-driven control of heterogeneous cloud resources.
-
Abstract: Science Gateways provide an easily accessible and powerful computing environment for researchers. They are built around a set of software tools that are frequently and heavily used by a large number of researchers in specific domains. Science Gateways have been catering to researchers' growing need for easy-to-use computational tools; however, their usage model is typically single-user-centric. As scientific research becomes ever more team oriented, the user-driven need to support integrated collaborative capabilities in Science Gateways is a natural progression. The ability to share data and results with others in an integrated manner is an important and frequently requested capability. In this article we describe and discuss our work to provide a rich environment for data organization and data sharing by integrating the SeedMeLab (formerly SeedMe2) platform with two Science Gateways: CIPRES and GenApp. With this integration we also demonstrate SeedMeLab’s extensible features and how Science Gateways may incorporate and realize FAIR data principles in practice and transform into community data hubs.

