The C-MĀIKI gateway is a science gateway that leverages Tapis, a computational workload management API, to support modern, interoperable, and scalable microbiome data analysis. This project focuses on migrating the existing C-MĀIKI gateway pipeline from Tapis v2 to Tapis v3 so that it can take advantage of the new Tapis v3 features. The migration requires three major steps: 1) containerizing each existing microbiome workflow; 2) creating a new app definition for each workflow; and 3) enabling job submission to a SLURM scheduler from inside a Singularity container to support the Nextflow workflow manager. This work presents the experience and challenges of upgrading the pipeline.
The C-MĀIKI Gateway: A Modern Science Platform for Analyzing Microbiome Data
In collaboration with the Center for Microbiome Analysis through Island Knowledge and Investigations (C-MĀIKI), the Hawaii EPSCoR Ike Wai project and the Hawaii Data Science Institute, a new science gateway, the C-MĀIKI gateway, was developed to support modern, interoperable and scalable microbiome data analysis. This gateway provides a web-based interface for accessing high-performance computing resources and storage to enable and support reproducible microbiome data analysis. The C-MĀIKI gateway is accelerating the analysis of microbiome data for Hawaii through ease of use and centralized infrastructure.
- PAR ID: 10343597
- Date Published:
- Journal Name: Practice and Experience in Advanced Research Computing
- Volume: 2022
- Page Range / eLocation ID: 1 to 7
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Granting agencies invest millions of dollars in the generation and analysis of data, making these products extremely valuable. However, without sufficient annotation of the methods used to collect and analyze the data, the ability to reproduce and reuse those products suffers. This lack of assurance of the quality and credibility of the data at the different stages of the research process wastes much of the investment of time and funding and fails to drive research forward to the level that would be possible if everything were effectively annotated and disseminated to the wider research community. To address this issue for the Hawai‘i Established Program to Stimulate Competitive Research (EPSCoR) project, a water science gateway called the ‘Ike Wai Gateway was developed at the University of Hawai‘i (UH). In Hawaiian, ‘Ike means knowledge and Wai means water. The gateway supports research in hydrology and water management by providing tools to address questions of water sustainability in Hawai‘i. The gateway provides a framework for data acquisition, analysis, model integration, and display of data products. The gateway is intended to complement and integrate with the capabilities of the Consortium of Universities for the Advancement of Hydrologic Science's (CUAHSI) Hydroshare by providing sound data and metadata management capabilities for multi-domain field observations, analytical lab actions, and modeling outputs. Functionality provided by the gateway is supported by a subset of CUAHSI’s Observations Data Model (ODM), delivered as centralized web-based user interfaces and APIs supporting multi-domain data management, computation, analysis, and visualization tools to support reproducible science, modeling, data discovery, and decision support for the Hawai‘i EPSCoR ‘Ike Wai research team and the wider Hawai‘i hydrology community.
By leveraging the Tapis platform, UH has constructed a gateway that ties data and advanced computing resources together to support diverse research domains including microbiology, geochemistry, geophysics, economics, and humanities, coupled with computational and modeling workflows delivered in a user-friendly web interface with workflows for effectively annotating the project data and products. Disseminating results for the ‘Ike Wai project through the ‘Ike Wai data gateway and Hydroshare makes the research products accessible and reusable.
-
Building science gateways for humanities content poses new challenges to the science gateway community. Compared with science gateways devoted to scientific content, humanities-related projects usually require 1) processing data in various formats, such as text, image, and video, 2) constant public access from a broad audience, and therefore 3) reliable security, ideally with low maintenance. Many traditional science gateways are monolithic in design, which makes them easier to write, but they can be computationally inefficient when integrated with numerous scientific packages for data capture and pipeline processing. Since these packages tend to be single-threaded or nonmodular, they can create traffic bottlenecks when processing large numbers of requests. Moreover, development on these science gateways is usually hard to resume because of long gaps between funding periods and the aging of the integrated scientific packages. In this paper, we study the problem of building science gateways for humanities projects by developing a service-based architecture, and present two such science gateways: the Moving Image Research Collections (MIRC), a science gateway focusing on image analysis for digital surrogates of historical motion picture film, and SnowVision, a science gateway for studying pottery fragments in southeastern North America. For each science gateway, we present an overview of the project background and some unique challenges in its design and implementation. These two science gateways are deployed on XSEDE’s Jetstream academic cloud computing resource and are accessed through web interfaces. Apache Airavata middleware is used to manage the interactions between the web interface and the deep-learning-based (DL) backend service running on the Bridges graphics processing unit (GPU) cluster.
-
Advances in metagenomic sequencing have provided an unprecedented view of the microbial world, but untangling the web of microbe interdependencies and the complex relationship between microbiome and host is a major challenge in biology. New statistical methods are needed to analyze metagenomic data and infer these relationships. Focusing on amplicon sequencing data, we present methods for leveraging phylogenetic information in deep neural network models and for transfer learning from large data repositories. This approach is demonstrated in experiments using data from the Earth Microbiome Project (EMP) and a dataset of 1500 samples from Waimea Valley on the island of Oahu, Hawaii.
-
Over the last two decades, science gateways have become essential tools for supporting both research and education. The SimVascular application is an open source software package providing a complete pipeline from medical image data segmentation to patient-specific blood flow simulation and analysis. With an ever-increasing user base of students, educators, clinicians, and researchers, the development group wanted a user-friendly web portal that lets users run SimVascular flow simulations, supports a large number of users with minimal effort, and hides the complexity of using HPC systems. This paper discusses how the SimVascular Science Gateway became a tool for students, educators, and researchers of all levels and continues to gather and grow a strong research community.