skip to main content

Title: Accelerating complex modeling workflows in CyberWater using on-demand HPC/Cloud resources
Workflow management systems (WMSs) are commonly used to organize/automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich, local environments for fast prototyping and refinements before they consider using more powerful computing resources. However, existing WMSs do not simultaneously support local interactive workflow environments and HPC resources. In this paper, we present an on-demand access mechanism to remote HPC resources from desktop/laptopbased workflow management software to compose, monitor and analyze scientific workflows in the CyberWater project. Cyber- Water is an open-data and open-modeling software framework for environmental and water communities. In this work, we extend the open-model, open-data design of CyberWater with on-demand HPC accessing capacity. In particular, we design and implement the LaunchAgent library, which can be integrated into the local desktop environment to allow on-demand usage of remote resources for hydrology-related workflows. LaunchAgent manages authentication to remote resources, prepares the computationally-intensive or data-intensive tasks as batch jobs, submits jobs to remote resources, and monitors the quality of services for the users. LaunchAgent interacts seamlessly with other existing components in CyberWater, which is now able to provide advantages of both feature-rich desktop software experience and increased computation power through on-demand HPC/Cloud usage. In our evaluations, we demonstrate how a hydrology workflow that consists of both local and remote tasks more » can be constructed and show that the added on-demand HPC/Cloud usage helps speeding up hydrology workflows while allowing intuitive workflow configurations and execution using a desktop graphical user interface. « less
; ; ; ; ; ; ; ;
Award ID(s):
1835817 1835785
Publication Date:
Journal Name:
The 17th IEEE International Conference on eScience
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analysis such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are larger and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq data can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results GEMmaker, is a nf-core compliant, Nextflow workflow, that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned containerized software that can be executed on a single workstation, institutional compute cluster, Kubernetes platform or the cloud. GEMmaker supports popular alignment and quantification tools providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of local or remote stored samples without exceeding available data storage. Conclusions Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of accessmore »to CPUs. GEMmaker provides these benefits and adds the ability to scale despite low data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.« less
  2. As large-scale scientific simulations and big data analyses become more popular, it is increasingly more expensive to store huge amounts of raw simulation results to perform post-analysis. To minimize the expensive data I/O, “in-situ” analysis is a promising approach, where data analysis applications analyze the simulation generated data on the fly without storing it first. However, it is challenging to organize, transform, and transport data at scales between two semantically different ecosystems due to the distinct software and hardware difference. To tackle these challenges, we design and implement the X-Composer framework. X-Composer connects cross-ecosystem applications to form an “in-situ” scientific workflow, and provides a unified approach and recipe for supporting such hybrid in-situ workflows on distributed heterogeneous resources. X-Composer reorganizes simulation data as continuous data streams and feeds them seamlessly into the Cloud-based stream processing services to minimize I/O overheads. For evaluation, we use X-Composer to set up and execute a cross-ecosystem workflow, which consists of a parallel Computational Fluid Dynamics simulation running on HPC, and a distributed Dynamic Mode Decomposition analysis application running on Cloud. Our experimental results show that X-Composer can seamlessly couple HPC and Big Data jobs in their own native environments, achieve good scalability, and providemore »high-fidelity analytics for ongoing simulations in real-time.« less
  3. Composable infrastructure holds the promise of accelerating the pace of academic research and discovery by enabling researchers to tailor the resources of a machine (e.g., GPUs, storage, NICs), on-demand, to address application needs. We were first introduced to composable infrastructure in 2018, and at the same time, there was growing demand among our College of Engineering faculty for GPU systems for data science, artificial intelligence / machine learning / deep learning, and visualization. Many purchased their own individual desktop or deskside systems, a few pursued more costly cloud and HPC solutions, and others looked to the College or campus computer center for GPU resources which, at the time, were scarce. After surveying the diverse needs of our faculty and studying product offerings by a few nascent startups in the composable infrastructure sector, we applied for and received a grant from the National Science Foundation in November 2019 to purchase a mid-scale system, configured to our specifications, for use by faculty and students for research and research training. This paper describes our composable infrastructure solution and implementation for our academic community. Given how modern workflows are progressively moving to containers and cloud frameworks (using Kubernetes) and to programming notebooks (primarily Jupyter),more »both for ease of use and for ensuring reproducible experiments, we initially adapted these tools for our system. We have since made it simpler to use our system, and now provide our users with a public facing JupyterHub server. We also added an expansion chassis to our system to enable composable co-location, which is a shared central architecture in which our researchers can insert and integrate specialized resources (GPUs, accelerators, networking cards, etc.) needed for their research. In February 2020, installation of our system was finalized and made operational and we began providing access to faculty in the College of Engineering. Now, two years later, it is used by over 40 faculty and students plus some external collaborators for research and research training. Their use cases and experiences are briefly described in this paper. Composable infrastructure has proven to be a useful computational system for workload variability, uneven applications, and modern workflows in academic environments.« less
  4. The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and securitymore »of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems.« less
  5. Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows-executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computations can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; andmore »(ii) a realistic bioinformatics workflow that computes mu-tational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.« less