skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 12 until 2:00 AM ET on Friday, June 13 due to maintenance. We apologize for the inconvenience.


Title: Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics
In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded "reproducibility package" that includes workflow definitions for cloud computing, input files and instructions. This facilitates re-creating the cloud environment to re-run the computations under identical conditions. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. We share with readers what we have learned from nearly two years of using Azure cloud to enhance transparency and reproducibility in our computational simulations.  more » « less
Award ID(s):
1747669
PAR ID:
10125028
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Computing in Science & Engineering
ISSN:
1521-9615
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. 
    more » « less
  2. AWP-ODC is a 4th-order finite difference code used by the SCEC community for linear wave propagation, Iwan-type nonlinear dynamic rupture and wave propagation, and Strain Green Tensor simulation. We have ported and verified the CUDA-version of AWP-ODC-SGT, a reciprocal version used in the SCEC CyberShake project, to HIP so that it can also run on AMD GPUs. This code achieved sustained 32.6 Petaflop/s performance and 95.6% parallel efficiency at full scale on Frontier, a Leadership Computing Facility at Oak Ridge National Laboratory. The readiness of this community software on AMD Radeon Instinct GPUs and EPYC CPUs allows SCEC to take advantage of exascale systems to produce more realistic ground motions and accurate seismic hazard products. We have also deployed AWP-ODC to Azure to leverage the array of tools and services that Azure provides for tightly coupled HPC simulation on commercial cloud. We collaborated with Internet 2/Azure Accelerator supporting team, as part of Microsoft Internet2/Azure Accelerator for Research Fall 2022 Program, with Azure credits awarded through Cloudbank, an NSF-funded initiative. We demonstrate the AWP performance with a benchmark of ground motion simulation on various GPU based cloud instances, and a comparison of the cloud solution to on-premises bare-metal systems. AWP-ODC currently achieves excellent speedup and efficiency on CPU and GPU architectures. The Iwan-type dynamic rupture and wave propagation solver faces significant challenges, however, due to the increased computational workload with the number of yield surfaces chosen. Compared to linear solution, the Iwan model adds 10x-30x more computational time plus 5x-13x more memory consumption that require substantial code changes to obtain excellent performance. Supported by NSF’s Characteristic Science Applications (CSA) program for the Leadership-Class Computing Facility (LCCF) at Texas Advanced Computing Center (TACC), we are porting and improving the performance of this nonlinear AWP-ODC software, preparing for the next generation NSF LCCF system called Horizon, to be installed at TACC. During Texascale days on the current TACC’s Frontera, we carried out an Iwan-type nonlinear dynamic rupture and wave propagation simulation of a Mw7.8 scenario earthquake on the southern San Andreas fault. This simulation modeled 83 seconds of rupture with a grid spacing of 25 m to resolve frequencies up to 4 Hz with a minimum shear-wave velocity of 500 m/s. 
    more » « less
  3. Cloud computing has become a major approach to help reproduce computational experiments. Yet there are still two main difficulties in reproducing batch based big data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to be reproduced in another cloud, a.k.a. vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We did extensive experiments on both AWS and Azure using four big data analytics applications that run on virtual CPU/GPU clusters. The experiments show our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based big data analytics. 
    more » « less
  4. Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author’s raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals. 
    more » « less
  5. The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results - such as those from simulations - are "reproducible", that is, the same results are obtained when one re-uses the same input data, methods, software and analysis conditions. Starting almost a decade ago, the community has regularly published and advocated for advances in this area. In this article we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Science, Engineering, and Medicine report on "Reproducibility and Replication in Science". To this end, this work considers high performance computing workflows that emphasize workflows combining traditional simulations (e.g. Molecular Dynamics simulations) with in situ analytics. We leverage an analysis of such workflows to (a) contextualize the 2019 National Academies of Science, Engineering, and Medicine report's recommendations in the HPC setting and (b) envision a path forward in the tradition of community driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and computational infrastructure that supports scientific discovery. 
    more » « less