Title: Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation
As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen application being IceCube's photon propagation simulation. In other words, this was not purely a demonstration run; it was also used to produce valuable and much-needed scientific results for the IceCube collaboration. In order to reach the desired scale, we aggregated GPU resources across 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we reached a peak of over 51k GPUs, corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we provide a description of the setup, the problems that were discovered and overcome, as well as a short description of the actual science output of the exercise.
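As a rough sanity check of the headline figures in the abstract above, the short sketch below derives the implied average FP32 throughput per GPU. The GPU count, aggregate PFLOP32 rate, and integrated GPU hours are taken from the abstract; the per-GPU average is only a back-of-the-envelope estimate, since the pool mixed 8 different GPU models with very different peak rates.

```python
# Sanity check of the scale figures quoted in the abstract above.
# Inputs come from the text; the per-GPU average derived here is a rough
# estimate for a heterogeneous mix of cloud GPU generations.

peak_gpus = 51_000              # "a peak of over 51k GPUs"
peak_pflop32 = 380              # "almost 380 PFLOP32s" (aggregate FP32 PFLOP/s)
integrated_gpu_hours = 100_000  # "about 100k GPU hours"

avg_tflop32_per_gpu = peak_pflop32 * 1_000 / peak_gpus
print(f"average ~{avg_tflop32_per_gpu:.1f} TFLOP32 per GPU at peak")
print(f"~{integrated_gpu_hours:,} GPU hours of integrated compute")
# -> roughly 7.5 TFLOP32 per GPU at peak, plausible for a mixed cloud GPU pool.
```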
Award ID(s):
1841479
PAR ID:
10211894
Author(s) / Creator(s):
; ; ;
Editor(s):
Sadayappan, Ponnuswamy; Chamberlain, Bradford L.; Juckeland, Guido; Ltaief, Hatem
Date Published:
Journal Name:
ISC High Performance 2020
Volume:
12151
Page Range / eLocation ID:
23-40
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Scientific computing needs are growing dramatically with time and are expanding into science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of the local compute resource, capacity should be temporarily provisioned from elsewhere, both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. The available capacity of cost-effective instances, however, is not well understood. This paper presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
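A quick back-of-the-envelope check of the burst described in the abstract above is sketched below. The sustained throughput, integrated output, and price tag come from the abstract; the 8-hour "workday" duration is an assumption made here, so the derived unit cost is only indicative.

```python
# Rough consistency check of the cloud burst figures quoted above.
# The 8-hour workday length is an assumption; all other inputs are
# taken directly from the abstract.

sustained_pflop32 = 170   # ~170 PFLOP32s sustained
workday_hours = 8         # assumed length of "a whole workday"
price_usd = 60_000        # "a price tag of about $60k"

eflop32_hours = sustained_pflop32 * workday_hours / 1_000
print(f"~{eflop32_hours:.2f} EFLOP32 hours")                   # ~1.36, consistent with "over one"
print(f"~${price_usd / eflop32_hours:,.0f} per EFLOP32 hour")  # rough unit cost estimate
```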
  2. The Gordon Bell-winning AWP-ODC application has a long history of boosted performance with MVAPICH on both CPU- and GPU-based architectures. This talk will highlight recent compression support implemented by the MVAPICH team, and its benefits to large-scale earthquake simulation on leadership-class computing systems. The presentation will conclude with a discussion of the opportunities and technical challenges associated with the development of earthquake simulation software for Exascale computing.
  3. The Gordon Bell-winning AWP-ODC application continues to push the boundaries of earthquake simulation by leveraging the enhanced performance of MVAPICH on both CPU- and GPU-based architectures. This presentation highlights the recent improvements to the code and its application to broadband deterministic 3D wave propagation simulations of earthquake ground motions, incorporating high-resolution surface topography and detailed underground structures. The results of these simulations provide critical insights into the potential impacts of major earthquakes, contributing to more effective disaster preparedness and mitigation strategies. Additionally, the presentation will address the scientific and technical challenges encountered during the process and discuss the implications for future large-scale seismic studies on Exascale computing systems.
  4. AWP-ODC is a 4th-order finite difference code used by the SCEC community for linear wave propagation, Iwan-type nonlinear dynamic rupture and wave propagation, and Strain Green Tensor simulation. We have ported and verified the CUDA version of AWP-ODC-SGT, a reciprocal version used in the SCEC CyberShake project, to HIP so that it can also run on AMD GPUs. This code achieved sustained 32.6 Petaflop/s performance and 95.6% parallel efficiency at full scale on Frontier, at the Oak Ridge Leadership Computing Facility. The readiness of this community software on AMD Radeon Instinct GPUs and EPYC CPUs allows SCEC to take advantage of exascale systems to produce more realistic ground motions and accurate seismic hazard products. We have also deployed AWP-ODC to Azure to leverage the array of tools and services that Azure provides for tightly coupled HPC simulation on a commercial cloud. We collaborated with the Internet2/Azure Accelerator support team, as part of the Microsoft Internet2/Azure Accelerator for Research Fall 2022 Program, with Azure credits awarded through CloudBank, an NSF-funded initiative. We demonstrate the AWP performance with a benchmark of ground motion simulation on various GPU-based cloud instances, and a comparison of the cloud solution to on-premises bare-metal systems. AWP-ODC currently achieves excellent speedup and efficiency on CPU and GPU architectures. The Iwan-type dynamic rupture and wave propagation solver faces significant challenges, however, due to the increased computational workload with the number of yield surfaces chosen. Compared to the linear solution, the Iwan model adds 10x-30x more computational time plus 5x-13x more memory consumption, requiring substantial code changes to obtain excellent performance. Supported by NSF's Characteristic Science Applications (CSA) program for the Leadership-Class Computing Facility (LCCF) at the Texas Advanced Computing Center (TACC), we are porting and improving the performance of this nonlinear AWP-ODC software in preparation for the next-generation NSF LCCF system, called Horizon, to be installed at TACC. During Texascale Days on TACC's current Frontera system, we carried out an Iwan-type nonlinear dynamic rupture and wave propagation simulation of a Mw 7.8 scenario earthquake on the southern San Andreas fault. This simulation modeled 83 seconds of rupture with a grid spacing of 25 m to resolve frequencies up to 4 Hz with a minimum shear-wave velocity of 500 m/s.
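The grid spacing, maximum frequency, and minimum shear-wave velocity quoted in the abstract above imply a specific sampling of the shortest wavelength of interest. The sketch below works through that standard resolution check; the "points per minimum wavelength" rule of thumb is generic to finite-difference wave propagation, and AWP-ODC's exact resolution criterion may differ.

```python
# Resolution check for the Mw 7.8 scenario described above, using the
# numbers quoted in the abstract.

v_min_m_s = 500.0   # minimum shear-wave velocity
f_max_hz = 4.0      # maximum resolved frequency
dx_m = 25.0         # grid spacing

lambda_min_m = v_min_m_s / f_max_hz          # shortest wavelength of interest
points_per_wavelength = lambda_min_m / dx_m  # grid points sampling that wavelength
print(f"min wavelength = {lambda_min_m:.0f} m, "
      f"{points_per_wavelength:.0f} points per wavelength")
# -> 125 m minimum wavelength, sampled by 5 points at 25 m spacing.
```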
  5. SLATE (Software for Linear Algebra Targeting Exascale) is a distributed, dense linear algebra library targeting both CPU-only and GPU-accelerated systems, developed over the course of the Exascale Computing Project (ECP). While it began with several documents setting out its initial design, significant design changes occurred throughout its development. In some cases, these were anticipated: an early version used a simple consistency flag that was later replaced with a full-featured consistency protocol. In other cases, performance limitations and software and hardware changes prompted a redesign. Sequential communication tasks were parallelized; host-to-host MPI calls were replaced with GPU device-to-device MPI calls; more advanced algorithms such as Communication Avoiding LU and the Random Butterfly Transform (RBT) were introduced. Early choices that turned out to be cumbersome, error-prone, or inflexible have been replaced with simpler, more intuitive, or more flexible designs. Applications have been a driving force, prompting a lighter-weight queue class, nonuniform tile sizes, and more flexible MPI process grids. Of paramount importance has been building a portable library that works across several different GPU architectures – AMD, Intel, and NVIDIA – while keeping a clean and maintainable codebase. Here we explore the evolving design choices and their effects, both in terms of performance and software sustainability.
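To make two of the application-driven features mentioned above more concrete, here is a minimal Python sketch (deliberately not SLATE's actual C++ interface) of how nonuniform tile sizes can coexist with a standard 2D block-cyclic assignment over a p x q MPI process grid. All tile sizes and the grid shape are hypothetical, chosen only for illustration.

```python
# Illustrative sketch of nonuniform tile sizes over a 2D block-cyclic layout.
# Not SLATE's API: tile sizes and the process grid shape are made up here.
# Tile (i, j) is owned by process (i mod p, j mod q), the usual 2D
# block-cyclic assignment; tile dimensions come from per-row/column lists
# rather than a single fixed blocking factor.

row_tile_sizes = [256, 256, 128, 64]   # hypothetical nonuniform tile heights
col_tile_sizes = [256, 192, 192, 64]   # hypothetical nonuniform tile widths
p, q = 2, 3                            # hypothetical p x q process grid

def owner(i: int, j: int) -> int:
    """MPI rank owning tile (i, j) in a 2D block-cyclic layout."""
    return (i % p) * q + (j % q)

for i, mb in enumerate(row_tile_sizes):
    for j, nb in enumerate(col_tile_sizes):
        print(f"tile ({i},{j}) is {mb}x{nb}, owned by rank {owner(i, j)}")
```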