This content will become publicly available on August 19, 2025
- Award ID(s): 2311833
- PAR ID: 10538017
- Publisher / Repository: Annual MVAPICH User Group (MUG) Conference
- Date Published:
- Format(s): Medium: X
- Location: Columbus
- Sponsoring Org: National Science Foundation
More Like this
-
AWP-ODC is a 4th-order finite difference code used by the SCEC community for linear wave propagation, Iwan-type nonlinear dynamic rupture and wave propagation, and Strain Green Tensor simulation. We have ported and verified the CUDA version of AWP-ODC-SGT, a reciprocal version used in the SCEC CyberShake project, to HIP so that it can also run on AMD GPUs. This code achieved sustained 32.6 Petaflop/s performance and 95.6% parallel efficiency at full scale on Frontier, a Leadership Computing Facility at Oak Ridge National Laboratory. The readiness of this community software on AMD Radeon Instinct GPUs and EPYC CPUs allows SCEC to take advantage of exascale systems to produce more realistic ground motions and accurate seismic hazard products. We have also deployed AWP-ODC to Azure to leverage the array of tools and services that Azure provides for tightly coupled HPC simulation on commercial cloud. We collaborated with the Internet2/Azure Accelerator support team, as part of the Microsoft Internet2/Azure Accelerator for Research Fall 2022 Program, with Azure credits awarded through Cloudbank, an NSF-funded initiative. We demonstrate AWP performance with a benchmark of ground motion simulation on various GPU-based cloud instances, and a comparison of the cloud solution to on-premises bare-metal systems. AWP-ODC currently achieves excellent speedup and efficiency on CPU and GPU architectures. The Iwan-type dynamic rupture and wave propagation solver faces significant challenges, however, because the computational workload increases with the number of yield surfaces chosen. Compared to the linear solution, the Iwan model adds 10x-30x more computational time and 5x-13x more memory consumption, which requires substantial code changes to obtain excellent performance.
Supported by NSF’s Characteristic Science Applications (CSA) program for the Leadership-Class Computing Facility (LCCF) at Texas Advanced Computing Center (TACC), we are porting and improving the performance of this nonlinear AWP-ODC software, preparing for the next generation NSF LCCF system called Horizon, to be installed at TACC. During Texascale Days on TACC’s current Frontera system, we carried out an Iwan-type nonlinear dynamic rupture and wave propagation simulation of an Mw 7.8 scenario earthquake on the southern San Andreas fault. This simulation modeled 83 seconds of rupture with a grid spacing of 25 m to resolve frequencies up to 4 Hz with a minimum shear-wave velocity of 500 m/s.
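The resolution figures quoted above (25 m grid spacing, 4 Hz maximum frequency, 500 m/s minimum shear-wave velocity) can be related through the shortest resolved wavelength. The following is a minimal sketch of that arithmetic only; the function name and the points-per-wavelength framing are illustrative assumptions, not part of AWP-ODC.

```python
def points_per_min_wavelength(vs_min_m_s, f_max_hz, dx_m):
    """Grid points per shortest wavelength (hypothetical helper, not from AWP-ODC)."""
    # Shortest wavelength = slowest shear-wave speed / highest frequency.
    min_wavelength_m = vs_min_m_s / f_max_hz
    # Number of grid cells spanning that wavelength.
    return min_wavelength_m / dx_m


# Values taken from the abstract: 500 m/s, 4 Hz, 25 m.
ppw = points_per_min_wavelength(500.0, 4.0, 25.0)
print(ppw)  # 5.0
```

With these inputs the shortest wavelength is 125 m, so the 25 m grid samples it with 5 points.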
-
The Gordon Bell-winning AWP-ODC application continues to push the boundaries of earthquake simulation by leveraging the enhanced performance of MVAPICH on both CPU- and GPU-based architectures. This presentation highlights the recent improvements to the code and its application to broadband deterministic 3D wave propagation simulations of earthquake ground motions, incorporating high-resolution surface topography and detailed underground structures. The results of these simulations provide critical insights into the potential impacts of major earthquakes, contributing to more effective disaster preparedness and mitigation strategies. Additionally, the presentation will address the scientific and technical challenges encountered during the process and discuss the implications for future large-scale seismic studies on Exascale computing systems.
-
Dense linear algebra (DLA) has historically been in the vanguard of software that must be adapted first to hardware changes. This is because DLA is critical to the accuracy and performance of so many different types of applications, and because DLA routines have proved to be outstanding vehicles for finding and implementing solutions to the problems that novel architectures pose. Therefore, in this paper we investigate the portability of the MAGMA DLA library to the latest AMD GPUs. We use automated tools to convert the CUDA code in MAGMA to the Heterogeneous-Computing Interface for Portability (HIP) language. MAGMA provides LAPACK for GPUs and benchmarks for fundamental DLA routines ranging from BLAS to dense factorizations, linear systems and eigen-problem solvers. We port these routines to HIP and quantify currently achievable performance through the MAGMA benchmarks for the main workload algorithms on MI25 and MI50 AMD GPUs. Comparisons with performance roofline models and theoretical expectations are used to identify current limitations and directions for future improvements.
-
The Gordon Bell-winning AWP-ODC application has a long history of boosted performance with MVAPICH on both CPU- and GPU-based architectures. This talk will highlight recent compression support implemented by the MVAPICH team, and its benefits for large-scale earthquake simulation on leadership-class computing systems. The presentation will conclude with a discussion of the opportunities and technical challenges associated with the development of earthquake simulation software for Exascale computing.
-
The IceCube Neutrino Observatory is a cubic kilometer neutrino detector located at the geographic South Pole designed to detect high-energy astrophysical neutrinos. To thoroughly understand the detected neutrinos and their properties, the detector response to signal and background has to be modeled using Monte Carlo techniques. The optical properties of the ice the observatory is built into are an integral part of these studies. The simulated propagation of individual photons from particles produced by neutrino interactions in the ice can be greatly accelerated using graphics processing units (GPUs). In this paper, we (a collaboration between NVIDIA and IceCube) reduced the propagation time per photon by a factor of up to 3 on the same GPU. We achieved this by porting the OpenCL parts of the program to CUDA and optimizing the performance. This involved careful analysis and multiple changes to the algorithm. We also ported the code to NVIDIA OptiX to handle the collision detection. The hand-tuned CUDA algorithm turned out to be faster than OptiX. It exploits the detector geometry and the fact that only a small fraction of photons ever travel close to one of the detectors.