

Title: Accelerating Numerical Relativity with Code Generation: CUDA-enabled Hyperbolic Relaxation
Abstract: Next-generation gravitational wave detectors such as Cosmic Explorer, the Einstein Telescope, and LISA demand highly accurate and extensive gravitational wave (GW) catalogs to faithfully extract physical parameters from observed signals. However, numerical relativity (NR) faces significant challenges in generating these catalogs at the required scale and accuracy, because NR codes do not fully exploit the capabilities of modern GPUs. In response, we extend NRPy, a Python-based NR code-generation framework, to develop NRPyEllipticGPU, a CUDA-optimized elliptic solver tailored to the binary black hole (BBH) initial data problem. NRPyEllipticGPU is the first GPU-enabled elliptic solver in the NR community; it supports a variety of coordinate systems and demonstrates substantial performance improvements on both consumer-grade and HPC-grade GPUs. Compared with a high-end CPU, NRPyEllipticGPU running on a high-end GPU achieves up to a sixteenfold speedup in single precision and a two- to fourfold speedup in double precision. This performance boost exploits the GPU's superior parallelism and memory bandwidth to make the application compute-bound, enhancing overall simulation efficiency. Because NRPyEllipticGPU shares the core infrastructure common to NR codes, this work also serves as a practical guide for developing full, CUDA-optimized NR codes.
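The abstract attributes the speedup to the GPU's parallelism and memory bandwidth, and reports much larger gains in single than in double precision. A standard back-of-the-envelope tool for reasoning about such claims is the roofline model, sketched below. It is not taken from the paper: the per-point FLOP and byte counts and the hardware figures are hypothetical placeholders, not measurements of NRPyEllipticGPU or of any specific GPU.

```python
def roofline(peak_gflops, bandwidth_gbs, flops_per_point, bytes_per_point):
    """Attainable throughput (GFLOP/s) and limiting resource under the roofline model.

    peak_gflops     : peak arithmetic throughput at the chosen precision
    bandwidth_gbs   : sustained memory bandwidth in GB/s
    flops_per_point : floating-point operations per grid point per kernel call
    bytes_per_point : bytes moved to/from memory per grid point per kernel call
    """
    intensity = flops_per_point / bytes_per_point  # arithmetic intensity, FLOP/byte
    memory_ceiling = intensity * bandwidth_gbs  # GFLOP/s that memory traffic can sustain
    if memory_ceiling >= peak_gflops:
        return peak_gflops, "compute-bound"
    return memory_ceiling, "memory-bound"


if __name__ == "__main__":
    # Hypothetical kernel: 200 FLOPs and 12 grid-function loads/stores per point.
    flops, values = 200.0, 12
    # Hypothetical GPU with weak FP64 throughput; every number here is a placeholder.
    bandwidth = 2000.0
    for label, peak, word_bytes in [("FP64", 1000.0, 8), ("FP32", 30000.0, 4)]:
        gflops, regime = roofline(peak, bandwidth, flops, values * word_bytes)
        print(f"{label}: {gflops:.0f} GFLOP/s ({regime})")
```

In this toy setting, halving the word size doubles the arithmetic intensity, and a low double-precision peak caps FP64 throughput outright, so the single-precision gain can exceed a factor of two.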
Award ID(s):
2508377 2409654 2411068 2108072 2004311 2110352 2227105
PAR ID:
10588944
Publisher / Repository:
IOP Publishing
Date Published:
Journal Name:
Classical and Quantum Gravity
ISSN:
0264-9381
Sponsoring Org:
National Science Foundation
More like this
  1. Simulations to calculate a single gravitational waveform (GW) can take several weeks, yet thousands of such simulations are needed for the detection and interpretation of gravitational waves. Future detectors will require even more accurate waveforms than those currently used. We present here the first large-scale, adaptive-mesh, multi-GPU numerical relativity (NR) code, together with performance analysis and benchmarking. While comparisons are difficult to make, our GPU extension of the Dendro-GR NR code achieves a 6x speedup over existing state-of-the-art codes. We achieve 800 GFLOP/s on a single NVIDIA A100 GPU, with an overall 2.5x speedup over a two-socket, 128-core AMD EPYC 7763 CPU node running an equivalent CPU implementation. We present detailed performance analyses, parallel scalability results, and accuracy assessments for GWs computed for mass ratios q = 1, 2, 4. We also present strong scaling up to 8 A100s and weak scaling up to 229,376 x86 cores on the Texas Advanced Computing Center's Frontera system.
  2. Octo-Tiger is a code for modeling three-dimensional self-gravitating astrophysical fluids. It was designed specifically for studying dynamical mass transfer between interacting binary stars. Octo-Tiger is parallelized for distributed systems using HPX, an asynchronous many-task runtime system billed as the C++ standard library for parallelism and concurrency, and it uses CUDA for its gravity solver. Recently, we remodeled Octo-Tiger's hydro solver to use a three-dimensional reconstruction scheme and ported it to GPUs using CUDA kernels. We present scaling results for the new hydro kernels on ORNL's Summit machine using a Sedov-Taylor blast wave problem. We also compare Octo-Tiger's new hydro scheme with its old one, using a rotating star as a test problem.
  3. We introduce CRK-HACC, an extension of the Hardware/Hybrid Accelerated Cosmology Code (HACC), to resolve gas hydrodynamics in large-scale structure formation simulations of the universe. The new framework couples the HACC gravitational N-body solver with a modern smoothed-particle hydrodynamics (SPH) approach called conservative reproducing kernel SPH (CRKSPH). CRKSPH uses smoothing functions that exactly interpolate linear fields while manifestly preserving conservation laws (momentum, mass, and energy). The CRKSPH method has been incorporated to accurately model baryonic effects in cosmology simulations, an important addition targeting the generation of precise synthetic sky predictions for upcoming observational surveys. CRK-HACC inherits the codesign strategies of the HACC solver and is built to run on modern GPU-accelerated supercomputers. In this work, we summarize the primary solver components and present a number of standard validation tests to demonstrate code accuracy, including idealized hydrodynamic and cosmological setups, as well as self-similarity measurements.
  4. We introduce NRPyElliptic, an elliptic solver for numerical relativity (NR) built within the NRPy+ framework. As its first application, NRPyElliptic sets up conformally flat, binary black hole (BBH) puncture initial data (ID) on a single numerical domain, similar to the widely used TwoPunctures code. Unlike TwoPunctures, NRPyElliptic employs a hyperbolic relaxation scheme, whereby arbitrary elliptic PDEs are trivially transformed into a hyperbolic system of PDEs. Because consumers of NR ID generally already possess expertise in solving hyperbolic PDEs, they will generally find NRPyElliptic easier to tweak and extend than other NR elliptic solvers. When evolved forward in (pseudo)time, the hyperbolic system exponentially approaches a steady state that solves the elliptic PDEs. Notably, NRPyElliptic accelerates the relaxation waves, which makes it many orders of magnitude faster than the usual constant-wavespeed approach. While it is still ∼12x slower than TwoPunctures at setting up full-3D BBH ID, NRPyElliptic requires only ≈0.3% of the runtime of a full BBH simulation in the Einstein Toolkit. Future work will focus on improving performance and generating other types of ID, such as binary neutron star ID.
  5. We have implemented GPU-aware support across all AWP-ODC versions and enhanced the message-passing collective communications for this memory-bound finite-difference solver. This provides cutting-edge communication support for production simulations on leadership-class computing facilities, including OLCF Frontier and TACC Vista. We achieved significant performance gains, reaching 37 sustained petaflop/s and reducing time-to-solution by 17.2% using the GPU-aware feature on 8,192 Frontier nodes (65,536 MI250X GCDs). The AWP-ODC code has also been optimized for TACC Vista, an Arm-based system built on NVIDIA GH200 Grace Hopper Superchips, where it demonstrates excellent application performance. This poster will showcase these studies and GPU performance characteristics. We will discuss our verification of the GPU-aware development and the use of high-performance MVAPICH libraries, including on-the-fly compression, on modern GPU clusters.
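The hyperbolic relaxation idea summarized above for NRPyElliptic (and underlying NRPyEllipticGPU) can be illustrated on a toy problem. The sketch below is a minimal, hypothetical example, not NRPy or NRPyEllipticGPU code: it assumes a 1D Poisson equation u'' = f with zero Dirichlet boundaries, a constant wavespeed `c`, a damping parameter `eta`, and a semi-implicit Euler pseudotime step, all chosen for illustration. It recasts the elliptic equation as a damped wave system whose steady state is the elliptic solution.

```python
import math


def relax_poisson_1d(n=20, eta=2.0 * math.pi, c=1.0, cfl=0.5, tol=1e-8, max_steps=100_000):
    """Solve u''(x) = f(x) on (0, 1), u(0) = u(1) = 0, by hyperbolic relaxation.

    Evolves the damped first-order system in pseudotime t:
        du/dt = v - eta * u
        dv/dt = c**2 * (u'' - f)
    Any steady state has u'' = f (and v = eta * u), i.e. it solves the elliptic problem.
    Returns (grid points, solution, number of pseudotime steps taken).
    """
    dx = 1.0 / (n + 1)
    dt = cfl * dx / c  # explicit step limited by a CFL-type condition
    x = [(i + 1) * dx for i in range(n)]
    # Source chosen so that the exact continuum solution is u(x) = sin(pi x).
    f = [-(math.pi ** 2) * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    v = [0.0] * n

    for step in range(max_steps):
        # Residual of the elliptic equation: 3-point Laplacian with u = 0 at both ends.
        res = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            res.append((left - 2.0 * u[i] + right) / dx ** 2 - f[i])
        if max(abs(r) for r in res) < tol:
            return x, u, step
        # Semi-implicit Euler: advance v first, then advance u using the updated v.
        v = [v[i] + dt * c ** 2 * res[i] for i in range(n)]
        u = [u[i] + dt * (v[i] - eta * u[i]) for i in range(n)]
    raise RuntimeError("hyperbolic relaxation did not converge")


if __name__ == "__main__":
    x, u, steps = relax_poisson_1d()
    err = max(abs(ui - math.sin(math.pi * xi)) for xi, ui in zip(x, u))
    print(f"converged in {steps} steps, max error vs sin(pi x) = {err:.2e}")
```

Choosing `eta` near twice the lowest mode's frequency (about 2π here) roughly critically damps the slowest-decaying mode. Unlike NRPyElliptic, which accelerates the relaxation waves, this sketch keeps `c` fixed, so the number of pseudotime steps grows as the grid is refined.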