GRaM-X: a new GPU-accelerated dynamical spacetime GRMHD code for Exascale computing with the Einstein Toolkit
Abstract: We present GRaM-X (General Relativistic accelerated Magnetohydrodynamics on AMReX), a new GPU-accelerated dynamical-spacetime general relativistic magnetohydrodynamics (GRMHD) code which extends the GRMHD capability of the Einstein Toolkit to GPU-based exascale systems. GRaM-X supports 3D adaptive mesh refinement (AMR) on GPUs via a new AMR driver for the Einstein Toolkit called CarpetX, which in turn leverages AMReX, an AMR library developed for use by the United States DOE's Exascale Computing Project. We use the Z4c formalism to evolve the Einstein equations and the Valencia formulation to evolve the equations of GRMHD. GRaM-X supports both analytic and tabulated equations of state. We implement TVD and WENO reconstruction methods as well as the HLLE Riemann solver. We test the accuracy of the code using a range of tests on static spacetimes, e.g. 1D magnetohydrodynamic shock tubes, the 2D magnetic rotor and a cylindrical explosion, as well as on dynamical spacetimes, i.e. the oscillations of a 3D Tolman-Oppenheimer-Volkoff star. We find excellent agreement with analytic results and with the results of other codes reported in the literature. We also perform scaling tests and find that GRaM-X shows a weak scaling efficiency of ∼40%–50% on 2304 nodes (13824 NVIDIA V100 GPUs) with respect to single-node performance on OLCF's supercomputer Summit.
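For readers unfamiliar with the building blocks named in the abstract, the following is a minimal, self-contained sketch of an HLLE flux combined with a minmod-limited (TVD) slope, for a single conserved variable. It illustrates the general technique only and is not code from GRaM-X; the function and variable names (hlle_flux, minmod, uL, uR, sL, sR) are hypothetical.

```cpp
#include <cmath>

// Minmod slope limiter: the basic ingredient of simple TVD reconstruction.
// Returns zero at extrema, otherwise the smaller-magnitude slope.
double minmod(double a, double b)
{
    if (a * b <= 0.0) return 0.0;
    return (std::fabs(a) < std::fabs(b)) ? a : b;
}

// HLLE flux at one cell face for a single conserved variable.
//   uL, uR : reconstructed left/right states at the face
//   fL, fR : physical fluxes F(uL), F(uR)
//   sL, sR : estimates of the fastest left- and right-going signal speeds
double hlle_flux(double uL, double uR, double fL, double fR, double sL, double sR)
{
    if (sL >= 0.0) return fL;   // all waves move to the right: upwind from the left
    if (sR <= 0.0) return fR;   // all waves move to the left: upwind from the right
    // Riemann fan straddles the face: blend the two states.
    return (sR * fL - sL * fR + sL * sR * (uR - uL)) / (sR - sL);
}
```

In a GRMHD code the same structure is applied component-wise to the full vector of conserved variables, with the signal speeds taken from the characteristic speeds of the Valencia system.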
- Award ID(s): 2004879
- PAR ID: 10463814
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Classical and Quantum Gravity
- Volume: 40
- Issue: 20
- ISSN: 0264-9381
- Page Range / eLocation ID: Article No. 205009
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract: We present AsterX, a novel open-source, modular, GPU-accelerated, fully general relativistic magnetohydrodynamic (GRMHD) code designed for dynamic spacetimes in 3D Cartesian coordinates and tailored for exascale computing. We utilize block-structured adaptive mesh refinement (AMR) through CarpetX, the new driver for the Einstein Toolkit, which is built on AMReX, a software framework for massively parallel applications. AsterX employs the Valencia formulation for GRMHD, coupled with the 'Z4c' formalism for spacetime evolution, while incorporating high-resolution shock-capturing schemes to accurately handle the hydrodynamics. AsterX has undergone rigorous testing in both static and dynamic spacetimes, demonstrating remarkable accuracy and agreement with other codes in the literature. Using subcycling in time, we find an overall performance gain of a factor of 2.5–4.5. Benchmarking the code through scaling tests on OLCF's Frontier supercomputer, we demonstrate a weak scaling efficiency of about 67%–77% on 4096 nodes compared to an 8-node performance.
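For context, the weak scaling efficiencies quoted here (and for GRaM-X above) follow the usual convention: the problem size per node is held fixed and the wall-clock time on N nodes is compared with a small-node baseline. A common definition, stated only as a reminder of the convention these papers appear to use, is:

```latex
E_{\mathrm{weak}}(N) \;=\; \frac{t_{\mathrm{wall}}(N_{\mathrm{ref}})}{t_{\mathrm{wall}}(N)}
\quad \text{(fixed work per node)},
\qquad \text{e.g. } E_{\mathrm{weak}}(4096) \approx 0.67\text{--}0.77 \ \text{for} \ N_{\mathrm{ref}} = 8 .
```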
Abstract: We present the implementation of a two-moment-based general-relativistic multigroup radiation transport module in the General-relativistic multigrid numerical (Gmunu) code. On top of solving the general-relativistic magnetohydrodynamics and the Einstein equations with conformally flat approximations, the code solves the evolution equations of the zeroth- and first-order moments of the radiation in the Eulerian frame. An analytic closure relation is used to obtain the higher-order moments and close the system. A finite-volume discretization is adopted for the radiation moments. Advection in physical space and in frequency space is handled explicitly. In addition, the radiation–matter interaction terms, which are very stiff in the optically thick region, are solved implicitly. Implicit–explicit Runge–Kutta schemes are adopted for time integration. We test the implementation with a number of numerical benchmarks, from frequency-integrated to frequency-dependent cases. Furthermore, we also illustrate astrophysical applications in hot neutron star and core-collapse supernova modeling, and compare with other neutrino transport codes.
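The implicit–explicit treatment described above typically has the schematic form below, in which the non-stiff transport terms are advanced explicitly while the stiff radiation–matter coupling is solved implicitly. This is the generic first-order IMEX pattern, shown only for orientation, not necessarily Gmunu's exact scheme:

```latex
\partial_t \mathbf{U} + \nabla\!\cdot\!\mathbf{F}(\mathbf{U}) = \mathbf{S}_{\mathrm{stiff}}(\mathbf{U}),
\qquad
\mathbf{U}^{n+1} = \mathbf{U}^{n}
  \;-\; \Delta t\, \nabla\!\cdot\!\mathbf{F}\!\left(\mathbf{U}^{n}\right)
  \;+\; \Delta t\, \mathbf{S}_{\mathrm{stiff}}\!\left(\mathbf{U}^{n+1}\right),
```

where U collects the radiation moments; higher-order IMEX Runge–Kutta schemes combine several such explicit/implicit stages.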
Abstract: General relativistic magnetohydrodynamic (GRMHD) simulations have revolutionized our understanding of black hole accretion. Here, we present a GPU-accelerated GRMHD code, H-AMR, with multifaceted optimizations that, collectively, accelerate computation by 2–5 orders of magnitude for a wide range of applications. First, it introduces a spherical grid with 3D adaptive mesh refinement that operates in each of the three dimensions independently. This allows us to circumvent the Courant condition near the polar singularity, which otherwise cripples high-resolution computational performance. Second, we demonstrate that local adaptive time stepping on a logarithmic spherical-polar grid accelerates computation by a factor of ≲10 compared to traditional hierarchical time-stepping approaches. Jointly, these unique features lead to an effective speed of ∼10⁹ zone cycles per second per node on 5400 NVIDIA V100 GPUs (i.e., 900 nodes of the OLCF Summit supercomputer). We illustrate H-AMR's computational performance by presenting the first GRMHD simulation of a tilted thin accretion disk threaded by a toroidal magnetic field around a rapidly spinning black hole. With an effective resolution of 13,440 × 4608 × 8092 cells and a total of ≲22 billion cells and ∼0.65 × 10⁸ time steps, it is among the largest astrophysical simulations ever performed. We find that frame dragging by the black hole tears up the disk into two independently precessing subdisks. The innermost subdisk's rotation axis intermittently aligns with the black hole spin, demonstrating for the first time that such long-sought alignment is possible in the absence of large-scale poloidal magnetic fields.
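The "zone cycles per second per node" figure of merit quoted above is conventionally the total number of cell updates divided by wall-clock time and node count; as a rough guide (the authors' exact accounting, particularly with local adaptive time stepping, may weight zones by how often they are actually updated):

```latex
\mathrm{ZCS\ per\ node} \;=\; \frac{N_{\mathrm{zones}} \times N_{\mathrm{steps}}}{t_{\mathrm{wall}} \times N_{\mathrm{nodes}}},
```

so ∼10⁹ per node on 900 Summit nodes corresponds to roughly 9 × 10¹¹ cell updates per second in aggregate.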
Abstract: The Block-Adaptive-Tree Solar-wind Roe-type Upwind Scheme (BATSRUS), our state-of-the-art extended magnetohydrodynamic code, is the most used and one of the most resource-consuming models in the Space Weather Modeling Framework. It has always been our objective to improve its efficiency and speed with emerging techniques, such as GPU acceleration. To utilize the GPU nodes on modern supercomputers, we port BATSRUS to GPUs with the OpenACC API. Porting the code to a single GPU requires rewriting and optimizing the most used functionalities of the original code into a new solver, which accounts for around 1% of the entire program in length. To port it to multiple GPUs, we implement a new message-passing algorithm to support its unique block-adaptive grid feature. We conduct weak scaling tests on as many as 256 GPUs and find good performance. The program has 50%–60% parallel efficiency on up to 256 GPUs and up to 95% efficiency within a single node (four GPUs). Running large problems on more than one node has reduced efficiency due to hardware bottlenecks. We also demonstrate our ability to run representative magnetospheric simulations on GPUs. The performance of a single A100 GPU is about the same as that of 270 AMD “Rome” CPU cores (about 2.1 nodes with 128 cores each), and it runs 3.6 times faster than real time. The simulation can run 6.9 times faster than real time on four A100 GPUs.
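As an illustration of the directive-based porting approach described above, the snippet below marks a per-block update loop for GPU offload with a single OpenACC pragma. It is a generic C/C++ sketch under stated assumptions, not code from BATSRUS (which is written in Fortran); the function name, array names, and placeholder update are hypothetical.

```cpp
// Generic OpenACC offload pattern: one directive moves the loop to the GPU
// and manages data transfer, while the loop body stays in ordinary C/C++.
void update_block(const double* state, double* next, int n, double dt)
{
    #pragma acc parallel loop copyin(state[0:n]) copyout(next[0:n])
    for (int i = 0; i < n; ++i) {
        // Placeholder update; a real solver would evaluate stencil/flux terms here.
        next[i] = state[i] + dt * 0.0;
    }
}
```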