Title: Parallelized domain decomposition for multi-dimensional Lagrangian random walk mass-transfer particle tracking schemes
Abstract. Lagrangian particle tracking schemes allow a wide range of flow and transport processes to be simulated accurately, but a major challenge is numerically implementing the inter-particle interactions in an efficient manner. This article develops a multi-dimensional, parallelized domain decomposition (DDC) strategy for mass-transfer particle tracking (MTPT) methods in which particles exchange mass dynamically. We show that this can be efficiently parallelized by employing large numbers of CPU cores to accelerate run times. In order to validate the approach and our theoretical predictions, we focus our efforts on a well-known benchmark problem with pure diffusion, where analytical solutions in any number of dimensions are well established. In this work, we investigate different procedures for "tiling" the domain in two and three dimensions (2-D and 3-D), as this type of formal DDC construction is currently limited to 1-D. An optimal tiling is prescribed based on physical problem parameters and the number of available CPU cores, as each tiling provides distinct results in both accuracy and run time. We further extend the most efficient technique to 3-D for comparison, leading to an analytical discussion of the effect of dimensionality on strategies for implementing DDC schemes. Increasing computational resources (cores) within the DDC method produces a trade-off between inter-node communication and on-node work. For an optimally subdivided diffusion problem, the 2-D parallelized algorithm achieves nearly perfect linear speedup, in comparison with the serial run, up to around 2700 cores, reducing a 5 h simulation to 8 s, while the 3-D algorithm maintains appreciable speedup up to 1700 cores.
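To make the tiling step concrete, the following minimal Python sketch (not the authors' code; the tile counts, domain size, and interaction cutoff are illustrative assumptions) assigns particles to a 2-D grid of tiles and flags the ghost particles that would need to be copied to neighboring subdomains before a mass-transfer step.

```python
# Minimal sketch (not the paper's implementation): 2-D domain tiling for a
# mass-transfer particle tracking step. Each particle gets a tile (subdomain)
# index, and particles near a tile edge are flagged as ghost copies that must
# be communicated to adjacent tiles before mass transfers are computed.
import numpy as np

def tile_particles(x, y, Lx, Ly, nx_tiles, ny_tiles, cutoff):
    """Return each particle's flattened tile index and a ghost-particle mask.

    cutoff is the assumed interaction distance of the mass-transfer kernel;
    particles closer than `cutoff` to a tile edge must also be shared with the
    neighboring tile(s).
    """
    dx, dy = Lx / nx_tiles, Ly / ny_tiles            # tile dimensions
    ix = np.clip((x / dx).astype(int), 0, nx_tiles - 1)
    iy = np.clip((y / dy).astype(int), 0, ny_tiles - 1)
    tile_id = iy * nx_tiles + ix                     # flattened tile index

    # distance from each particle to its nearest tile edge in x and y
    edge_x = np.minimum(x - ix * dx, (ix + 1) * dx - x)
    edge_y = np.minimum(y - iy * dy, (iy + 1) * dy - y)
    ghost = (edge_x < cutoff) | (edge_y < cutoff)    # must be communicated
    return tile_id, ghost

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)
y = rng.uniform(0.0, 1.0, 100_000)
D, dt = 1e-4, 0.1                                    # illustrative diffusion coefficient and step
cutoff = 3.0 * np.sqrt(4.0 * D * dt)                 # a few diffusive length scales (assumed choice)
tile_id, ghost = tile_particles(x, y, 1.0, 1.0, 8, 6, cutoff)   # e.g. 48 cores tiled as 8 x 6
print(f"ghost copies: {ghost.mean():.1%} of particles")
```

Shrinking the tiles by adding cores raises the ghost fraction, which is one way to see the communication/computation trade-off described in the abstract.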
Award ID(s):
2107938 1911145 2049687 2129531 2049688
PAR ID:
10419490
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Geoscientific Model Development
Volume:
16
Issue:
3
ISSN:
1991-9603
Page Range / eLocation ID:
833 to 849
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Cloud microphysics is one of the most time-consuming components in a climate model. In this study, we port the cloud microphysics parameterization in the Community Atmosphere Model (CAM), known as Parameterization of Unified Microphysics Across Scales (PUMAS), from CPU to GPU to seek a computational speedup. The directive-based methods (OpenACC and OpenMP target offload) are determined to be the best fit for our development practices, as they enable a single version of the source code to run on either the CPU or the GPU and yield better portability and maintainability. Their performance is first examined in a PUMAS stand-alone kernel, where the directive-based methods can outperform a CPU node as long as there is enough computational burden on the GPU. A consistent behavior is observed when we run PUMAS on the GPU in a practical CAM simulation. A 3.6× speedup of the PUMAS execution time, including data movement between CPU and GPU, is achieved at a coarse horizontal resolution (8 NVIDIA V100 GPUs against 36 Intel Skylake CPU cores). This speedup further increases up to 5.4× at a high resolution (24 NVIDIA V100 GPUs against 108 Intel Skylake CPU cores), which highlights the fact that GPUs favor larger problem sizes. This study demonstrates that using GPUs in a CAM simulation can save noticeable computational cost even with a small portion of the code being GPU-enabled. Therefore, we are encouraged to port more parameterizations to GPU to take advantage of their computational benefit.
  2. Abstract. Particle tracking is widely utilized to study transport features in a range of physical, chemical, and biological processes in oceanography. In this study, a new offline particle-tracking package, Tracker v1.1, is introduced, and its performance is evaluated against an online Eulerian dye, one online particle-tracking software package, and three offline particle-tracking software packages in a small, high-resolution model domain and a larger, coarser model domain. It was found that the particle and dye approaches give similar results across different model resolutions and domains when tracking the same water mass, as indicated by similar mean advection pathways and spatial distributions of dye and particles. The flexibility of offline particle tracking and its similarity to online dye and online particle tracking make it a useful tool to complement existing ocean circulation models. The new Tracker was shown to be a reliable particle-tracking package to complement the Regional Ocean Modeling System (ROMS), with the advantages of platform independence and speed improvements, especially in large model domains, achieved by its nearest-neighbor search algorithm. Lastly, trade-offs of computational efficiency, modifiability, and ease of use that can influence the choice of package are explored. The main value of the present study is that the different particle- and dye-tracking codes were all run on the same model output or within the model that generated the output. This allows some measure of intercomparison between the different tracking schemes, and we conclude that the choices that make each tracking package unique do not necessarily lead to very different results.
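The nearest-neighbor search mentioned in this entry is the main speed lever for offline tracking in large domains. Below is a generic Python illustration (not Tracker v1.1's code; the grid, coordinates, and names are made-up assumptions) of mapping particle positions onto the nearest node of a 2-D model grid with a k-d tree.

```python
# Generic sketch: locate the nearest model grid node for each particle with a
# k-d tree, the kind of nearest-neighbor lookup that keeps offline tracking
# fast in large, curvilinear ROMS-style domains.
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical curvilinear grid: 2-D arrays of node longitudes and latitudes.
lon2d, lat2d = np.meshgrid(np.linspace(-125.0, -122.0, 300),
                           np.linspace(44.0, 49.0, 500))
# Build the tree once; lon/lat are treated as Cartesian here for simplicity.
tree = cKDTree(np.column_stack([lon2d.ravel(), lat2d.ravel()]))

# Particle positions at one time step (made-up values).
p_lon = np.array([-124.1, -123.3, -122.7])
p_lat = np.array([45.2, 47.8, 48.5])
dist, flat_idx = tree.query(np.column_stack([p_lon, p_lat]))   # nearest node per particle
j, i = np.unravel_index(flat_idx, lon2d.shape)                 # back to 2-D grid indices
print(j, i)
```

Building the tree once and reusing it for every particle and time step is what keeps the per-particle lookup cheap as the domain grows.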
  3. Simulations to calculate a single gravitational waveform (GW) can take several weeks. Yet, thousands of such simulations are needed for the detection and interpretation of gravitational waves. Future detectors will require even more accurate waveforms than those currently used. We present here the first large-scale, adaptive-mesh, multi-GPU numerical relativity (NR) code together with performance analysis and benchmarking. While comparisons are difficult to make, our GPU extension of the Dendro-GR NR code achieves a 6x speedup over existing state-of-the-art codes. We achieve 800 GFlops/s on a single NVIDIA A100 GPU with an overall 2.5x speedup over a two-socket, 128-core AMD EPYC 7763 CPU node with an equivalent CPU implementation. We present detailed performance analyses, parallel scalability results, and accuracy assessments for GWs computed for mass ratios q = 1, 2, 4. We also present strong scalability up to 8 A100s and weak scaling up to 229,376 x86 cores on the Texas Advanced Computing Center's Frontera system.
  4. Abstract We develop a generalized interpolation material point method (GIMPM) for the shallow shelf approximation (SSA) of ice flow. The GIMPM, which can be viewed as a particle version of the finite element method, is used here to solve the shallow shelf approximations of the momentum balance and ice thickness evolution equations. We introduce novel numerical schemes for particle splitting and integration at domain boundaries to accurately simulate the spreading of an ice shelf. The advantages of the proposed GIMPM-SSA framework include efficient advection of history or internal state variables without diffusion errors, automated tracking of the ice front and grounding line at sub-element scales, and a weak formulation based on well-established conventions of the finite element method with minimal additional computational cost. We demonstrate the numerical accuracy and stability of the GIMPM using 1-D and 2-D benchmark examples. We also compare the accuracy of the GIMPM with the standard material point method (sMPM) and a reweighted form of the sMPM. We find that the grid-crossing error is very severe for SSA simulations with the sMPM, whereas the GIMPM successfully mitigates this error. While the grid-crossing error can be reasonably reduced in the sMPM by implementing a simple material point reweighting scheme, this approach is not as accurate as the GIMPM. Thus, we illustrate that the GIMPM-SSA framework is viable for the simulation of ice sheet-shelf evolution and enables boundary tracking and error-free advection of history or state variables, such as ice thickness or damage.
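The grid-crossing issue discussed in this entry can be seen in one dimension with a few lines of Python. This is only a generic illustration of the GIMP idea (particle-averaged weights), not the paper's GIMPM-SSA implementation; the grid spacing and particle half-width are arbitrary choices.

```python
# 1-D illustration of grid-crossing: the standard MPM (hat-function) weight has
# a gradient that jumps as a particle crosses a grid node, whereas averaging the
# hat over a finite particle domain of half-width lp (the GIMP idea) gives a
# gradient that varies continuously.
H = 1.0     # grid spacing (arbitrary)
LP = 0.25   # particle half-width (arbitrary)

def hat(x, h=H):
    """Linear hat shape function of the node at x = 0 (standard MPM / FEM)."""
    return max(0.0, 1.0 - abs(x) / h)

def hat_grad(x, h=H):
    """Gradient of the hat: +/- 1/h, discontinuous at the node."""
    if abs(x) >= h:
        return 0.0
    return 1.0 / h if x < 0.0 else -1.0 / h

def gimp_grad(xp, h=H, lp=LP):
    """Gradient of the particle-averaged weight S(xp) = (1/2lp) * integral of
    the hat over [xp - lp, xp + lp]; by the fundamental theorem of calculus this
    equals (hat(xp + lp) - hat(xp - lp)) / (2 lp), which is continuous in xp."""
    return (hat(xp + lp, h) - hat(xp - lp, h)) / (2.0 * lp)

for xp in (-1e-6, 1e-6):   # just left and just right of the node
    print(f"xp = {xp:+.0e}:  sMPM gradient {hat_grad(xp):+.2f},  "
          f"GIMP gradient {gimp_grad(xp):+.2f}")
```

The jump in the sMPM gradient across the node is the grid-crossing error that the particle-averaged weighting smooths out.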
  5. We present distributed distance-based control (DDC), a novel approach for controlling a multi-agent system so that it achieves a desired formation in a resource-constrained setting. Our controller is fully distributed and requires only local state estimation and scalar measurements of inter-agent distances. It does not require an external localization system or inter-agent exchange of state information. Our approach uses spatial-predictive control (SPC) to optimize a cost function given strictly in terms of inter-agent distances and the distance to the target location. In DDC, each agent continuously learns and updates a very abstract model of the actual system, in the form of a dictionary of three independent key-value pairs (s, d), where d is the partial derivative of the distance measurements along a spatial direction s. This is sufficient for an agent to choose its best next action. We validate our approach by using DDC to control a collection of Crazyflie drones to achieve formation flight and reach a target while maintaining flock formation.
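The dictionary-of-derivatives idea in this last entry can be sketched for a single agent in a few lines of Python. This is only a conceptual illustration (the probe directions, step sizes, and range measurement are hypothetical stand-ins), not the DDC/SPC controller itself.

```python
# Conceptual sketch (not the authors' DDC controller): an agent keeps a small
# dictionary mapping each probe direction s to d, a finite-difference estimate
# of the partial derivative of its scalar distance measurement along s, and
# uses it to pick its next move.
import numpy as np

TARGET = np.array([5.0, 3.0, 2.0])      # hypothetical target location

def measure_distance(pos):
    """Stand-in for an on-board scalar range measurement to the target."""
    return float(np.linalg.norm(pos - TARGET))

def update_dictionary(pos, directions, eps=0.05):
    """Return {s: d}, where d ~ d(distance)/ds estimated by probing along s."""
    base = measure_distance(pos)
    return {tuple(s): (measure_distance(pos + eps * s) - base) / eps
            for s in directions}

def approx_gradient(dictionary):
    """Three independent (here orthonormal) directional derivatives assemble
    into an approximate gradient of the measurement."""
    return sum(d * np.array(s) for s, d in dictionary.items())

pos = np.zeros(3)
dirs = [np.eye(3)[k] for k in range(3)]              # three independent directions
for _ in range(100):
    g = approx_gradient(update_dictionary(pos, dirs))
    pos = pos - 0.1 * g / (np.linalg.norm(g) + 1e-12)   # step along steepest descent
print(np.round(pos, 1), round(measure_distance(pos), 2))
```

In the paper's multi-agent setting the cost also involves inter-agent distances, but the key-value structure of the learned model is the same.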