Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, cache-oblivious divide-and-conquer trapezoidal algorithms, and Krylov subspace methods. In this paper, we present two efficient parallel algorithms for performing linear stencil computations. Current direct solvers in this domain are computationally inefficient, and Krylov methods require manual labor and mathematical training. We solve these problems for linear stencils by using DFT preconditioning on a Krylov method to achieve a direct solver which is both fast and general. Indeed, while all currently available algorithms for solving general linear stencils perform Θ(NT) work, where N is the size of the spatial grid and T is the number of timesteps, our algorithms perform o(NT) work. To the best of our knowledge, we give the first algorithms that use fast Fourier transforms to compute final grid data by evolving the initial data for many timesteps at once. Our algorithms handle both periodic and aperiodic boundary conditions, and achieve polynomially better performance bounds (i.e., computational complexity and parallel runtime) than all other existing solutions. Initial experimental results show that implementations of our algorithms that evolve grids of roughly 10^7 cells for around 10^5 timesteps run orders of magnitude faster than state-of-the-art implementations for periodic stencil problems, and 1.3× to 8.5× faster for aperiodic stencil problems.
more »
« less
A Fast Algorithm for Aperiodic Linear Stencil Computation using Fast Fourier Transforms
Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, cache-oblivious divide-and-conquer trapezoidal algorithms, and Krylov subspace methods. In this article, we present two efficient parallel algorithms for performing linear stencil computations. Current direct solvers in this domain are computationally inefficient, and Krylov methods require manual labor and mathematical training. We solve these problems for linear stencils by using discrete Fourier transforms preconditioning on a Krylov method to achieve a direct solver that is both fast and general. Indeed, while all currently available algorithms for solving general linear stencils perform Θ(NT) work, whereNis the size of the spatial grid andTis the number of timesteps, our algorithms performo(NT) work. To the best of our knowledge, we give the first algorithms that use fast Fourier transforms to compute final grid data by evolving the initial data for many timesteps at once. Our algorithms handle both periodic and aperiodic boundary conditions and achieve polynomially better performance bounds (i.e., computational complexity and parallel runtime) than all other existing solutions. Initial experimental results show that implementations of our algorithms that evolve grids of roughly 107cells for around 105timesteps run orders of magnitude faster than state-of-the-art implementations for periodic stencil problems, and 1.3× to 8.5× faster for aperiodic stencil problems. Code Repository:https://github.com/TEAlab/FFTStencils
more »
« less
- PAR ID:
- 10542390
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- ACM Transactions on Parallel Computing
- Volume:
- 10
- Issue:
- 4
- ISSN:
- 2329-4949
- Page Range / eLocation ID:
- 1 to 34
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Numerical simulations have been an increasingly important tool in space physics. Here, we introduce an open-source three-dimensional compressible Hall-Magnetohydrodynamic (MHD) simulation codeLAPS(UCLA-Pseudo-Spectral,https://github.com/chenshihelio/LAPS). The code adopts a pseudo-spectral method based on Fourier Transform to evaluate spatial derivatives, and third-order explicit Runge-Kutta method for time advancement. It is parallelized using Message-Passing-Interface (MPI) with a “pencil” parallelization strategy and has very high scalability. The Expanding-Box-Model is implemented to incorporate spherical expansion effects of the solar wind. We carry out test simulations based on four classic (Hall)-MHD processes, namely, 1) incompressible Hall-MHD waves, 2) incompressible tearing mode instability, 3) Orszag-Tang vortex, and 4) parametric decay instability. The test results agree perfectly with theory predictions and results of previous studies. Given all its features,LAPSis a powerful tool for large-scale simulations of solar wind turbulence as well as other MHD and Hall-MHD processes happening in space.more » « less
-
Abstract High-fidelity simulators that connect theoretical models with observations are indispensable tools in many sciences. If the likelihood is known, inference can proceed using standard techniques. However, when the likelihood is intractable or unknown, a simulator makes it possible to infer the parameters of a theoretical model directly from real and simulated observations when coupled with machine learning. We introduce an extension of the recently proposed likelihood-free frequentist inference (LF2I) approach that makes it possible to construct confidence sets with thep-value function and to use the same function to check the coverage explicitly at any given parameter point. LikeLF2I, this extension yields provably valid confidence sets in parameter inference problems for which a high-fidelity simulator is available. The utility of our algorithm is illustrated by applying it to three pedagogically interesting examples: the first is from cosmology, the second from high-energy physics and astronomy, both with tractable likelihoods, while the third, with an intractable likelihood, is from epidemiology33Code to reproduce all of our results is available onhttps://github.com/AliAlkadhim/ALFFI..more » « less
-
Stencil computations are widely used in the scientific simulation domain, and their performance is critical to the overall efficiency of many large-scale numerical applications. Many optimization techniques, most of them varying strategies of tiling and parallelization, exist to systematically enhance the efficiency of stencil computations. However, the effective- ness of these optimizations vary significantly depending on the wide range of properties demonstrated by the different stencils. This paper studies several well-known optimization strategies for stencils and presents a new approach to effectively guide the composition of these optimizations, by modeling their interactions with four domain-level proper- ties of stencils: spatial dimensionality, temporal order, order of accuracy, and directional dependence. When using our prediction model to guide optimizations for five real-world stencil applications, we were able to identify optimization strategies that outperformed two highly optimized stencil libraries by an average of 2.4x.more » « less
-
Abstract Reliable studies of the long-term dynamics of planetary systems require numerical integrators that are accurate and fast. The challenge is often formidable because the chaotic nature of many systems requires relative numerical error bounds at or close to machine precision (∼10−16, double-precision arithmetic); otherwise, numerical chaos may dominate over physical chaos. Currently, the speed/accuracy demands are usually only met by symplectic integrators. For example, the most up-to-date long-term astronomical solutions for the solar system in the past (widely used in, e.g., astrochronology and high-precision geological dating) have been obtained using symplectic integrators. However, the source codes of these integrators are unavailable. Here I present the symplectic integratororbitN(lean version 1.0) with the primary goal of generating accurate and reproducible long-term orbital solutions for near-Keplerian planetary systems (here the solar system) with a dominant massM0. Among other features,orbitN-1.0includesM0’s quadrupole moment, a lunar contribution, and post-Newtonian corrections (1PN) due toM0(fast symplectic implementation). To reduce numerical round-off errors, Kahan compensated summation was implemented. I useorbitNto provide insight into the effect of various processes on the long-term chaos in the solar system. Notably, 1PN corrections have the opposite effect on chaoticity/stability on a 100 Myr versus Gyr timescale. For the current application,orbitNis about as fast as or faster (factor 1.15–2.6) than comparable integrators, depending on hardware.11The orbitN source code (C) is available athttp://github.com/rezeebe/orbitN.more » « less