Title: A General Novel Parallel Framework for SPH-centric Algorithms
Large-scale, highly detailed fluid simulation using the Smoothed Particle Hydrodynamics (SPH) method or its variants is now ubiquitous in computer graphics and digital entertainment applications. Higher accuracy and faster speed are the two key criteria for evaluating improvements to the underlying algorithms within any available framework. These requirements call for high-fidelity simulation with more particles and higher particle density, which unavoidably increases computational cost significantly. In this paper, we develop a new general GPGPU acceleration framework for SPH-centric simulations founded upon a novel neighbor traversal algorithm. Our parallel framework exploits several advanced features of the GPGPU architecture (e.g., shared memory and register memory). Additionally, we have designed a task assignment strategy that ensures all threads of the same cooperative thread array (CTA) belong to the same grid cell. With this organization, large contiguous runs of neighboring data can be loaded into the shared memory of a CTA and used by all its threads. Our method thus has low global-memory bandwidth consumption. We have integrated our method into both WCSPH and PCISPH, two improved SPH variants of recent years, and demonstrated its performance in several scenarios involving multiple-fluid interaction, dam break, and elastic solids. In comprehensive tests, our work exhibits up to 2.18x speedup compared with other state-of-the-art parallel frameworks.
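The cell-per-CTA organization described in the abstract can be loosely illustrated on the CPU. The Python sketch below is not the paper's GPU implementation or its neighbor traversal algorithm; it is a plain uniform-grid neighbor search in which the candidate list for a cell is gathered once and reused by every particle in that cell, standing in for the shared-memory tile that a CTA would load. The function names `build_cell_grid` and `neighbors_by_cell`, and the choice of a cell edge equal to the support radius `h`, are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def build_cell_grid(positions, h):
    """Bucket particle indices by integer cell coordinate (cell edge = h, an assumption)."""
    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[tuple(math.floor(c / h) for c in p)].append(i)
    return cells


def neighbors_by_cell(positions, h):
    """Return {particle index: sorted neighbor indices within distance h}.

    All particles in one cell share a single candidate list ("tile") built
    from the 3x3x3 block of surrounding cells. On a GPU this tile would be
    the data a CTA loads into shared memory once for all of its threads.
    """
    cells = build_cell_grid(positions, h)
    result = {}
    for cell, members in cells.items():
        # Gather candidates from the 27 surrounding cells once per cell.
        tile = [
            j
            for offset in product((-1, 0, 1), repeat=3)
            for j in cells.get(tuple(c + o for c, o in zip(cell, offset)), ())
        ]
        # Every particle in the cell reuses the same tile.
        for i in members:
            result[i] = sorted(
                j for j in tile
                if j != i and math.dist(positions[i], positions[j]) <= h
            )
    return result


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.05, 0.1, 0.1), (3.0, 3.0, 3.0)]
    print(neighbors_by_cell(pts, 1.0))
```

Because the cell edge equals the support radius, every true neighbor lies in the 27 surrounding cells, so the per-cell tile is sufficient and the distance test only filters out candidates that are too far away.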
Award ID(s):
1715985
PAR ID:
10297833
Journal Name:
Proceedings of the ACM on Computer Graphics and Interactive Techniques
Volume:
2
Issue:
1
ISSN:
2577-6193
Page Range / eLocation ID:
1 to 16
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Graphics Processing Units (GPUs) have rapidly evolved to enable energy-efficient data-parallel computing for a broad range of scientific areas. While GPUs achieve exascale performance at a stringent power budget, they are also susceptible to soft errors, often caused by high-energy particle strikes, that can significantly affect the application output quality. Understanding the resilience of general purpose GPU applications is the purpose of this study. To this end, it is imperative to explore the range of application output by injecting faults at all the potential fault sites. This problem is especially challenging because, unlike CPU applications, which are mostly single-threaded, GPGPU applications can contain hundreds to thousands of threads, resulting in a tremendously large fault site space, on the order of billions even for some simple applications. In this paper, we present a systematic way to progressively prune the fault site space, aiming to dramatically reduce the number of fault injections so that assessing GPGPU application error resilience becomes practical. The key insight behind our proposed methodology is that GPGPU applications spawn many threads, but many of them execute the same set of instructions. Therefore, several fault sites are redundant and can be pruned by a careful analysis of faults across threads and instructions. We identify important features across a set of 10 applications (16 kernels) from the Rodinia and Polybench suites and conclude that threads can first be classified based on the number of dynamic instructions they execute. We achieve significant fault site reduction by analyzing only a small subset of threads that are representative of the dynamic instruction behavior (and therefore error resilience behavior) of the GPGPU applications. 
Further pruning is achieved by identifying and analyzing: a) the dynamic instruction commonalities (and differences) across code blocks within this representative set of threads, b) a subset of loop iterations within the representative threads, and c) a subset of destination register bit positions. The above steps result in a tremendous reduction of fault sites by up to seven orders of magnitude. Yet, this reduced fault site space accurately captures the error resilience profile of GPGPU applications. 
  2. We propose a new approach for performing drained and undrained loading of elastoplastic geomaterials over large deformations using smoothed particle hydrodynamics (SPH), a meshfree continuum particle method, combined with the modified Cam Clay (MCC) model of critical state soil mechanics. The numerical approach draws upon a novel one‐particle two‐phase penalty‐method based formulation for handling undrained loading in saturated soils, which allows tracking of the buildup of pore‐water pressures under combined shearing and compression. Large‐scale parallelized simulations are employed to accommodate a significant number of degrees of freedom in a three‐dimensional setting. After verification and benchmark testing, the SPH based formulation is used to analyze the propagation of reverse faults through fluid‐saturated clay deposits and the rupture of strike‐slip faults across earthen embankments. The computational methodology tests the robustness of the meshfree approach in situations where the soil tends to dilate on the ‘dry’ side of the critical state line and to compact on the ‘wet’ side, but cannot, because of the incompressibility constraint imposed by undrained loading. Our results extend the current understanding of fault rupture modeling and further demonstrate the potential of our framework together with the SPH method for large deformation analyses of complex problems in geotechnics. 
  3. Transactional memory is a concurrency control mechanism that dynamically determines when threads may safely execute critical sections of code. It provides the performance of fine-grained locking mechanisms with the simplicity of coarse-grained locking mechanisms. With hardware-based transactions, the protection of shared data accesses and updates can be evaluated at runtime so that only true collisions on shared data force serialization. This paper explores the use of transactional memory as an alternative to conventional synchronization mechanisms for managing the pending event set in a Time Warp synchronized parallel simulator. In particular, we explore the application of Intel’s hardware-based transactional memory (TSX) to manage shared access to the pending event set by the simulation threads. Conventional locking mechanisms and transactional memory access are compared within the warped Time Warp synchronized parallel simulation kernel. This testing evaluates both forms of transactional memory found in the Intel Haswell processor, Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM). The results show that RTM generally outperforms conventional locking mechanisms and that HLE provides consistently better performance than conventional locking mechanisms, in some cases by as much as 27%. 
  4. Na (Ed.)
    We conducted a systematic numerical investigation of spherical, prolate and oblate particles in an inertial shear flow between two parallel walls, using smoothed particle hydrodynamics (SPH). It was previously shown that above a critical Reynolds number, spherical particles experience a supercritical pitchfork bifurcation of the equilibrium position in shear flow between two parallel walls, namely that the central equilibrium position becomes unstable, leading to the emergence of two new off-centre stable positions (Fox et al., J. Fluid Mech., vol. 915, 2021). This phenomenon was unexpected given the symmetry of the system. In addition to confirming this finding, we found, surprisingly, that ellipsoidal particles can also return to the centre position from the off-centre positions when the particle Reynolds number is further increased, while spherical particles become unstable under this increased Reynolds number. By utilizing both SPH and the finite element method for flow visualization, we explained the underlying mechanism of this reversal of bifurcation by altered streamwise vorticity and symmetry breaking of pressure. Furthermore, we expanded our investigation to include asymmetric particles, a novel aspect that had not been previously modelled, and we observed similar trends in particle dynamics for both symmetric and asymmetric ellipsoidal particles. While further validation through laboratory experiments is necessary, our research paves the way for the development of new focusing and separation methods for shaped particles. 
  5. Smoothed-particle hydrodynamics (SPH) is a mesh-free method used to simulate volumetric media in fluids, astrophysics, and solid mechanics. Visualizing these simulations is problematic because these datasets often contain millions, if not billions of particles carrying physical attributes and moving over time. Radial basis functions (RBFs) are used to model particles, and overlapping particles are interpolated to reconstruct a high-quality volumetric field; however, this interpolation process is expensive and makes interactive visualization difficult. Existing RBF interpolation schemes do not account for color-mapped attributes and are instead constrained to visualizing just the density field. To address these challenges, we exploit ray tracing cores in modern GPU architectures to accelerate scalar field reconstruction. We use a novel RBF interpolation scheme to integrate per-particle colors and densities, and leverage GPU-parallel tree construction and refitting to quickly update the tree as the simulation animates over time or when the user manipulates particle radii. We also propose a Hilbert reordering scheme to cluster particles together at the leaves of the tree to reduce tree memory consumption. Finally, we reduce the noise of volumetric shadows by adopting a spatially temporal blue noise sampling scheme. Our method can provide a more detailed and interactive view of these large, volumetric, time-series particle datasets than traditional methods, leading to new insights into these physics simulations. 