skip to main content


Title: Penetration-free projective dynamics on the GPU
We present a GPU algorithm for deformable simulation. Our method offers good computational efficiency and penetration-free guarantee at the same time, which are not common with existing techniques. The main idea is an algorithmic integration of projective dynamics (PD) and incremental potential contact (IPC). PD is a position-based simulation framework, favored for its robust convergence and convenient implementation. We show that PD can be employed to handle the variational optimization with the interior point method e.g., IPC. While conceptually straightforward, this requires a dedicated rework over the collision resolution and the iteration modality to avoid incorrect collision projection with improved numerical convergence. IPC exploits a barrier-based formulation, which yields an infinitely large penalty when the constraint is on the verge of being violated. This mechanism guarantees intersection-free trajectories of deformable bodies during the simulation, as long as they are apart at the rest configuration. On the downside, IPC brings a large amount of nonlinearity to the system, making PD slower to converge. To mitigate this issue, we propose a novel GPU algorithm named A-Jacobi for faster linear solve at the global step of PD. A-Jacobi is based on Jacobi iteration, but it better harvests the computation capacity on modern GPUs by lumping several Jacobi steps into a single iteration. In addition, we also re-design the CCD root finding procedure by using a new minimum-gradient Newton algorithm. Those saved time budgets allow more iterations to accommodate stiff IPC barriers so that the result is both realistic and collision-free. Putting together, our algorithm simulates complicated models of both solids and shells on the GPU at an interactive rate or even in real time.  more » « less
Award ID(s):
2018069 2023780 2153863 2153851 1943199 1813624
NSF-PAR ID:
10346671
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Graphics
Volume:
41
Issue:
4
ISSN:
0730-0301
Page Range / eLocation ID:
1 to 16
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we present a GPU algorithm for finite element hyperelastic simulation. We show that the interior-point method, known to be effective for robust collision resolution, can be coupled with non-Newton procedures and be massively sped up on the GPU. Newton's method has been widely chosen for the interior-point family, which fully solves a linear system at each step. After that, the active set associated with collision/contact constraints is updated. Mimicking this routine using a non-Newton optimization (like gradient descent or ADMM) unfortunately does not deliver expected accelerations. This is because the barrier functions employed in an interior-point method need to be updated at every iteration to strictly confine the search to the feasible region. The associated cost (e.g., per-iteration CCD) quickly overweights the benefit brought by the GPU, and a new parallelism modality is needed. Our algorithm is inspired by the domain decomposition method and designed to move interior-point-related computations to local domains as much as possible. We minimize the size of each domain (i.e., a stencil) by restricting it to a single element, so as to fully exploit the capacity of modern GPUs. The stencil-level results are integrated into a global update using a novel hybrid sweep scheme. Our algorithm is locally second-order offering better convergence. It enables simulation acceleration of up to two orders over its CPU counterpart. We demonstrate the scalability, robustness, efficiency, and quality of our algorithm in a variety of simulation scenarios with complex and detailed collision geometries.

     
    more » « less
  2. Simulating stiff materials in applications where deformations are either not significant or else can safely be ignored is a fundamental task across fields. Rigid body modeling has thus long remained a critical tool and is, by far, the most popular simulation strategy currently employed for modeling stiff solids. At the same time, rigid body methods continue to pose a number of well known challenges and trade-offs including intersections, instabilities, inaccuracies, and/or slow performances that grow with contact-problem complexity. In this paper we revisit the stiff body problem and present ABD, a simple and highly effective affine body dynamics framework, which significantly improves state-of-the-art for simulating stiff-body dynamics. We trace the challenges in rigid-body methods to the necessity of linearizing piecewise-rigid trajectories and subsequent constraints. ABD instead relaxes the unnecessary (and unrealistic) constraint that each body's motion be exactly rigid with a stiff orthogonality potential, while preserving the rigid body model's key feature of a small coordinate representation. In doing so ABD replaces piecewise linearization with piecewise linear trajectories. This, in turn, combines the best of both worlds: compact coordinates ensure small, sparse system solves, while piecewise-linear trajectories enable efficient and accurate constraint (contact and joint) evaluations. Beginning with this simple foundation, ABD preserves all guarantees of the underlying IPC model we build it upon, e.g., solution convergence, guaranteed non-intersection, and accurate frictional contact. Over a wide range and scale of simulation problems we demonstrate that ABD brings orders of magnitude performance gains (two- to three-orders on the CPU and an order more when utilizing the GPU, obtaining 10, 000× speedups) over prior IPC-based methods, while maintaining simulation quality and nonintersection of trajectories. At the same time ABD has comparable or faster timings when compared to state-of-the-art rigid body libraries optimized for performance without guarantees, and successfully and efficiently solves challenging simulation problems where both classes of prior rigid body simulation methods fail altogether. 
    more » « less
  3. Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS that efficiently approximates the Hessian using history parameter and gradient changes, suffers convergence instability in stochastic training. So far, attempts that adapt L-BFGS to large-scale stochastic training incur considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into L-BFGS update and greatly reduces stochastic noise in the Hessian, therefore stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling distributing compute and memory costs across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS’s potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedup. 
    more » « less
  4. Throughout many scientific and engineering fields, including control theory, quantum mechanics, advanced dynamics, and network theory, a great many important applications rely on the spectral decomposition of matrices. Traditional methods such as the power iteration method, Jacobi eigenvalue method, and QR decomposition are commonly used to compute the eigenvalues and eigenvectors of a square and symmetric matrix. However, these methods suffer from certain drawbacks: in particular, the power iteration method can only find the leading eigen-pair (i.e., the largest eigenvalue and its corresponding eigenvector), while the Jacobi and QR decomposition methods face significant performance limitations when facing with large scale matrices. Typically, even producing approximate eigenpairs of a general square matrix requires at least O(N^3) time complexity, where N is the number of rows of the matrix. In this work, we exploit the newly developed memristor technology to propose a low-complexity, scalable memristor-based method for deriving a set of dominant eigenvalues and eigenvectors for real symmetric non-negative matrices. The time complexity for our proposed algorithm is O(N^2 /Δ) (where Δ governs the accuracy). We present experimental studies to simulate the memristor-supporting algorithm, with results demonstrating that the average error for our method is within 4%, while its performance is up to 1.78X better than traditional methods. 
    more » « less
  5. Recent developments in the computational automated design of electromagnetic devices, otherwise known as inverse design, have significantly enhanced the design process for nanophotonic systems. Inverse design can both reduce design time considerably and lead to high-performance, nonintuitive structures that would otherwise have been impossible to develop manually. Despite the successes enjoyed by structure optimization techniques, most approaches leverage electromagnetic solvers that require significant computational resources and suffer from slow convergence and numerical dispersion. Recently, a fast simulation and boundary-based inverse design approach based on boundary integral equations was demonstrated for two-dimensional nanophotonic problems. In this work, we introduce a new full-wave three-dimensional simulation and boundary-based optimization framework for nanophotonic devices also based on boundary integral methods, which achieves high accuracy even at coarse mesh discretizations while only requiring modest computational resources. The approach has been further accelerated by leveraging GPU computing, a sparse block-diagonal preconditioning strategy, and a matrix-free implementation of the discrete adjoint method. As a demonstration, we optimize three different devices: a 1:2 1550 nm power splitter and two nonadiabatic mode-preserving waveguide tapers. To the best of our knowledge, the tapers, which span 40 wavelengths in the silicon material, are the largest silicon photonic waveguiding devices to have been optimized using full-wave 3D solution of Maxwell’s equations. 
    more » « less