Title: A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
We present a scalable distributed-memory library for generating and computing with structured dense matrices, such as those produced by boundary integral equation formulations. Such matrices are dense, but have special structure that can be exploited to obtain efficient storage and matrix-vector product evaluations, and consequently the fast solution of linear systems. At the core of the methods we use is the observation that off-diagonal blocks of such matrices have low numerical rank, and that this property can be exploited in a multi-level fashion. In this work we focus on the Hierarchically Semi-Separable (HSS) representation. We present algorithms for building and using HSS representations that are parallelized using MPI and CUDA to leverage state-of-the-art heterogeneous clusters. The efficiency of our methods and implementation is demonstrated on large dense matrices obtained from a boundary integral equation formulation of the Laplace equation with Dirichlet boundary conditions. We demonstrate excellent (linear) scalability on up to 128 GPUs on 128 nodes. Our codes will lay the foundation for fast direct solvers for elliptic problems.
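The low-rank observation in the abstract can be illustrated with a small, serial sketch. This is not the library's code: the function names (`circle_points`, `laplace_kernel`, `numerical_rank`), the unit-circle geometry, the tolerance, and the use of fully pivoted elimination as a rank estimator are all assumptions chosen for illustration. It samples the 2D Laplace kernel on points of a circle and estimates the numerical rank of one off-diagonal block.

```python
import math

def circle_points(n):
    """Return n equispaced points on the unit circle (assumed test geometry)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def laplace_kernel(p, q):
    """2D Laplace kernel -log|p - q| / (2*pi) for distinct points p and q."""
    return -math.log(math.dist(p, q)) / (2 * math.pi)

def numerical_rank(block, tol=1e-8):
    """Estimate the numerical rank by Gaussian elimination with full pivoting.

    Counts rank-1 updates until the largest residual entry falls below
    tol times the largest entry of the original block.
    """
    resid = [row[:] for row in block]
    m, n = len(resid), len(resid[0])
    scale = max(abs(v) for row in resid for v in row)
    rank = 0
    while rank < min(m, n):
        # Pick the largest remaining entry as the pivot.
        i, j = max(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: abs(resid[rc[0]][rc[1]]))
        pivot = resid[i][j]
        if abs(pivot) <= tol * scale:
            break
        pivot_row = resid[i][:]
        factors = [resid[r][j] / pivot for r in range(m)]
        # Subtract the rank-1 cross formed by the pivot column and pivot row.
        for r in range(m):
            for c in range(n):
                resid[r][c] -= factors[r] * pivot_row[c]
        rank += 1
    return rank

n = 128
pts = circle_points(n)
half = n // 2
# Off-diagonal block: first half of the points interacting with the second half.
off_diag = [[laplace_kernel(pts[i], pts[j]) for j in range(half, n)]
            for i in range(half)]
rank = numerical_rank(off_diag)
print(f"off-diagonal block: {half} x {half}, estimated numerical rank: {rank}")
```

An HSS representation applies this kind of compression recursively: each half is split again, and the off-diagonal blocks at every level are stored in low-rank form, which is what makes the storage and matrix-vector products efficient.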
Award ID(s):
1464244 1643056
NSF-PAR ID:
10067882
Journal Name:
46th International Conference on Parallel Processing (ICPP)
Page Range / eLocation ID:
513 to 522
Sponsoring Org:
National Science Foundation
More Like this
  1. Consider the elastic scattering of a time-harmonic wave by multiple well-separated rigid particles with smooth boundaries in two dimensions. Instead of using the complex Green's tensor of the elastic wave equation, we utilize the Helmholtz decomposition to convert the boundary value problem of the elastic wave equation into a coupled boundary value problem of the Helmholtz equation. Based on single, double, and combined layer potentials with the simpler Green's function of the Helmholtz equation, we present three different boundary integral equations for the coupled boundary value problem. The well-posedness of the new integral equations is established. Computationally, a scattering matrix based method is proposed to evaluate the elastic wave for arbitrarily shaped particles. The method uses the local expansion for the incident wave and the multipole expansion for the scattered wave. The linear system of algebraic equations is solved by GMRES with fast multipole method (FMM) acceleration. Numerical results show that the method is fast and highly accurate for solving elastic scattering problems with multiple particles. 
  2. Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured approach has done so), with 4x faster inference speed and 40x fewer parameters.
  3. We have developed a memory- and operation-count-efficient 2.5D inversion algorithm for electrical resistivity (ER) data that can handle fine discretization domains imposed by other geophysical (e.g., ground-penetrating radar or seismic) data. Due to numerical stability criteria and available computational memory, joint inversion of different types of geophysical data can impose different grid discretization constraints on the model parameters. Our algorithm enables the ER data sensitivities to be directly joined with other geophysical data without the need for interpolating or coarsening the discretization. We have used the adjoint method directly in the discretized Maxwell’s steady-state equation to compute the data sensitivity to the conductivity. In doing so, we make no finite-difference approximation on the Jacobian of the data and avoid the need to store large and dense matrices. Rather, we exploit matrix-vector multiplication of sparse matrices and find successful convergence using gradient descent for our inversion routine without having to resort to the Hessian of the objective function. By assuming a 2.5D subsurface, we are able to linearly reduce memory requirements when compared to a 3D gradient descent inversion, and by a power of two when compared to storing a 2D Hessian. Moreover, our method linearly outperforms operation counts when compared with 3D Gauss-Newton conjugate-gradient schemes, which scales cubically in our favor with respect to the thickness of the 3D domain. We physically appraise the domain of the recovered conductivity using a cutoff of the electric current density present in our survey. We evaluate two case studies to assess the validity of our algorithm: first a 2.5D synthetic example, and then field data acquired in a controlled alluvial aquifer, where we were able to match the recovered conductivity to borehole observations.
  4. We present two (a decoupled and a coupled) integral-equation-based methods for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is prone to the ill-conditioning of the decoupling transform and cannot achieve as high accuracy as the coupled method. We show numerical examples using a Nyström method based on quadrature-by-expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry. 
  5. Grid-free Monte Carlo methods such as walk on spheres can be used to solve elliptic partial differential equations without mesh generation or global solves. However, such methods independently estimate the solution at every point, and hence do not take advantage of the high spatial regularity of solutions to elliptic problems. We propose a fast caching strategy which first estimates solution values and derivatives at randomly sampled points along the boundary of the domain (or a local region of interest). These cached values then provide cheap, output-sensitive evaluation of the solution (or its gradient) at interior points, via a boundary integral formulation. Unlike classic boundary integral methods, our caching scheme introduces zero statistical bias and does not require a dense global solve. Moreover, we can handle imperfect geometry (e.g., with self-intersections) and detailed boundary/source terms without repairing or resampling the boundary representation. Overall, our scheme is similar in spirit to virtual point light methods from photorealistic rendering: it suppresses the typical salt-and-pepper noise characteristic of independent Monte Carlo estimates, while still retaining the many advantages of Monte Carlo solvers: progressive evaluation, trivial parallelization, geometric robustness, etc. We validate our approach using test problems from visual and geometric computing.

     