skip to main content

Title: A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
We present a scalable distributed memory library for generating and computations involving structured dense matrices, such as those produced by boundary integral equation formulations. Such matrices are dense, but have special structure that can be exploited to obtain efficient storage and matrix-vector product evaluations and consequently the fast solution of linear systems. At the core of the methods we use is the observation that off-diagonal matrix blocks of such matrices have a low numerical rank, and that this property can be exploited in a multi-level fashion. In this work we focus on the Hierarchically Semi-Separable (HSS) representation. We present algorithms for building and using HSS representations that are parallelized using MPI and CUDA to leverage state-of-the-art heterogeneous clusters. The efficiency of our methods and implementation is demonstrated on large dense matrices obtained from a boundary integral equation formulation of the Laplace equation with Dirichlet boundary conditions. We demonstrate excellent (linear) scalability on up to 128 GPUs on 128 nodes. Our codes will lay the foundation for fast direct solvers for elliptic problems.
; ; ;
Award ID(s):
1464244 1643056
Publication Date:
Journal Name:
46th International Conference on Parallel Processing (ICPP)
Page Range or eLocation-ID:
513 to 522
Sponsoring Org:
National Science Foundation
More Like this
  1. Consider the elastic scattering of a time-harmonic wave by multiple well-separated rigid particles with smooth boundaries in two dimensions. Instead of using the complex Green's tensor of the elastic wave equation, we utilize the Helmholtz decomposition to convert the boundary value problem of the elastic wave equation into a coupled boundary value problem of the Helmholtz equation. Based on single, double, and combined layer potentials with the simpler Green's function of the Helmholtz equation, we present three different boundary integral equations for the coupled boundary value problem. The well-posedness of the new integral equations is established. Computationally, a scattering matrix based method is proposed to evaluate the elastic wave for arbitrarily shaped particles. The method uses the local expansion for the incident wave and the multipole expansion for the scattered wave. The linear system of algebraic equations is solved by GMRES with fast multipole method (FMM) acceleration. Numerical results show that the method is fast and highly accurate for solving elastic scattering problems with multiple particles.
  2. Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N logN) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points—the first time a structured approach has done so—with 4X faster inference speed andmore »40X fewer parameters.« less
  3. We have developed a memory and operation-count efficient 2.5D inversion algorithm of electrical resistivity (ER) data that can handle fine discretization domains imposed by other geophysical (e.g, ground penetrating radar or seismic) data. Due to numerical stability criteria and available computational memory, joint inversion of different types of geophysical data can impose different grid discretization constraints on the model parameters. Our algorithm enables the ER data sensitivities to be directly joined with other geophysical data without the need of interpolating or coarsening the discretization. We have used the adjoint method directly in the discretized Maxwell’s steady state equation to compute the data sensitivity to the conductivity. In doing so, we make no finite-difference approximation on the Jacobian of the data and avoid the need to store large and dense matrices. Rather, we exploit matrix-vector multiplication of sparse matrices and find successful convergence using gradient descent for our inversion routine without having to resort to the Hessian of the objective function. By assuming a 2.5D subsurface, we are able to linearly reduce memory requirements when compared to a 3D gradient descent inversion, and by a power of two when compared to storing a 2D Hessian. Moreover, our method linearly outperforms operationmore »counts when compared with 3D Gauss-Newton conjugate-gradient schemes, which scales cubically in our favor with respect to the thickness of the 3D domain. We physically appraise the domain of the recovered conductivity using a cutoff of the electric current density present in our survey. We evaluate two case studies to assess the validity of our algorithm. First, on a 2.5D synthetic example, and then on field data acquired in a controlled alluvial aquifer, where we were able to match the recovered conductivity to borehole observations.« less
  4. Recent developments in the computational automated design of electromagnetic devices, otherwise known as inverse design, have significantly enhanced the design process for nanophotonic systems. Inverse design can both reduce design time considerably and lead to high-performance, nonintuitive structures that would otherwise have been impossible to develop manually. Despite the successes enjoyed by structure optimization techniques, most approaches leverage electromagnetic solvers that require significant computational resources and suffer from slow convergence and numerical dispersion. Recently, a fast simulation and boundary-based inverse design approach based on boundary integral equations was demonstrated for two-dimensional nanophotonic problems. In this work, we introduce a new full-wave three-dimensional simulation and boundary-based optimization framework for nanophotonic devices also based on boundary integral methods, which achieves high accuracy even at coarse mesh discretizations while only requiring modest computational resources. The approach has been further accelerated by leveraging GPU computing, a sparse block-diagonal preconditioning strategy, and a matrix-free implementation of the discrete adjoint method. As a demonstration, we optimize three different devices: a 1:2 1550 nm power splitter and two nonadiabatic mode-preserving waveguide tapers. To the best of our knowledge, the tapers, which span 40 wavelengths in the silicon material, are the largest silicon photonic waveguiding devices to havemore »been optimized using full-wave 3D solution of Maxwell’s equations.« less
  5. The Cable equation is one of the most fundamental equations for modeling neuronal dynamics. In this article, we consider a high order compact finite difference numerical solution for the fractional Cable equation, which is a generalization of the classical Cable equation by taking into account the anomalous diffusion in the movement of the ions in neuronal system. The resulting finite difference scheme is unconditionally stable and converges with the convergence order ofin maximum norm, 1‐norm and 2‐norm. Furthermore, we present a fast solution technique to accelerate Toeplitz matrix‐vector multiplications arising from finite difference discretization. This fast solution technique is based on a fast Fourier transform and depends on the special structure of coefficient matrices, and it helps to reduce the computational work fromrequired by traditional methods towithout using any lossy compression, whereandτis the size of time step,andhis the size of space step. Moreover, we give a compact finite difference scheme and consider its stability analysis for two‐dimensional fractional Cable equation. The applicability and accuracy of the scheme are demonstrated by numerical experiments to support our theoretical analysis.