skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, December 13 until 2:00 AM ET on Saturday, December 14 due to maintenance. We apologize for the inconvenience.


Title: A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
We present a scalable distributed memory library for generating and computations involving structured dense matrices, such as those produced by boundary integral equation formulations. Such matrices are dense, but have special structure that can be exploited to obtain efficient storage and matrix-vector product evaluations and consequently the fast solution of linear systems. At the core of the methods we use is the observation that off-diagonal matrix blocks of such matrices have a low numerical rank, and that this property can be exploited in a multi-level fashion. In this work we focus on the Hierarchically Semi-Separable (HSS) representation. We present algorithms for building and using HSS representations that are parallelized using MPI and CUDA to leverage state-of-the-art heterogeneous clusters. The efficiency of our methods and implementation is demonstrated on large dense matrices obtained from a boundary integral equation formulation of the Laplace equation with Dirichlet boundary conditions. We demonstrate excellent (linear) scalability on up to 128 GPUs on 128 nodes. Our codes will lay the foundation for fast direct solvers for elliptic problems.  more » « less
Award ID(s):
1464244 1643056
PAR ID:
10067882
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
46th International Conference on Parallel Processing (ICPP)
Page Range / eLocation ID:
513 to 522
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Consider the elastic scattering of a time-harmonic wave by multiple well-separated rigid particles with smooth boundaries in two dimensions. Instead of using the complex Green's tensor of the elastic wave equation, we utilize the Helmholtz decomposition to convert the boundary value problem of the elastic wave equation into a coupled boundary value problem of the Helmholtz equation. Based on single, double, and combined layer potentials with the simpler Green's function of the Helmholtz equation, we present three different boundary integral equations for the coupled boundary value problem. The well-posedness of the new integral equations is established. Computationally, a scattering matrix based method is proposed to evaluate the elastic wave for arbitrarily shaped particles. The method uses the local expansion for the incident wave and the multipole expansion for the scattered wave. The linear system of algebraic equations is solved by GMRES with fast multipole method (FMM) acceleration. Numerical results show that the method is fast and highly accurate for solving elastic scattering problems with multiple particles. 
    more » « less
  2. Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N logN) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points—the first time a structured approach has done so—with 4X faster inference speed and 40X fewer parameters. 
    more » « less
  3. We present two (a decoupled and a coupled) integral-equation-based methods for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is prone to the ill-conditioning of the decoupling transform and cannot achieve as high accuracy as the coupled method. We show numerical examples using a Nyström method based on quadrature-by-expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry. 
    more » « less
  4. The distance matrix of a dataset X of n points with respect to a distance function f represents all pairwise distances between points in X induced by f. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the l1 metric for which we get a linear runtime, as well as a quadratic lower bound for any algorithm which computes a matrix-vector product for the l_infty case. Our upper bound results have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrix induced by l1 and l2 functions and the fastest algorithm for computing an additive error lowrank approximation for the l2 metric, in addition to applications for fast matrix multiplication among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate l2 distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma. 
    more » « less
  5. The number of non-negative integer matrices with given row and column sums features in a variety of problems in mathematics and statistics but no closed-form expression for it is known, so we rely on approximations. In this paper, we describe a new such approximation, motivated by consideration of the statistics of matrices with non-integer numbers of columns. This estimate can be evaluated in time linear in the size of the matrix and returns results of accuracy as good as or better than existing linear-time approximations across a wide range of settings. We show that the estimate is asymptotically exact in the regime of sparse tables, while empirically performing at least as well as other linear-time estimates in the regime of dense tables. We also use the new estimate as the starting point for an improved numerical method for either counting or sampling matrices with given margins using sequential importance sampling. Code implementing our methods is available. 
    more » « less