Consider the elastic scattering of a time-harmonic wave by multiple well-separated rigid particles with smooth boundaries in two dimensions. Instead of using the complex Green's tensor of the elastic wave equation, we utilize the Helmholtz decomposition to convert the boundary value problem of the elastic wave equation into a coupled boundary value problem of the Helmholtz equation. Based on single-, double-, and combined-layer potentials with the simpler Green's function of the Helmholtz equation, we present three different boundary integral equations for the coupled boundary value problem. The well-posedness of the new integral equations is established. Computationally, a scattering-matrix-based method is proposed to evaluate the elastic wave for arbitrarily shaped particles. The method uses the local expansion for the incident wave and the multipole expansion for the scattered wave. The linear system of algebraic equations is solved by GMRES with fast multipole method (FMM) acceleration. Numerical results show that the method is fast and highly accurate for solving elastic scattering problems with multiple particles.
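The decomposition step referred to above can be summarized as follows (this is the standard Helmholtz decomposition for 2D time-harmonic elasticity with Lamé parameters λ, μ, density ρ, and angular frequency ω; the notation is ours, not necessarily the paper's):

```latex
% Split the displacement field into compressional and shear potentials:
\mathbf{u} = \nabla \phi + \overrightarrow{\operatorname{curl}}\, \psi,
\qquad
\overrightarrow{\operatorname{curl}}\, \psi = \bigl( \partial_y \psi,\; -\partial_x \psi \bigr),
```

```latex
% Each scalar potential satisfies a Helmholtz equation with its own wavenumber:
\Delta \phi + k_p^2 \, \phi = 0,
\qquad
\Delta \psi + k_s^2 \, \psi = 0,
\qquad
k_p = \omega \sqrt{\frac{\rho}{\lambda + 2\mu}},
\qquad
k_s = \omega \sqrt{\frac{\rho}{\mu}}.
```

This is why the boundary value problem for the (vector) elastic wave equation becomes a coupled pair of scalar Helmholtz problems, for which the Green's function is much simpler.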
A Scalable Hierarchical Semi-Separable Library for Heterogeneous Clusters
We present a scalable distributed-memory library for generating, and computing with, structured dense matrices such as those produced by boundary integral equation formulations. Such matrices are dense, but have special structure that can be exploited to obtain efficient storage and matrix-vector product evaluations and, consequently, the fast solution of linear systems. At the core of the methods we use is the observation that off-diagonal blocks of such matrices have low numerical rank, and that this property can be exploited in a multilevel fashion. In this work we focus on the Hierarchically Semi-Separable (HSS) representation. We present algorithms for building and using HSS representations that are parallelized using MPI and CUDA to leverage state-of-the-art heterogeneous clusters. The efficiency of our methods and implementation is demonstrated on large dense matrices obtained from a boundary integral equation formulation of the Laplace equation with Dirichlet boundary conditions. We demonstrate excellent (linear) scalability on up to 128 GPUs on 128 nodes. Our codes will lay the foundation for fast direct solvers for elliptic problems.
NSF-PAR ID: 10067882
Journal Name: 46th International Conference on Parallel Processing (ICPP)
Page Range / eLocation ID: 513 to 522
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


We present two integral-equation-based methods (a decoupled and a coupled one) for the Morse-Ingard equations subject to Neumann boundary conditions on the exterior domain. Both methods are based on second-kind integral equation (SKIE) formulations. The coupled method is well-conditioned and can achieve high accuracy. The decoupled method has lower computational cost and more flexibility in dealing with the boundary layer; however, it is prone to the ill-conditioning of the decoupling transform and cannot achieve as high accuracy as the coupled method. We show numerical examples using a Nyström method based on quadrature-by-expansion (QBX) with fast-multipole acceleration. We demonstrate the accuracy and efficiency of the solvers in both two and three dimensions with complex geometry.

Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement for generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured approach has done so) with 4X faster inference speed and 40X fewer parameters.
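The "products of sparse matrices" characterization is concrete and easy to verify by hand. The sketch below (our own illustration, not the paper's learned parameterization) writes one radix-2 Cooley-Tukey step as a product of a sparse butterfly factor, a block-diagonal half-size DFT, and an even-odd permutation, and checks it against the dense DFT matrix; the helper names are ours.

```python
import numpy as np

def dft_matrix(n):
    """Dense DFT matrix F[j, k] = exp(-2*pi*i*j*k / n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def radix2_factors(n):
    """One Cooley-Tukey step: F_n = B @ blkdiag(F_{n/2}, F_{n/2}) @ P."""
    m = n // 2
    D = np.diag(np.exp(-2j * np.pi * np.arange(m) / n))  # twiddle factors
    I = np.eye(m)
    B = np.block([[I, D], [I, -D]])           # sparse butterfly factor
    P = np.eye(n)[np.r_[0:n:2, 1:n:2]]        # even-odd permutation (rows)
    F_half = np.kron(np.eye(2), dft_matrix(m))  # two half-size DFTs
    return B, F_half, P

n = 8
B, F_half, P = radix2_factors(n)
err = np.max(np.abs(B @ F_half @ P - dft_matrix(n)))
print(err)  # agrees to machine precision
```

Applying the same split recursively to each half-size DFT yields log N sparse factors with O(N) nonzeros each, which is exactly the O(N log N) structure the paper's parameterization can learn.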

The distance matrix of a dataset X of n points with respect to a distance function f represents all pairwise distances between points in X induced by f. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, specifically tailored to distance matrices, for fundamental linear-algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the l1 metric for which we get a linear runtime, as well as a quadratic lower bound for any algorithm which computes a matrix-vector product for the l_infty case. Our upper-bound results have many further downstream applications, including the fastest algorithm for computing a relative-error low-rank approximation for the distance matrix induced by l1 and l2 functions and the fastest algorithm for computing an additive-error low-rank approximation for the l2 metric, in addition to applications for fast matrix multiplication, among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate l2 distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.
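To see why fast l1 distance-matrix products are plausible at all, consider one coordinate at a time: the l1 metric sums over coordinates, and in 1D the product D @ v with D[i, j] = |x[i] - x[j]| can be computed with a sort and prefix sums instead of forming the n-by-n matrix. The sketch below is a standard textbook technique for illustration only, not the paper's algorithm, and the function name is ours.

```python
import numpy as np

def l1_matvec_1d(x, v):
    """Compute D @ v where D[i, j] = |x[i] - x[j]|, in O(n log n) time.
    The full l1 metric over d coordinates is the sum of d such 1D products."""
    order = np.argsort(x)
    xs, vs = x[order], v[order]
    cv = np.cumsum(vs)             # prefix sums of v    over j <= i (sorted)
    cxv = np.cumsum(xs * vs)       # prefix sums of x*v  over j <= i (sorted)
    tot_v, tot_xv = cv[-1], cxv[-1]
    # |x_i - x_j| = x_i - x_j for j <= i and x_j - x_i for j > i (sorted order)
    res = xs * cv - cxv + (tot_xv - cxv) - xs * (tot_v - cv)
    out = np.empty_like(res)
    out[order] = res               # scatter back to the original ordering
    return out

rng = np.random.default_rng(1)
x, v = rng.random(300), rng.random(300)
fast = l1_matvec_1d(x, v)
dense = np.abs(x[:, None] - x[None, :]) @ v   # brute-force check, O(n^2)
print(np.max(np.abs(fast - dense)))
```

The lower bound quoted for l_infty says that no such shortcut exists in the worst case for that metric: any matvec algorithm needs quadratic time.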

The number of nonnegative integer matrices with given row and column sums features in a variety of problems in mathematics and statistics, but no closed-form expression for it is known, so we rely on approximations. In this paper, we describe a new such approximation, motivated by consideration of the statistics of matrices with non-integer numbers of columns. This estimate can be evaluated in time linear in the size of the matrix and returns results of accuracy as good as or better than existing linear-time approximations across a wide range of settings. We show that the estimate is asymptotically exact in the regime of sparse tables, while empirically performing at least as well as other linear-time estimates in the regime of dense tables. We also use the new estimate as the starting point for an improved numerical method for either counting or sampling matrices with given margins using sequential importance sampling. Code implementing our methods is available.