A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

Sao, Piyush; Li, Xiaoye; Vuduc, Richard

We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., those arising from 2D grid or mesh domains) and for certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(√(log n)) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^(1/3)). In all cases, the memory needed to achieve these gains grows by only a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU_DIST. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU_DIST when run on 24,000 cores of a Cray XC30.
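The abstract does not spell out how process ranks or elimination-tree nodes are laid out on the three-dimensional process grid. The following is a minimal Python sketch under stated assumptions, not SuperLU_DIST's actual data layout: it assumes a Px × Py × Pz grid in which consecutive ranks fill one 2D layer first, a complete binary elimination tree such as nested dissection produces, and a power-of-two Pz. Independent subtrees deep in the tree each live on a single layer, while separators near the root are replicated on every layer beneath them, which gives a simplified picture of the memory-for-communication trade the abstract describes. The names `Grid3D` and `owners` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid3D:
    """Hypothetical px x py x pz process grid: pz stacked 2D layers of px x py ranks."""
    px: int
    py: int
    pz: int

    def size(self) -> int:
        return self.px * self.py * self.pz

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a flat rank to (x, y, z); assumes consecutive ranks fill one 2D layer first."""
        if not 0 <= rank < self.size():
            raise ValueError("rank out of range")
        z, r = divmod(rank, self.px * self.py)
        x, y = divmod(r, self.py)
        return x, y, z


def owners(level: int, index: int, pz: int) -> range:
    """Layers holding node `index` at depth `level` (root = 0) of a complete binary etree.

    Illustrative assumption: pz is a power of two and the tree is split so that each
    layer gets its own independent subtrees; nodes above that split are replicated.
    """
    if pz <= 0 or pz & (pz - 1):
        raise ValueError("pz must be a power of two")
    nodes = 2 ** level
    if not 0 <= index < nodes:
        raise ValueError("index out of range for this level")
    if nodes >= pz:
        # Deep node: lies inside a subtree owned by exactly one layer, so no replication.
        layer = index * pz // nodes
        return range(layer, layer + 1)
    # Shallow node (separator near the root): replicated on every layer beneath it.
    width = pz // nodes
    return range(index * width, (index + 1) * width)


if __name__ == "__main__":
    grid = Grid3D(px=2, py=3, pz=4)
    print(grid.size(), grid.coords(7))  # 24 (0, 1, 1)
    for level in range(4):
        print(level, [list(owners(level, i, grid.pz)) for i in range(2 ** level)])
```

With Pz = 4, the root is held by all four layers, each depth-1 separator by two layers, and every node from depth 2 downward by exactly one layer, so only the top of the tree costs extra memory.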

Award ID(s): 1710371

PAR ID: 10066186

Date Published: 2018-05-01

Journal Name: Proceedings - IEEE International Parallel and Distributed Processing Symposium

ISSN: 1530-2075

Page Range / eLocation ID: 908-919

Sponsoring Org: National Science Foundation

Full Text: Accepted Manuscript (Conference Paper), freely and publicly accessible. DOI: not currently available.