Evolution of the SLATE linear algebra library

Gates, Mark (ORCID: 0000-0003-2996-1641); Abdelfattah, Ahmad (ORCID: 0000-0001-5054-4784); Akbudak, Kadir (ORCID: 0000-0002-1057-1590); Al_Farhan, Mohammed; Alomairy, Rabab (ORCID: 0000-0001-9911-6094); Bielich, Daniel (ORCID: 0000-0001-5731-368X); Burgess, Treece; Cayrols, Sébastien (ORCID: 0000-0003-3740-8985); Lindquist, Neil; Sukkari, Dalal; YarKhan, Asim (ORCID: 0000-0002-3901-9695)

doi:10.1177/10943420241286531

SLATE (Software for Linear Algebra Targeting Exascale) is a distributed, dense linear algebra library targeting both CPU-only and GPU-accelerated systems, developed over the course of the Exascale Computing Project (ECP). Although it began with several documents setting out its initial design, significant design changes occurred throughout its development. Some of these were anticipated: an early version used a simple consistency flag that was later replaced with a full-featured consistency protocol. In other cases, performance limitations and changes in software and hardware prompted a redesign. Sequential communication tasks were parallelized; host-to-host MPI calls were replaced with GPU device-to-device MPI calls; and more advanced algorithms, such as Communication Avoiding LU and the Random Butterfly Transform (RBT), were introduced. Early choices that turned out to be cumbersome, error prone, or inflexible have been replaced with simpler, more intuitive, or more flexible designs. Applications have been a driving force, prompting a lighter-weight queue class, nonuniform tile sizes, and more flexible MPI process grids. Of paramount importance has been building a portable library that works across several different GPU architectures (AMD, Intel, and NVIDIA) while keeping a clean and maintainable codebase. Here we explore the evolving design choices and their effects, both in terms of performance and software sustainability.

Award ID(s): 2004541

PAR ID: 10544667

Publisher / Repository: SAGE Publications

Date Published: 2024-09-27

Journal Name: The International Journal of High Performance Computing Applications

Volume: 39

Issue: 1

ISSN: 1094-3420

Size(s): p. 3-17

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1177/10943420241286531