NSF PAR Search | NSF Public Access Repository

Extending MAGMA Portability with OneAPI

https://doi.org/10.1109/WACCPD56842.2022.00008

Fortenberry, Anna; Tomov, Stanimire (November 2022, SC 2022 Workshop on Accelerator Programming Using Directives (WACCPD))

Full Text Available
A Python Library for Matrix Algebra on GPU and Multicore Architectures

https://doi.org/10.1109/MASS56207.2022.00121

Nance, Delario; Tomov, Stanimire; Wong, Kwai (October 2022, 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS))

Full Text Available
PAQR: Pivoting Avoiding QR factorization

https://doi.org/10.1109/IPDPS54959.2023.00040

Sid-Lakhdar, Wissam; Cayrols, Sebastien; Bielich, Daniel; Abdelfattah, Ahmad; Luszczek, Piotr; Gates, Mark; Tomov, Stanimire; Johansen, Hans; Williams-Young, David; Davis, Timothy; et al (May 2023, 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS))
Matrix multiplication on batches of small matrices in half and half-complex precisions

https://doi.org/10.1016/j.jpdc.2020.07.001

Abdelfattah, Ahmad; Tomov, Stanimire; Dongarra, Jack (November 2020, Journal of Parallel and Distributed Computing)

Full Text Available
Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems

https://doi.org/10.1109/ACCESS.2021.3106054

Iqbal, Zafar; Nooshabadi, Saeid; Yamazaki, Ichitaro; Tomov, Stanimire; Dongarra, Jack (January 2021, IEEE Access)
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs

https://doi.org/10.1109/HPEC43674.2020.9286214

Brown, Cade; Abdelfattah, Ahmad; Tomov, Stanimire; Dongarra, Jack (September 2020, 2020 IEEE High Performance Extreme Computing Conference (HPEC))
Dense linear algebra (DLA) has historically been in the vanguard of software that must be adapted first to hardware changes. This is because DLA is critical to the accuracy and performance of many different types of applications, and because it has proved to be an outstanding vehicle for finding and implementing solutions to the problems that novel architectures pose. Therefore, in this paper we investigate the portability of the MAGMA DLA library to the latest AMD GPUs. We use auto tools to convert the CUDA code in MAGMA to the Heterogeneous-Computing Interface for Portability (HIP) language. MAGMA provides LAPACK for GPUs and benchmarks for fundamental DLA routines ranging from BLAS to dense factorizations, linear systems, and eigen-problem solvers. We port these routines to HIP and quantify currently achievable performance through the MAGMA benchmarks for the main workload algorithms on MI25 and MI50 AMD GPUs. Comparisons with performance roofline models and theoretical expectations are used to identify current limitations and directions for future improvements.
Full Text Available
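
The abstract above mentions automatic conversion of CUDA code to HIP. As a rough illustration only, the following toy sketch renames a handful of CUDA runtime identifiers to their HIP counterparts by text substitution. It is not the tooling the authors used; the `CUDA_TO_HIP` table and the `translate` helper are hypothetical names introduced here, and a real port also has to deal with kernels, libraries, and build systems.

```python
import re

# Small, hand-picked subset of CUDA -> HIP runtime names (illustrative only).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, so names missing from the table
# (for example cudaMemcpyAsync) are left untouched rather than half-renamed.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def translate(source: str) -> str:
    """Return `source` with the known CUDA names replaced by HIP names."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_snippet = (
    "#include <cuda_runtime.h>\n"
    "cudaMalloc(&d, n);\n"
    "cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);\n"
    "cudaFree(d);"
)
hip_snippet = translate(cuda_snippet)
```

Because most CUDA runtime calls have a one-to-one HIP equivalent, a renaming pass of this kind covers much of the mechanical work; performance tuning for the target GPU remains a separate step.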
Asynchronous SGD for DNN Training on Shared-memory Parallel Architectures

https://doi.org/10.1109/IPDPSW50202.2020.00168

Lopez, Florent; Chow, Edmond; Tomov, Stanimire; Dongarra, Jack (May 2020, 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW))
Full Text Available
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines

https://doi.org/10.1145/3431921

Abdelfattah, Ahmad; Costa, Timothy; Dongarra, Jack; Gates, Mark; Haidar, Azzam; Hammarling, Sven; Higham, Nicholas J.; Kurzak, Jakub; Luszczek, Piotr; Tomov, Stanimire; et al (June 2021, ACM Transactions on Mathematical Software)
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.
Full Text Available
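
The grouping idea in the abstract above can be sketched in a few lines of plain Python. This is a minimal reference sketch, not the interface defined in the article: `GemmGroup`, `gemm`, and `gemm_batched` are hypothetical names, and the loops only show the semantics (one call, many independent small products, uniform sizes within a group), not an efficient implementation.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class GemmGroup:
    """Hypothetical container: one group of equally sized GEMM problems."""
    m: int
    n: int
    k: int
    alpha: float
    beta: float
    a: List[Matrix]  # each matrix is m x k
    b: List[Matrix]  # each matrix is k x n
    c: List[Matrix]  # each matrix is m x n, updated in place


def gemm(m, n, k, alpha, a, b, beta, c):
    """Reference C = alpha * A @ B + beta * C for one small problem."""
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def gemm_batched(groups: List[GemmGroup]) -> None:
    """Process every problem in every group through a single entry point."""
    for g in groups:
        assert len(g.a) == len(g.b) == len(g.c), "group arrays must match"
        for a, b, c in zip(g.a, g.b, g.c):
            gemm(g.m, g.n, g.k, g.alpha, a, b, g.beta, c)


# One group holding two independent 2 x 2 products.
group = GemmGroup(
    m=2, n=2, k=2, alpha=1.0, beta=0.0,
    a=[[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]],
    b=[[[5.0, 6.0], [7.0, 8.0]], [[2.0, 3.0], [4.0, 5.0]]],
    c=[[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],
)
gemm_batched([group])
```

When all matrices share one size, a single group suffices, as the abstract notes; problems of differing sizes would go into separate groups passed in the same call.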
Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning

https://doi.org/10.22369/issn.2153-4136/11/1/7

Wong, Kwai; Tomov, Stanimire; Dongarra, Jack (January 2020, The Journal of Computational Science Education)

Full Text Available
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

Nichols, Daniel; Tomov, Nathalie; Betancourt, Frank; Tomov, Stanimire; Wong, Kwai; Dongarra, Jack (December 2019, Proceedings International Conference on High Performance Computing)

In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance from the MAGMA library. The close integration makes the fundamental (scalable, high-performance) LA routines available in MAGMA a backend to MagmaDNN. We present some design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNN), as well as the MagmaDNN designs for overcoming them. In particular, MagmaDNN uses well-established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and how their incorporation allows MagmaDNN to outperform other currently available frameworks.
Full Text Available