Search for: All records

Creators/Authors contains: "Dongarra, J."

« Prev Next »

Total Resources

21

Resource Type
Conference Paper

15

Conference Proceeding

0

Dataset

0

Journal Article

6

Workshop Report

0

Availability
Full Text / Resource Available

21

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers

Abdelfattah, A. ; Ghysels, P. ; Boukaram, W. ; Tomov, S. ; Li, X. ; Dongarra, J. ( November 2022 , International Conference for High Performance Computing Networking Storage and Analysis)

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are different in size. Matrix decompositions on such irregular workloads are rarely addressed on GPUs. This paper addresses irregular workloads of matrix computations on GPUs, and their application to accelerate sparse direct solvers. We design an interface for the basic matrix operations supporting problems of different sizes. The interface enables us to develop irrLU-GPU, an LU decomposition on matrices of different sizes. We demonstrate the impact of irrLU-GPU on sparse direct LU solvers using NVIDIA and AMD GPUs. Experimental results are shown for a sparse direct solver based on a multifrontal sparse LU decomposition applied to linear systems arising from the simulation, using finite element discretization on unstructured meshes, of a high-frequency indefinite Maxwell problem.
more » « less
Full Text Available
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs

https://doi.org/10.1109/CLUSTER51413.2022.00029

Cayrols, S. ; Li, J. ; Bosilca, G. ; Tomov, S. ; Ayala, A. ; Dongarra, J. ( September 2022 , 2022 IEEE International Conference on Cluster Computing (CLUSTER))

In the context of parallel applications, communication is a critical part of the infrastructure and a potential bottleneck. The traditional approach to tackle communication challenges consists of redesigning algorithms so that the complexity or the communication volume is reduced. However, there are algorithms like the Fast Fourier Transform (FFT) where reducing the volume of communication is very challenging yet can reap large benefit in terms of time-to-completion. In this paper, we revisit the implementation of the MPI all-to-all routine at the core of 3D FFTs by using advanced MPI features, such as One-Sided Communication, and integrate data compression during communication to reduce the volume of data exchanged. Since some compression techniques are ‘lossy’ in the sense that they involve a loss of accuracy, we study the impact of lossy compression in heFFTe, the state-of-the-art FFT library for large scale 3D FFTs on hybrid architectures with GPUs. Consequently, we design an approximate FFT algorithm that trades off user-controlled accuracy for speed. We show that we speedup the 3D FFTs proportionally to the compression rate. In terms of accuracy, comparing our approach with a reduced precision execution, where both the data and the computation are in reduced precision, we show that when the volume of communication is compressed to the size of the reduced precision data, the approximate FFT algorithm is as fast as the one in reduced precision while the accuracy is one order of magnitude better.
more » « less
Full Text Available
Accelerating restarted GMRES with mixed precision arithmetic

Lindquist, N. ; Luszczek, P. ; and Dongarra, J. ( June 2021 , IEEE transactions on parallel and distributed systems)

The generalized minimum residual method (GMRES) is a commonly used iterative Krylov solver for sparse, non-symmetric systems of linear equations. Like other iterative solvers, data movement dominates its run time. To improve this performance, we propose running GMRES in reduced precision with key operations remaining in full precision. Additionally, we provide theoretical results linking the convergence of finite precision GMRES with classical Gram-Schmidt with reorthogonalization (CGSR) and its infinite precision counterpart which helps justify the convergence of this method to double-precision accuracy. We tested the mixed-precision approach with a variety of matrices and preconditioners on a GPU-accelerated node. Excluding the incomplete LU factorization without fill in (ILU(0)) preconditioner, we achieved average speedups ranging from 8 to 61 percent relative to comparable double-precision implementations, with the simpler preconditioners achieving the higher speedups.
more » « less
Full Text Available
Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques

Lindquist, N. ; Luszczek, P. ; Dongarra, J. ( November 2020 , 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA))
null (Ed.)
Full Text Available
Improving the Performance of the GMRES Method using Mixed-Precision Techniques

Lindquist, N. ; Luszczek, P. ; Dongarra, J. ( August 2020 , Smoky Mountains Computational Sciences & Engineering Conference (SMC2020))
null (Ed.)
Full Text Available
Translational Process: Mathematical Software Perspective

Dongarra, J. ; Gates, M. ; Luszczek, P. ; Tomov, S. ( September 2020 , Journal of computational science)
null (Ed.)
Full Text Available
Scalable Data Generation for Evaluating Mixed-Precision Solvers

Luszczek, P. ; Tsai, Y. ; Lindquist, N. ; Anzt, H. ; Dongarra, J. ( September 2020 , 2020 IEEE High Performance Extreme Computing Conference (HPEC))
null (Ed.)
Full Text Available
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( June 2020 , International Conference on Computational Science (ICCS 2020))
null (Ed.)
Full Text Available
Using Arm Scalable Vector Extension to Optimize Open MPI

Zhong, D. ; Shamis, P. ; Cao, Q. ; Bosilca, G. ; Dongarra, J. ( May 2020 , 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020))
null (Ed.)
Full Text Available
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

https://doi.org/10.1007/978-3-030-34356-9_37

Nichols, D. ; Tomov, N. ; Betancourt, F. ; Tomov, S. ; Wong, K. ; Dongarra, J. ( December 2019 , Lecture notes in computer science)

In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance power from the MAGMA library. The close integration provides the fundamental (scalable high-performance) LA routines available in MAGMA as a backend to MagmaDNN. We present some design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNN), as well as the MagmaDNN designs towards overcoming them. In particular, MagmaDNN uses well established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and their incorporation and use to outperform other frameworks, currently available.
more » « less
Full Text Available

« Prev Next »