Search for: All records

Creators/Authors contains: "Abdelfattah, A."

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers

Abdelfattah, A. ; Ghysels, P. ; Boukaram, W. ; Tomov, S. ; Li, X. ; Dongarra, J. ( November 2022 , International Conference for High Performance Computing Networking Storage and Analysis)

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are different in size. Matrix decompositions on such irregular workloads are rarely addressed on GPUs. This paper addresses irregular workloads of matrix computations on GPUs, and their application to accelerate sparse direct solvers. We design an interface for the basic matrix operations supporting problems of different sizes. The interface enables us to develop irrLU-GPU, an LU decomposition on matrices of different sizes. We demonstrate the impact of irrLU-GPU on sparse direct LU solvers using NVIDIA and AMD GPUs. Experimental results are shown for a sparse direct solver based on a multifrontal sparse LU decomposition applied to linear systems arising from the simulation, using finite element discretization on unstructured meshes, of a high-frequency indefinite Maxwell problem.
Full Text Available
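
To illustrate the kind of workload the abstract above describes, here is a minimal sketch of LU decomposition applied to a batch of small dense matrices that differ in size. It is a sequential, CPU-only Python illustration, not the paper's irrLU-GPU or its interface; the function names `lu_factor` and `lu_factor_irregular_batch` are hypothetical and chosen only for this example.

```python
def lu_factor(a):
    """Factor a square matrix (list of row lists) in place with partial pivoting.

    On return, the strict lower triangle of `a` holds L (unit diagonal implied)
    and the upper triangle holds U. The returned list `perm` records the row
    order: row i of the factored matrix corresponds to original row perm[i].
    """
    n = len(a)
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate entries below the pivot and update the trailing block.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm


def lu_factor_irregular_batch(batch):
    """Factor every matrix in a batch whose members may have different sizes."""
    return [lu_factor(m) for m in batch]


# Hypothetical batch: three matrices of sizes 1x1, 2x2, and 3x3.
batch = [
    [[4.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]],
]
perms = lu_factor_irregular_batch(batch)
```

In this sketch the loop over the batch is sequential. On a GPU the difficulty the abstract points to is that the per-matrix work is not uniform, so a single fixed-size kernel configuration does not fit every matrix in the batch.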
GPU-based homotopy continuation for minimal problems in computer vision

Chien, C. ; Fan, H. ; Abdelfattah, A. ; Tsigaridas, E. ; Tomov, S. ; Kimia, B. ( January 2022 , Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Systems of polynomial equations arise frequently in computer vision, especially in multiview geometry problems. Traditional methods for solving these systems typically aim to eliminate variables to reach a univariate polynomial, e.g., a tenth-order polynomial for 5-point pose estimation, using clever manipulations, or more generally using Gröbner bases, resultants, and elimination templates, leading to successful algorithms for multiview geometry and other problems. However, these methods do not work when the problem is complex, and when they do work, they face efficiency and stability issues. Homotopy Continuation (HC) can solve more complex problems without the stability issues, and with guarantees of a global solution, but it is known to be slow. In this paper we show that HC can be parallelized on a GPU, showing significant speedups of up to 56 times on polynomial benchmarks. We also show that GPU-HC can be generically applied to a range of computer vision problems, including 4-view triangulation and trifocal pose estimation with unknown focal length, which cannot be solved with elimination templates but can be solved efficiently with HC. GPU-HC opens the door to easy formulation and solution of a range of computer vision problems.
Full Text Available
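
As a rough illustration of the homotopy continuation idea named in the abstract above, the following sketch tracks the roots of a single univariate polynomial on the CPU. It is not the paper's GPU-HC, which targets multivariate systems on a GPU. The start system x^n - 1, the fixed complex constant `gamma`, the step counts, and the function names are all assumptions made for this example.

```python
import cmath


def poly_eval(coeffs, x):
    """Return (p(x), p'(x)) for coefficients listed highest degree first (Horner's rule)."""
    value, deriv = 0j, 0j
    for c in coeffs:
        deriv = deriv * x + value
        value = value * x + c
    return value, deriv


def track_roots(coeffs, steps=200, newton_iters=5, gamma=complex(0.6, 0.8)):
    """Follow roots of H(x, t) = (1 - t) * gamma * g(x) + t * f(x) from t = 0 to t = 1.

    f is the target polynomial given by `coeffs`; g(x) = x^n - 1 is an assumed
    start system whose roots (the n-th roots of unity) are known in closed form.
    """
    n = len(coeffs) - 1
    start = [1.0] + [0.0] * (n - 1) + [-1.0]  # coefficients of x^n - 1
    roots = []
    for k in range(n):
        # Each path starts at one root of unity and is tracked independently.
        x = cmath.exp(2j * cmath.pi * k / n)
        for s in range(steps):
            t0, t1 = s / steps, (s + 1) / steps
            # Euler predictor: dx/dt = -H_t / H_x evaluated at (x, t0).
            f, df = poly_eval(coeffs, x)
            g, dg = poly_eval(start, x)
            h_x = (1 - t0) * gamma * dg + t0 * df
            h_t = f - gamma * g
            x = x - (t1 - t0) * h_t / h_x
            # Newton corrector: pull x back onto the path at t1.
            for _ in range(newton_iters):
                f, df = poly_eval(coeffs, x)
                g, dg = poly_eval(start, x)
                h = (1 - t1) * gamma * g + t1 * f
                h_x = (1 - t1) * gamma * dg + t1 * df
                x = x - h / h_x
        roots.append(x)
    return roots


# Example target: f(x) = x^2 - 5x + 6, whose roots are 2 and 3.
found = sorted(track_roots([1.0, -5.0, 6.0]), key=lambda z: z.real)
```

Because each start root is tracked without reference to the others, the outer loop over paths is the natural place to introduce parallelism in a sketch like this one.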
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( June 2020 , International Conference on Computational Science (ICCS 2020))
Full Text Available
GPU algorithms for efficient exascale discretizations

Abdelfattah, A. ; Barra, V. ; Beams, N. ; Bleile, R. ; Brown, J. ; Camier, J. ; Carson, R. ; Chalmers, N. ; Dobrev, V. ; Dudouit, Y. ; et al ( September 2021 , Parallel computing)

In this paper, we describe the research and development activities in the Center for Efficient Exascale Discretizations within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.
Full Text Available
Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( May 2019 , 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
Massively Parallel Automated Software Tuning

Kurzak, J. ; Tsai, M. ; Gates, M. ; Abdelfattah, A. ; Dongarra, J. ( August 2019 , International Conference on Parallel Processing)

This article presents an implementation of a distributed autotuning engine developed as part of the Bench-testing OpeN Software Autotuning Infrastructure project. The system is geared towards performance optimization of computational kernels for graphics processing units, and allows for the deployment of vast autotuning sweeps to massively parallel machines. The software implements dynamic work scheduling to distributed-memory resources, takes advantage of multithreading for parallel compilation, and dispatches kernel launches to multiple accelerators. This paper lays out the main design principles of the system and discusses the basic mechanics of the initial implementation. Preliminary performance results are presented, encountered challenges are discussed, and future directions are outlined.
Full Text Available
Progressive Optimization of Batched LU Factorization on GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( September 2018 , IEEE High Performance Extreme Computing Conference (HPEC’18))

Full Text Available
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices

Masliah, I. ; Abdelfattah, A. ; Haidar, A. ; Tomov, S. ; Baboulin, M. ; Falcou, J. ; Dongarra, J. ( January 2019 , Parallel computing)

Full Text Available
Efficient exascale discretizations: High-order finite element methods

Kolev, T. ; Fischer, P. ; Min, M. ; Dongarra, J. ; Brown, J. ; Dobrev, V. ; Warburton, T. ; Tomov, S. ; Shephard, M. ; Abdelfattah, A. ; et al ( January 2021 , The international journal of high performance computing applications)

Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating point operations to energy intensive data movement. One of the few viable approaches to achieve high efficiency in the area of PDE discretizations on unstructured grids is to use matrix-free/partially assembled high-order finite element methods, since these methods can increase the accuracy and/or lower the computational time due to reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps and discretization libraries and our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, high-order data visualization, and list examples of collaborations with several ECP and external applications.
Full Text Available
Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU clusters

Yamazaki, I. ; Abdelfattah, A. ; Ida, A. ; Ohshima, S. ; Tomov, S. ; Yokota, R. ; Dongarra, J. ( May 2018 , IEEE International Parallel and Distributed Processing Symposium (IPDPS))

ppohBEM is an open-source software package implementing the boundary element method. One of its main software tasks is the solution of the dense linear system of equations, for which ppohBEM relies on another software package called HACApK. To reduce the cost of solving the linear system, HACApK hierarchically compresses the coefficient matrix using adaptive cross approximation. This hierarchical compression greatly reduces the storage and time complexities of the solver and enables the solution of large-scale boundary value problems. To extend the capability of ppohBEM, in this paper, we carefully port HACApK's linear solver onto GPU clusters. Though the potential of GPUs has been widely accepted in high-performance computing, it is still a challenge to utilize GPUs for a solver, like HACApK's, that requires fine-grained computation and global communication. First, to utilize the GPUs, we integrate the batched GPU kernel that was recently released in the MAGMA software package. We discuss several techniques to improve the performance of the batched kernel. We then study various techniques to address the inter-GPU communication and study their effects on state-of-the-art GPU clusters. We believe that the techniques studied in this paper are of interest to a wide range of software packages running on GPUs, especially with the increasingly complex node architectures and the growing costs of communication. We also hope that our efforts to integrate the GPU kernel or to set up the inter-GPU communication will influence the design of future-generation batched kernels or the communication layer within a software stack.
Full Text Available