Search for: All records

Creators/Authors contains: "Tomov, S."

« Prev Next »

Total Resources

15

Resource Type
Conference Paper

10

Conference Proceeding

0

Dataset

0

Journal Article

5

Workshop Report

0

Availability
Full Text / Resource Available

15

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers

Abdelfattah, A. ; Ghysels, P. ; Boukaram, W. ; Tomov, S. ; Li, X. ; Dongarra, J. ( November 2022 , International Conference for High Performance Computing Networking Storage and Analysis)

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are different in size. Matrix decompositions on such irregular workloads are rarely addressed on GPUs. This paper addresses irregular workloads of matrix computations on GPUs, and their application to accelerate sparse direct solvers. We design an interface for the basic matrix operations supporting problems of different sizes. The interface enables us to develop irrLU-GPU, an LU decomposition on matrices of different sizes. We demonstrate the impact of irrLU-GPU on sparse direct LU solvers using NVIDIA and AMD GPUs. Experimental results are shown for a sparse direct solver based on a multifrontal sparse LU decomposition applied to linear systems arising from the simulation, using finite element discretization on unstructured meshes, of a high-frequency indefinite Maxwell problem.
more » « less
Full Text Available
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs

https://doi.org/10.1109/CLUSTER51413.2022.00029

Cayrols, S. ; Li, J. ; Bosilca, G. ; Tomov, S. ; Ayala, A. ; Dongarra, J. ( September 2022 , 2022 IEEE International Conference on Cluster Computing (CLUSTER))

In the context of parallel applications, communication is a critical part of the infrastructure and a potential bottleneck. The traditional approach to tackle communication challenges consists of redesigning algorithms so that the complexity or the communication volume is reduced. However, there are algorithms like the Fast Fourier Transform (FFT) where reducing the volume of communication is very challenging yet can reap large benefit in terms of time-to-completion. In this paper, we revisit the implementation of the MPI all-to-all routine at the core of 3D FFTs by using advanced MPI features, such as One-Sided Communication, and integrate data compression during communication to reduce the volume of data exchanged. Since some compression techniques are ‘lossy’ in the sense that they involve a loss of accuracy, we study the impact of lossy compression in heFFTe, the state-of-the-art FFT library for large scale 3D FFTs on hybrid architectures with GPUs. Consequently, we design an approximate FFT algorithm that trades off user-controlled accuracy for speed. We show that we speedup the 3D FFTs proportionally to the compression rate. In terms of accuracy, comparing our approach with a reduced precision execution, where both the data and the computation are in reduced precision, we show that when the volume of communication is compressed to the size of the reduced precision data, the approximate FFT algorithm is as fast as the one in reduced precision while the accuracy is one order of magnitude better.
more » « less
Full Text Available
Gpu-based homotopy continuation for minimal problems in computer vision.

Chien, C. ; Fan H. ; Abdelfattah, A. ; Tsigaridas, E. ; Tomov, S. ; Kimia, B. ( January 2022 , Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Systems of polynomial equations arise frequently in computer vision, especially in multiview geometry problems. Traditional methods for solving these systems typically aim to eliminate variables to reach a univariate polynomial, e.g., a tenth-order polynomial for 5-point pose estimation, using clever manipulations, or more generally using Grobner basis, resultants, and elimination templates, leading to successful algorithms for multiview geometry and other problems. However, these methods do not work when the problem is complex and when they do, they face efficiency and stability issues. Homotopy Continuation (HC) can solve more complex problems without the stability issues, and with guarantees of a global solution, but they are known to be slow. In this paper we show that HC can be parallelized on a GPU, showing significant speedups up to 56 times on polynomial benchmarks. We also show that GPU-HC can be generically applied to a range of computer vision problems, including 4-view triangulation and trifocal pose estimation with unknown focal length, which cannot be solved with elimination template but they can be efficiently solved with HC. GPU-HC opens the door to easy formulation and solution of a range of computer vision problems.
more » « less
Full Text Available
Translational Process: Mathematical Software Perspective

Dongarra, J. ; Gates, M. ; Luszczek, P. ; Tomov, S. ( September 2020 , Journal of computational science)
null (Ed.)
Full Text Available
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( June 2020 , International Conference on Computational Science (ICCS 2020))
null (Ed.)
Full Text Available
GPU algorithms for efficient exascale discretizations

Abdelfattah, A. ; Barra, V. ; Beams, N. ; Bleile, R. ; Brown J. ; Camier, J. ; Carson R. ; Chalmers, N. ; Dobrev, V. ; Dudouit, Y. ; et al ( September 2021 , Parallel computing)

In this paper, we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.
more » « less
Full Text Available
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

https://doi.org/10.1007/978-3-030-34356-9_37

Nichols, D. ; Tomov, N. ; Betancourt, F. ; Tomov, S. ; Wong, K. ; Dongarra, J. ( December 2019 , Lecture notes in computer science)

In this paper, we present work towards the development of a new data analytics and machine learning (ML) framework, called MagmaDNN. Our main goal is to provide scalable, high-performance data analytics and ML solutions for scientific applications running on current and upcoming heterogeneous many-core GPU-accelerated architectures. To this end, since many of the functionalities needed are based on standard linear algebra (LA) routines, we designed MagmaDNN to derive its performance power from the MAGMA library. The close integration provides the fundamental (scalable high-performance) LA routines available in MAGMA as a backend to MagmaDNN. We present some design issues for performance and scalability that are specific to ML using Deep Neural Networks (DNN), as well as the MagmaDNN designs towards overcoming them. In particular, MagmaDNN uses well established HPC techniques from the area of dense LA, including task-based parallelization, DAG representations, scheduling, mixed-precision algorithms, asynchronous solvers, and autotuned hyperparameter optimization. We illustrate these techniques and their incorporation and use to outperform other frameworks, currently available.
more » « less
Full Text Available
Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments

Wong, K. ; Tomov, S. ; Dongarra, J. ( June 2019 , ISC High Performance)

Full Text Available
Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs

Abdelfattah, A. ; Tomov, S. ; Dongarra, J. ( May 2019 , 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

Nichols, D. ; Tomov, N.S. ; Betancourt, F. ; Tomov, S. ; Wong, K. ; Dongarra, J. ( June 2019 , ISC High Performance)

Full Text Available

« Prev Next »