Title: Accelerating restarted GMRES with mixed precision arithmetic
The generalized minimum residual method (GMRES) is a commonly used iterative Krylov solver for sparse, non-symmetric systems of linear equations. As with other iterative solvers, its run time is dominated by data movement. To improve performance, we propose running GMRES in reduced precision while keeping key operations in full precision. Additionally, we provide theoretical results linking the convergence of finite-precision GMRES using classical Gram-Schmidt with reorthogonalization (CGSR) to that of its infinite-precision counterpart, which helps justify the convergence of this method to double-precision accuracy. We tested the mixed-precision approach with a variety of matrices and preconditioners on a GPU-accelerated node. Excluding the incomplete LU factorization without fill-in (ILU(0)) preconditioner, we achieved average speedups ranging from 8 to 61 percent relative to comparable double-precision implementations, with the simpler preconditioners achieving the higher speedups.
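The split between reduced and full precision can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that the full-precision "key operations" are the residual computation and the solution update performed at each restart, while each inner GMRES(m) cycle, including the Arnoldi process orthogonalized with CGSR (implemented here as two passes of classical Gram-Schmidt), runs on float32 data. The function names, the dense unpreconditioned matrix, and the parameters `m`, `tol`, and `max_restarts` are hypothetical; this is not the authors' GPU implementation.

```python
import numpy as np


def cgs2(V, w, j):
    """Classical Gram-Schmidt with one reorthogonalization pass (CGSR).

    Orthogonalizes w against the first j+1 columns of V and returns the
    accumulated projection coefficients and the orthogonalized vector.
    """
    Vj = V[:, : j + 1]
    h = Vj.T @ w
    w = w - Vj @ h
    # Second pass removes the orthogonality lost to rounding in the first.
    h2 = Vj.T @ w
    w = w - Vj @ h2
    return h + h2, w


def gmres_cycle(A32, r32, m):
    """One restart cycle of GMRES(m) on float32 data, approximately solving A d = r."""
    n = r32.shape[0]
    V = np.zeros((n, m + 1), dtype=np.float32)
    H = np.zeros((m + 1, m), dtype=np.float32)
    beta = np.linalg.norm(r32)
    V[:, 0] = r32 / beta
    k = m
    for j in range(m):
        w = A32 @ V[:, j]                  # float32 matrix-vector product
        h, w = cgs2(V, w, j)
        H[: j + 1, j] = h
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= np.finfo(np.float32).tiny:  # lucky breakdown
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # Small (k+1) x k least-squares problem min ||beta*e1 - H y||.
    e1 = np.zeros(k + 1, dtype=np.float32)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[: k + 1, :k], e1, rcond=None)
    return V[:, :k] @ y


def mixed_precision_gmres(A, b, m=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES: inner cycles in float32, residual and update in float64."""
    A = np.asarray(A, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    A32 = A.astype(np.float32)             # reduced-precision copy of the matrix
    x = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x                      # residual recomputed in double precision
        if np.linalg.norm(r) <= tol * bnorm:
            break
        d = gmres_cycle(A32, r.astype(np.float32), m)
        x += d.astype(np.float64)          # solution updated in double precision
    return x
```

Because the residual is recomputed in float64 at every restart, a well-conditioned system can reach a residual far below float32 round-off after a few cycles, in the spirit of iterative refinement; with ill-conditioned matrices or restricted restart lengths the behavior may differ.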
Award ID(s): 2004541
NSF-PAR ID: 10329101
Author(s) / Creator(s):
Date Published:
Journal Name: IEEE Transactions on Parallel and Distributed Systems
Volume: 33
Issue: 4
ISSN: 2161-9883
Page Range / eLocation ID: 1027-1037
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. Non-stationary regularizing preconditioners have recently been proposed for the acceleration of classical iterative methods for the solution of linear discrete ill-posed problems. This paper explores how these preconditioners can be combined with the flexible GMRES iterative method. A new structure-respecting strategy to construct a sequence of regularizing preconditioners is proposed. We show that flexible GMRES applied with these preconditioners is able to restore images that have been contaminated by strongly non-symmetric blur, while several other iterative methods fail to do this. 
  2. In this paper, we aim to solve the Biot model under stabilized finite element discretizations. To solve the resulting generalized saddle point linear systems, several iterative methods are proposed and compared. In the first method, the GMRES algorithm is applied as the outer iteration. In the second method, the Uzawa method with variable relaxation parameters is employed as the outer iteration. In the third approach, the Uzawa method is treated as a fixed-point iteration, and the outer solver is the so-called Anderson acceleration. In all these methods, the inner solvers are preconditioners for the generalized saddle point problem. In the preconditioners, the Schur complement approximation is derived using a Fourier analysis approach. These preconditioners are implemented either exactly or inexactly. Extensive experiments are given to demonstrate the performance of the proposed preconditioners and to compare all the algorithms.
  3. We present a computational study of several preconditioning techniques for the GMRES algorithm applied to the stochastic diffusion equation with a lognormal coefficient discretized with the stochastic Galerkin method. The clear block structure of the system matrix arising from this type of discretization motivates the analysis of preconditioners designed according to a field-splitting strategy of the stochastic variables. This approach is inspired by a similar procedure used within the framework of physics-based preconditioners for deterministic problems, and its application to stochastic PDEs represents the main novelty of this work. Our numerical investigation highlights the superior properties of field-split-type preconditioners over other existing strategies in terms of computational time and dependence on the stochastic parameters.
  4. Batched matrix computations have recently gained considerable interest for applications in which the same operation is applied to many small independent matrices. The batched computational pattern is frequently encountered in data analytics, direct/iterative solvers and preconditioners, computer vision, astrophysics, and more, and often requires specific designs for vectorization and extreme parallelism to map well onto today's high-end many-core architectures. This has led to the development of optimized software for batched computations and to an ongoing community effort to develop standard interfaces for batched linear algebra software. Furthering these developments, we present GPU design and optimization techniques for high-performance batched one-sided factorizations of millions of tiny matrices (of size 32 or less). We quantify the effects and relevance of different techniques in order to select the best-performing LU, QR, and Cholesky factorization designs. While we adapt common optimization techniques, such as optimal memory traffic, register blocking, and concurrency control, we also show that a different mindset and different techniques are needed when matrices are tiny, and in particular, sub-vector/warp in size. The proposed routines are part of the MAGMA library and deliver significant speedups compared to their counterparts in currently available vendor-optimized libraries. Notably, we tune the developments for NVIDIA's then-newest V100 GPU and show speedups of up to 11.8×.
  5. In this work, we present a framework for designing iterative techniques for image deblurring formulated as an inverse problem. The new framework is based on two observations about existing methods. We use the Landweber method as the basis to develop and present the new framework, but note that the framework is applicable to other iterative techniques. First, we observed that the iterative steps of the Landweber method contain a constant term, which is a low-pass-filtered version of the already blurry observation. We propose a modification that uses the observed image directly. Second, we observed that the Landweber method uses an estimate of the true image as the starting point. This estimate, however, is not updated over the iterations. We propose a modification that updates this estimate as the iterative process progresses. We integrate the two modifications into one framework for iteratively deblurring images. Finally, we test the new method and compare its performance with several existing techniques, including the Landweber method, the Van Cittert method, GMRES (generalized minimal residual method), and LSQR (least squares QR), to demonstrate its superior performance in image deblurring.
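The constant term mentioned in item 5 is easiest to see in a sketch of the classical Landweber iteration, written as x_{k+1} = x_k + ω Aᵀ(b − A x_k) = (I − ω AᵀA) x_k + ω Aᵀ b. When A is a blur operator, Aᵀ is also a low-pass filter, so ω Aᵀ b is a filtered copy of the already blurry observation b. This is a minimal sketch of the classical method only, not the modified framework that item 5 proposes; the 1-D moving-average blur stands in for a 2-D image blur, and the helper names `blur_matrix` and `landweber` and their parameters are hypothetical.

```python
import numpy as np


def blur_matrix(n, width=5):
    """Hypothetical 1-D moving-average blur operator with zero boundary conditions."""
    A = np.zeros((n, n))
    half = width // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        A[i, lo:hi] = 1.0 / width
    return A


def landweber(A, b, iters=200, omega=None, x0=None):
    """Classical Landweber iteration x_{k+1} = x_k + omega * A^T (b - A x_k)."""
    if omega is None:
        # The iteration converges for 0 < omega < 2 / sigma_max(A)^2.
        omega = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1]) if x0 is None else np.array(x0, dtype=float)
    # Rewritten as x_{k+1} = (I - omega A^T A) x_k + omega A^T b.
    # The constant term omega * A^T b applies the (low-pass) adjoint blur
    # to the already blurry observation b; it never changes across iterations.
    const = omega * (A.T @ b)
    M = np.eye(A.shape[1]) - omega * (A.T @ A)
    for _ in range(iters):
        x = M @ x + const
    return x
```

For example, blurring a box signal with `blur_matrix(64)` and running `landweber(A, b, iters=500)` drives the residual ‖A x − b‖ down monotonically. The two modifications described in item 5 act on the constant term and on the starting estimate; they are not reproduced here.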