
Title: Improving Performance and Scalability of Algebraic Multigrid through a Specialized MATVEC
Algebraic Multigrid (AMG) is an extremely popular linear system solver and/or preconditioner approach for matrices obtained from the discretization of elliptic operators. However, its performance and scalability for large systems obtained from unstructured discretizations seem less consistent than for geometric multigrid (GMG). To a large extent, this is due to loss of sparsity at the coarser grids and the resulting increased cost and poor scalability of the matrix-vector multiplication. While there have been attempts to address this concern by designing sparsification algorithms, these affect the overall convergence. In this work, we focus on designing a specialized matrix-vector multiplication (matvec) that achieves high performance and scalability across a wide range of sparsity levels. We evaluate distributed and shared memory implementations of our matvec operator and demonstrate the improvements to its scalability and performance within the AMG hierarchy. Finally, we compare it with PETSc.
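For readers unfamiliar with the operation the abstract centers on, the following is a minimal sketch of a textbook compressed sparse row (CSR) matrix-vector product. It is not the specialized matvec described in this work, which the abstract does not detail. The function names `dense_to_csr`, `csr_matvec`, and `density` are hypothetical and chosen only for illustration. The `density` helper reports the fraction of stored non-zeros, the quantity that grows on coarser AMG levels as sparsity is lost.

```python
from typing import List, Sequence, Tuple

Csr = Tuple[List[float], List[int], List[int]]


def dense_to_csr(dense: Sequence[Sequence[float]]) -> Csr:
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a CSR matrix A, touching only stored non-zeros."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # indirect access x[col_idx[k]]: cost scales with the non-zero count
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def density(row_ptr: Sequence[int], n_cols: int) -> float:
    """Fraction of entries that are stored non-zeros."""
    n_rows = len(row_ptr) - 1
    return row_ptr[-1] / (n_rows * n_cols)


# Illustrative input (an assumption, not data from the paper):
# a 4x4 one-dimensional Laplacian stencil [-1, 2, -1].
A = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
print(y, density(row_ptr, 4))
```

Because the inner loop runs once per stored non-zero, the cost of this kernel tracks the density of each level's operator, which is why denser coarse-level matrices make the matvec more expensive.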
Award ID(s):
1464244, 1643056
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE High Performance Extreme Computing Conference
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sparse matrices are very common types of information used in scientific and machine learning applications including deep neural networks. Sparse data representations lead to storage efficiencies by avoiding storing zero values. However, sparse representations incur metadata computational overheads: software first needs to find row/column locations of non-zero values before performing necessary computations. Such metadata accesses involve indirect memory accesses (of the form a[b[i]] where a[.] and b[.] are large arrays) and they are cache and prefetch-unfriendly, resulting in frequent load stalls. In this paper, we will explore a dedicated hardware for a memory-side accelerator called Hardware Helper Thread (HHT) that performs all the necessary index computations to fetch only the nonzero elements from sparse matrix and sparse vector and supply those values to the primary core, creating heterogeneity within a single CPU core. We show both performance gains and energy savings of HHT for sparse matrix-dense vector multiplication (SpMV) and sparse matrix-sparse vector multiplication (SpMSpV). The ASIC HHT shows average performance gains ranging between 1.7 and 3.5 depending on the sparsity levels, vector-widths used by RISCV vector instructions and if the Vector (in Matrix-Vector multiplication) is sparse or dense. We also show energy savings of 19% on average when ASIC HHT is used compared to baseline (for SpMV), and the HHT requires 38.9% of a RISCV core area.
  2. Abstract

    In a recent article, one of the authors developed a multigrid technique for coarse‐graining dynamic powergrid models. A key component in this technique is a relaxation‐based coarsening of the graph Laplacian given by the powergrid network and its weighted graph, which is represented by the admittance matrix. In this article, we use this coarsening strategy to develop a multigrid method for solving a static system of nonlinear equations that arises through Ohm's law, the so‐called powerflow equations. These static equations are tightly knitted to the dynamic model in that the full powergrid model is an algebraic‐differential system with the powerflow equations describing the algebraic constraints. We assume that the dynamic model corresponds to a stable operating powergrid, and thus, the powerflow equations are associated with a physically stable system. This stability permits the coarsening of the powerflow equations to be based on an approximate graph Laplacian, which is embedded in the powerflow system. By algebraically constructing a hierarchy of approximate weighted graph Laplacians, a hierarchy of nonlinear powerflow equations immediately becomes apparent. This latter hierarchy can then be used in a full approximation scheme (FAS) framework that leads to a nonlinear solver with generally a larger basin of attraction than Newton's method. Given the algebraic multigrid (AMG) coarsening of the approximate Laplacians, the solver is an AMG‐FAS scheme. Alternatively, using the coarse‐grid nodes and interpolation operators generated for the hierarchy of approximate graph Laplacians, a multiplicative‐correction scheme can be derived. The derivation of both schemes will be presented and analyzed, and numerical examples to demonstrate the performance of these schemes will be given.

  3. Summary

    This article develops an algebraic multigrid (AMG) method for solving systems of elliptic boundary‐value problems. It is well known that multigrid for systems of elliptic equations faces many challenges that do not arise for most scalar equations. These challenges include strong intervariable couplings, multidimensional and possibly large near‐nullspaces, analytically unknown near‐nullspaces, delicate selection of coarse degrees of freedom (CDOFs), and complex construction of intergrid operators. In this article, we consider only the selection of CDOFs and the construction of the interpolation operator. The selection is an extension of the Ruge–Stuben algorithm using a new strength of connection measure taken between nodal degrees of freedom, that is, between all degrees of freedom located at a gridpoint to all degrees of freedom at another gridpoint. This measure is based on a local correlation matrix generated for a set of smoothed test vectors derived from a relaxation‐based procedure. With this measure, selection of the CDOFs is then determined by the number of strongly correlated connections at each node, with the selection processed by a Ruge–Stuben coloring scheme. Having selected the CDOFs, the interpolation operator is constructed using a bootstrap AMG (BAMG) procedure. We apply the BAMG procedure either over the smoothed test vectors to obtain an intervariable interpolation scheme or over the like‐variable components of the smoothed test vectors to obtain an intravariable interpolation scheme. Moreover, comparing the correlation measured between the intravariable couplings with the correlation between all couplings, a mixed intravariable and intervariable interpolation scheme is developed. We further examine an indirect BAMG method that explicitly uses the coefficients of the system operator in constructing the interpolation weights. 
    Finally, based on a weak approximation criterion, we consider a simple scheme to adapt the order of the interpolation (i.e., adapt the caliber or maximum number of coarse‐grid points that a fine‐grid point can interpolate from) over the computational domain.

  4. Abstract

    Problems arising in Earth's mantle convection involve finding the solution to Stokes systems with large viscosity contrasts. These systems contain localized features which, even with adaptive mesh refinement, result in linear systems that can be on the order of 10^9 or more unknowns. One common approach for preconditioning to the velocity block of these systems is to apply an Algebraic Multigrid (AMG) V‐cycle (as is done in the ASPECT software, for example), however, we find that AMG is lacking robustness with respect to problem size and number of parallel processes. Additionally, we see an increase in iteration counts with refinement when using AMG. In contrast, the Geometric Multigrid (GMG) method, by using information about the geometry of the problem, should offer a more robust option. Here we present a matrix‐free GMG V‐cycle which works on adaptively refined, distributed meshes, and we will compare it against the current AMG preconditioner (Trilinos ML) used in the ASPECT software. We will demonstrate the robustness of GMG with respect to problem size and show scaling up to 114,688 cores and 217 billion unknowns. All computations are run using the open‐source, finite element library deal.II.

  5. This work studies three multigrid variants for matrix-free finite-element computations on locally refined meshes: geometric local smoothing, geometric global coarsening (both h-multigrid), and polynomial global coarsening (a variant of p-multigrid). We have integrated the algorithms into the same framework, the open source finite-element library deal.II, which allows us to make fair comparisons regarding their implementation complexity, computational efficiency, and parallel scalability as well as to compare the measurements with theoretically derived performance metrics. Serial simulations and parallel weak and strong scaling on up to 147,456 CPU cores on 3,072 compute nodes are presented. The results obtained indicate that global-coarsening algorithms show a better parallel behavior for comparable smoothers due to the better load balance, particularly on the expensive fine levels. In the serial case, the costs of applying hanging-node constraints might be significant, leading to advantages of local smoothing, even though the number of solver iterations needed is slightly higher. When using p- and h-multigrid in sequence (hp-multigrid), the results indicate that it makes sense to decrease the degree of the elements first from a performance point of view due to the cheaper transfer.