NSF PAR Search | NSF Public Access Repository


Scalability of Hybrid SpMV on Intel Xeon Phi Knights Landing

Page, Brian A.; Kogge, Peter M. (July 2019, International Conference on High Performance Computing & Simulation)

SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven. Sparsity and randomness in such computations play havoc with performance, especially when strong, instead of weak, scaling is attempted. In this study we develop and evaluate a hybrid implementation for strong scaling of the Compressed Vectorization-oriented sparse Row (CVR) approach to SpMV on a cluster of Intel Xeon Phi Knights Landing (KNL) processors. We show how our hybrid SpMV implementation achieves increased computational performance, yet does not address the dominant communication overhead factor at extreme scale. Issues with workload distribution, data placement, and remote reductions are assessed over a range of matrix characteristics. Our results indicate that as P → ∞ communication overhead is by far the dominant factor despite improved computational performance.
Full Text Available
Introducing Streaming into Linear Algebra-based Sparse Graph Algorithms

Kogge, Peter M.; Butcher, Neil A.; Page, Brian A. (July 2019, International Conference on High Performance Computing & Simulation)

GraphBLAS is a new package designed to provide a standard set of building blocks for graph algorithms based formally in the language of linear algebra. This paper suggests some extensions of the underlying math that would enhance GraphBLAS’ ability to stream updates into a computation without a bulk recomputation, and at greatly reduced computational complexity. The process is applied to several examples.
Full Text Available
Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM

https://doi.org/10.1145/3225058.3225116

Butcher, Neil; Olivier, Stephen L.; Berry, Jonathan; Hammond, Simon D.; Kogge, Peter M. (August 2018, International Conference on Parallel Processing)

Technologies such as Multi-Channel DRAM (MCDRAM) or High Bandwidth Memory (HBM) provide significantly more bandwidth than conventional memory. This trend has raised questions about how applications should manage data transfers between levels. This paper focuses on evaluating different usage modes of the MCDRAM in Intel Knights Landing (KNL) manycore processors. We evaluate these usage modes with a sorting kernel and a sorting-based streaming benchmark. We develop a performance model for the benchmark and use experimental evidence to demonstrate the correctness of the model. The model projects near-optimal numbers of copy threads for memory bandwidth bound computations. We demonstrate on KNL up to a 1.9X speedup for sort when the problem does not fit in MCDRAM over an OpenMP GNU sort that does not use MCDRAM.
Full Text Available
Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication

https://doi.org/10.1109/HPCS.2018.00072

Page, Brian A.; Kogge, Peter M. (July 2018, International Conference on High Performance Computing & Simulation)

SpMV, the product of a sparse matrix and a dense vector, is emblematic of a new class of applications that are memory bandwidth and communication, not flop, driven. Sparsity and randomness in such computations play havoc with conventional implementations, especially when strong, instead of weak, scaling is attempted. This paper studies improved hybrid SpMV codes that have better performance, especially for the sparsest of such problems. Issues with both data placement and remote reductions are modeled over a range of matrix characteristics. Those factors that limit strong scalability are quantified.
Full Text Available
A Case for Migrating Execution for Irregular Applications

https://doi.org/10.1145/3149704.3149770

Kogge, Peter M.; Kuntz, Shannon K. (November 2017, Seventh Workshop on Irregular Applications: Architectures and Algorithms)

Modern supercomputers have millions of cores, each capable of executing one or more threads of program execution. In these computers the site of execution for program threads rarely, if ever, changes from the node in which they were born. This paper discusses the advantages that may accrue when thread states migrate freely from node to node, especially when migration is managed by hardware without requiring software intervention. Emphasis is on supporting the growing classes of algorithms where there is significant sparsity, irregularity, and lack of locality in the memory reference patterns. Evidence is drawn from reformulation of several kernels into a migrating thread context approximating that of an emerging architecture with such capabilities.
Full Text Available
Graph Analytics: Complexity, Scalability, and Architectures

Kogge, Peter M. (May 2017, HPBDC Workshop at Int. Parallel and Dist. Processing Conf.)

Big Data as expressed as “Big Graphs” are growing in importance. Looking forward, there is also increasing interest in streaming versions of the associated analytics. This paper develops an initial template for the relationship between “traditional” batch graph problems and streaming forms. Variations of streaming problems are discussed, along with their relationship to existing benchmarks. Also included is a discussion of classes of parallel architectures (including newly emerging ones) and how such kernels are liable to scale on them. Preliminary projections for some of these systems are presented.
Full Text Available
