Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Sage (Ed.)The convergence of extremely high levels of hardware concurrency and the effective overlap of computation and communication in asynchronous executions has resulted in increasing nondeterminism in High-Performance Computing (HPC) applications. Nondeterminism can manifest at multiple levels: from low-level communication primitives to libraries to application-level functions. No matter its source, nondeterminism can drastically increase the cost of result reproducibility, debugging workflows, testing parallel programs, or ensuring fault-tolerance. Nondeterministic executions of HPC applications can be modeled as event graphs, and the applications’ nondeterministic behavior can be understood and, in some cases, mitigated using graph comparison algorithms. However, a connection between graph comparison algorithms and approaches to understanding nondeterminism in HPC still needs to be established. This survey article moves the first steps toward establishing a connection between graph comparison algorithms and nondeterminism in HPC with its three contributions: it provides a survey of different graph comparison algorithms and a timeline for each category’s significant works; it discusses how existing graph comparison methods do not fully support properties needed to understand nondeterministic patterns in HPC applications; and it presents the open challenges that should be addressed to leverage the power of graph comparisons for the study of nondeterminism in HPC applications.more » « less
-
IEEE Computer Society (Ed.)
An official website of the United States government

Full Text Available