Connecting two surface-code patches can introduce significantly higher noise at the interface. We show, via circuit-level simulations under a depolarizing noise model with idle errors, that surface codes remain fault tolerant despite substantially elevated interface error rates. Specifically, we compare three strategies (direct noisy links, gate teleportation, and a CAT-state gadget) for both rotated and unrotated surface codes, and demonstrate that careful design can mitigate hook errors in each case so that the full code distance is preserved for both X and Z. Although these methods differ in space and time overhead and performance, each offers a viable route to modular surface-code architectures. Our results, obtained with stim and pymatching, confirm that high-noise interfaces can be integrated fault-tolerantly without compromising the code's essential properties, indicating that fault-tolerant scaling of error-corrected modular devices is within reach with current technology.
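As a concrete illustration of the kind of circuit-level memory experiment such a study builds on, the sketch below simulates a single rotated surface-code patch under uniform depolarizing noise with stim and decodes it with pymatching. The distance, round count, and noise rates are placeholders, and the high-noise interface, gate teleportation, and CAT-state gadget described in the abstract are not modeled here.

```python
# Minimal rotated-surface-code memory experiment with stim + pymatching.
# Uniform circuit-level depolarizing noise; all parameters are illustrative only.
import numpy as np
import stim
import pymatching

def logical_error_rate(distance: int, rounds: int, p: float, shots: int = 100_000) -> float:
    # Generate a standard rotated-memory-Z circuit with depolarizing noise.
    circuit = stim.Circuit.generated(
        "surface_code:rotated_memory_z",
        distance=distance,
        rounds=rounds,
        after_clifford_depolarization=p,
        before_round_data_depolarization=p,
        before_measure_flip_probability=p,
        after_reset_flip_probability=p,
    )
    # Build a matching decoder from the circuit's detector error model.
    dem = circuit.detector_error_model(decompose_errors=True)
    matcher = pymatching.Matching.from_detector_error_model(dem)
    # Sample detection events and the true logical observable flips.
    sampler = circuit.compile_detector_sampler()
    detections, observables = sampler.sample(shots, separate_observables=True)
    # Decode and count shots where the predicted observable is wrong.
    predictions = matcher.decode_batch(detections)
    errors = np.any(predictions != observables, axis=1).sum()
    return errors / shots

for d in (3, 5, 7):
    print(d, logical_error_rate(distance=d, rounds=d, p=1e-3))
```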
Performance and power modeling and prediction using MuMMI and 10 machine learning methods
Summary: Energy-efficient scientific applications require insight into how high-performance computing system features impact the applications' power and performance. This insight can result from the development of performance and power models. In this article, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and 10 machine learning methods to model and predict performance and power consumption and compare their prediction error rates. We use an algorithm-based fault-tolerant linear algebra code and a multilevel checkpointing fault-tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne National Laboratory and the Intel Haswell cluster Shepard at Sandia National Laboratories. Our experimental results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases. By utilizing the models for runtime, node power, CPU power, and memory power, we identify the most significant performance counters for potential application optimizations, and we predict theoretical outcomes of the optimizations. Based on two collected datasets, we analyze and compare the prediction accuracy in performance and power consumption using MuMMI and 10 machine learning methods.
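The comparison methodology can be pictured with the toy sketch below, which fits a handful of common regressors to hypothetical performance-counter features and reports each model's prediction error as a mean absolute percentage error. The counters, data, and model set are illustrative placeholders, not MuMMI's actual inputs or the 10 methods evaluated in the article.

```python
# Sketch: comparing regression methods for power/runtime prediction from
# performance counters. Feature names and data are hypothetical placeholders,
# not MuMMI's actual counters or interface.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 500
# Hypothetical counter matrix: instructions, cache misses, branch misses, DRAM accesses.
X = rng.uniform(size=(n, 4))
# Hypothetical node-power target with a weak nonlinearity plus measurement noise.
y = 80 + 40 * X[:, 0] + 15 * X[:, 1] ** 2 + 5 * X[:, 3] + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(C=10.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
    print(f"{name}: {100 * mape:.2f}% prediction error")
```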
- PAR ID: 10443320
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Concurrency and Computation: Practice and Experience
- Volume: 35
- Issue: 15
- ISSN: 1532-0626
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Power modeling is an essential building block for computer systems in support of energy optimization, energy profiling, and energy-aware application development. We introduce VESTA, a novel approach to modeling the power consumption of applications with one key insight: language runtime events are often correlated with a sustained level of power consumption. When compared with the established approach of power modeling based on hardware performance counters (HPCs), VESTA has the benefit of solely requiring application-scoped information and enabling a higher level of explainability, while achieving comparable or even higher precision. Through experiments performed on 37 real-world applications on the Java Virtual Machine (JVM), we find the power model built by VESTA is capable of predicting energy consumption with a mean absolute percentage error of 1.56%, while the monitoring of language runtime events incurs small performance and energy overhead.
- Predicting coarse-grain variations in workload behavior during execution is essential for dynamic resource optimization of processor systems. Researchers have proposed various methods to first classify workloads into phases and then learn their long-term phase behavior to predict and anticipate phase changes. Early studies on phase prediction proposed table-based phase predictors. More recently, simple learning-based techniques such as decision trees have been explored. However, more recent advances in machine learning have not been applied to phase prediction so far. Furthermore, existing phase predictors have been studied only in connection with specific phase classifiers even though there is a wide range of classification methods. Early work in phase classification proposed various clustering methods that required access to source code. Some later studies used performance monitoring counters, but they only evaluated classifiers for specific contexts such as thermal modeling. In this work, we perform a comprehensive study of source-oblivious phase classification and prediction methods using hardware counters. We adapt classification techniques that were used with different inputs in the past and compare them to state-of-the-art hardware-counter-based classifiers. We further evaluate the accuracy of various phase predictors when coupled with different phase classifiers and evaluate a range of advanced machine learning techniques, including SVMs and LSTMs, for workload phase prediction. We apply classification and prediction approaches to SPEC workloads running on an Intel Core-i9 platform. Results show that a two-level k-means clustering combined with SVM-based phase change prediction provides the best tradeoff between accuracy and long-term stability. Additionally, the SVM predictor reduces the average prediction error by 80% when compared to a table-based predictor. (A toy sketch of this classification-plus-prediction pipeline appears after this list.)
- Datacenter capacity is growing exponentially to satisfy the increasing demand for many emerging computationally-intensive applications, such as deep learning. This trend has led to concerns over datacenters' increasing energy consumption and carbon footprint. The most basic prerequisite for optimizing a datacenter's energy- and carbon-efficiency is accurately monitoring and attributing energy consumption to specific users and applications. Since datacenter servers tend to be multi-tenant, i.e., they host many applications, server- and rack-level power monitoring alone does not provide insight into the energy usage and carbon emissions of their resident applications. At the same time, current application-level energy monitoring and attribution techniques are intrusive: they require privileged access to servers and necessitate coordinated support in hardware and software, neither of which is always possible in cloud environments. To address the problem, we design WattScope, a system for non-intrusively estimating the power consumption of individual applications using external measurements of a server's aggregate power usage and without requiring direct access to the server's operating system or applications. Our key insight is that, based on an analysis of production traces, the power characteristics of datacenter workloads, e.g., low variability, low magnitude, and high periodicity, are highly amenable to disaggregation of a server's total power consumption into application-specific values. WattScope adapts and extends a machine learning-based technique for disaggregating building power and applies it to server- and rack-level power meter measurements that are already available in datacenters. We evaluate WattScope's accuracy on a production workload and show that it yields high accuracy, e.g., often under ~10% normalized mean absolute error, and is thus a potentially useful tool for datacenters in externally monitoring application-level power usage. (A simplified, illustrative disaggregation sketch appears after this list.)
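For the phase classification and prediction item above, the toy sketch below shows the general shape of such a pipeline: hardware-counter samples are clustered into phases with k-means (single-level here, for brevity) and an SVM predicts the next phase from a short history of phase labels. The counter data, window length, and cluster count are synthetic placeholders rather than the paper's actual setup.

```python
# Toy sketch of counter-based phase classification (k-means) followed by
# SVM-based next-phase prediction. All data and parameters are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Hypothetical per-interval hardware-counter samples (IPC, cache misses, ...).
samples = rng.uniform(size=(2000, 6))

# Phase classification: cluster counter vectors into k phases.
k = 4
phases = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(samples)

# Phase prediction: predict the next phase label from a window of recent labels.
window = 8
X = np.array([phases[i : i + window] for i in range(len(phases) - window)])
y = phases[window:]
split = int(0.7 * len(X))
predictor = SVC(kernel="rbf", C=1.0).fit(X[:split], y[:split])
accuracy = (predictor.predict(X[split:]) == y[split:]).mean()
print(f"next-phase prediction accuracy: {accuracy:.2%}")
```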
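For the WattScope item, the sketch below illustrates the underlying disaggregation idea in its simplest form: an aggregate server power trace is split into per-application shares by non-negative least squares over per-application activity signals. This is a simplified stand-in on synthetic data, not WattScope's actual machine-learning model.

```python
# Toy power disaggregation: attribute an aggregate server power trace to
# applications via non-negative least squares over per-app activity signals.
# Synthetic data; not WattScope's actual model.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
t = 600  # one sample per second over 10 minutes
# Hypothetical per-application activity signals (e.g., CPU utilization).
activity = rng.uniform(size=(t, 3))
idle = np.ones((t, 1))  # constant column captures idle/baseline power
A = np.hstack([idle, activity])

# Synthetic "measured" aggregate power: baseline + per-app dynamic power + noise.
true_watts = np.array([120.0, 45.0, 30.0, 15.0])
aggregate = A @ true_watts + rng.normal(0, 1.5, t)

# Fit non-negative per-app power coefficients, then attribute power over time.
coeffs, _ = nnls(A, aggregate)
per_app_power = A * coeffs  # columns: idle, app0, app1, app2 (watts over time)
print("estimated coefficients (watts):", np.round(coeffs, 1))
```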