NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Evaluating Energy Efficiency of Ai Accelerators Using Two Mlperf Benchmarks

https://doi.org/10.1109/CCGRID64434.2025.00035

Ferdaus, Farah; Wu, Xingfu; Taylor, Valerie; Lan, Zhiling; Shanmugavelu, Sanjif; Vishwanath, Venkatram; Papka, Michael E (May 2025, IEEE)

Free, publicly-accessible full text available May 19, 2026
Transfer-learning-based Autotuning using Gaussian Copula

https://doi.org/10.1145/3577193.3593712

Randall, Thomas; Koo, Jaehoon; Videau, Brice; Kruse, Michael; Wu, Xingfu; Hovland, Paul; Hall, Mary; Ge, Rong; Balaprakash, Prasanna (June 2023, ACM)

As diverse high-performance computing (HPC) systems are built, many opportunities arise for applications to solve larger problems than ever before. Given the significantly increased complexity of these HPC systems and application tuning, empirical performance tuning, such as autotuning, has emerged as a promising approach in recent years. Despite its effectiveness, autotuning is often a computationally expensive approach. Transfer learning (TL)-based autotuning seeks to address this issue by leveraging the data from prior tuning. Current TL methods for autotuning spend significant time modeling the relationship between parameter configurations and performance, which is ineffective for few-shot (that is, few empirical evaluations) tuning on new tasks. We introduce the first generative TL-based autotuning approach based on the Gaussian copula (GC) to model the high-performing regions of the search space from prior data and then generate high-performing configurations for new tasks. This allows a sampling-based approach that maximizes few-shot performance and provides the first probabilistic estimation of the few-shot budget for effective TL-based autotuning. We compare our generative TL approach with state-of-the-art autotuning techniques on several benchmarks. We find that the GC is capable of achieving 64.37% of peak few-shot performance in its first evaluation. Furthermore, the GC model can determine a few-shot transfer budget that yields up to 33.39X speedup, a dramatic improvement over the 20.58X speedup using prior techniques.
more » « less
Full Text Available
Performance and power modeling and prediction using MuMMI and 10 machine learning methods

https://doi.org/10.1002/cpe.7254

Wu, Xingfu; Taylor, Valerie; Lan, Zhiling (August 2022, Concurrency and Computation: Practice and Experience)

Summary Energy‐efficient scientific applications require insight into how high performance computing system features impact the applications' power and performance. This insight can result from the development of performance and power models. In this article, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and 10 machine learning methods to model and predict performance and power consumption and compare their prediction error rates. We use an algorithm‐based fault‐tolerant linear algebra code and a multilevel checkpointing fault‐tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne National Laboratory and the Intel Haswell cluster Shepard at Sandia National Laboratories. Our experimental results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases. By utilizing the models for runtime, node power, CPU power, and memory power, we identify the most significant performance counters for potential application optimizations, and we predict theoretical outcomes of the optimizations. Based on two collected datasets, we analyze and compare the prediction accuracy in performance and power consumption using MuMMI and 10 machine learning methods.
more » « less

Search for: All records