Grey-Box Machine Learning Prediction of Parallel Application Scaling

Alasandagutti, Akhil; Bridges, Patrick G; Estrada, Trilce

Citation Details

This content will become publicly available on December 17, 2026

Grey-Box Machine Learning Prediction of Parallel Application Scaling

Accurate prediction of parallel application performance in HPC systems is essential for efficient resource allocation and system design. Classical performance models estimate of speedup based on theoretical assumptions, but their applicability is limited by parameter estimation, data acquisition, and real-world system issues such as latency and network congestion. This paper describes performance prediction using classical performance models boosted by a trainable machine learning framework. Domain-informed machine-learning models estimate the overhead of an application for a given problem size and resource configuration as a coefficient of the estimated speedup provided by performance laws. We evaluate this approach on two HPC mini-applications and two full applications with varying patterns of computation and communication and also evaluate the prediction accuracy on runs with varying processors-per-node configurations. Our results show that this method significantly improves the accuracy of performance predictions over standard analytical models and black-box regressors, while remaining robust even with limited training data. more »

Award ID(s):: 2103510

PAR ID:: 10644609

Author(s) / Creator(s):: Alasandagutti, Akhil; Bridges, Patrick G; Estrada, Trilce

Publisher / Repository:: Proceedings of the 32st IEEE International Conference on High Performance Computing, Data, and Analytics

Date Published:: 2025-12-17

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on December 17, 2026
Conference Paper:
The DOI is not currently available.

More Like this