Scaling Laws for the Workload Throughput of Emerging Heterogeneous Clusters

Alasandagutti, Akhil; Suetterlein, Joshua; Firoz, Jesun; Young, Stephen; Manzano, Joseph; Stewart, Jason R; Bridges, Patrick G; Estrada, Trilce; Barker, Kevin

doi:10.1109/CCGRID64434.2025.00025

This content will become publicly available on May 19, 2026

Next-generation HPC clusters are evolving into highly heterogeneous systems that integrate traditional computing resources with emerging accelerator technologies, such as quantum processors, neuromorphic units, dataflow architectures, and specialized AI accelerators, within a unified infrastructure. These systems let workloads dynamically use different accelerators during different computation phases, creating complex execution patterns. Workload performance can therefore be affected by many factors, including how the accelerators are shared, how heavily they are utilized, and where they are placed within the system. Moreover, the system and network state that results from overall system load can significantly affect the job completion rate. Understanding, identifying, and quantifying the impact of the most critical factors (e.g., the number of allocated accelerators) will help guide investment decisions for accelerator acquisition and deployment that can improve overall system throughput. This paper extensively studies these complex interactions among advanced accelerators within an HPC cluster and various workloads. We introduce a novel analytical model that predicts the speedup of a workload given an accelerator/system configuration. This model can be used to quantify the effect of adding accelerators on the performance of jobs running on an HPC cluster. We validate the model using both simulated and real environments.

Award ID(s): 2103510; 1807563

PAR ID: 10650836

Author(s) / Creator(s): Alasandagutti, Akhil; Suetterlein, Joshua; Firoz, Jesun; Young, Stephen; Manzano, Joseph; Stewart, Jason R; Bridges, Patrick G; Estrada, Trilce; Barker, Kevin

Publisher / Repository: IEEE

Date Published: 2025-05-19

Page Range / eLocation ID: 73 to 82

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
https://doi.org/10.1109/CCGRID64434.2025.00025
