Scheduling Beyond CPUs for HPC

Fan, Yuping; Lan, Zhiling; Rich, Paul; Allcock, William; Papka, Michael; Austin, Brian; Paul, David

doi:10.1145/3307681.3325401

Citation Details

Scheduling Beyond CPUs for HPC

High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces the schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs, in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based on not only their CPU requirements, but also other schedulable resources such as burst buffer. BBSched formulates the scheduling problem into a multi-objective optimization (MOO) problem and rapidly solves the problem using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enables system managers to explore potential tradeoffs among various resources, and therefore obtains better utilization of all the resources. The trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling. more »

Award ID(s):: 1717763

PAR ID:: 10183669

Author(s) / Creator(s):: Fan, Yuping; Lan, Zhiling; Rich, Paul; Allcock, William; Papka, Michael; Austin, Brian; Paul, David

Date Published:: 2019-07-01

Journal Name:: HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

Page Range / eLocation ID:: 97-108

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3307681.3325401

More Like this