PDAWL: Profile-based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures
While High Performance Computing systems are increasingly based on heterogeneous cores, their effectiveness depends on how well the scheduler can allocate workloads onto appropriate computing devices and how well communication and computation can be overlapped. With different types of resources integrated into one system, the complexity of the scheduler correspondingly increases. Moreover, for applications with varying problem sizes on different heterogeneous resources, the optimal scheduling approach may vary accordingly. We thus present PDAWL, an event-driven, profile-based Iterative Dynamic Adaptive Work-Load balance scheduling approach that dynamically and adaptively adjusts the workload to efficiently utilize heterogeneous resources. It combines online scheduling (DAWL), which adaptively adjusts the workload based on the heterogeneous resources available in real time, with an offline machine learning step (a profile-based estimation model) that builds a device-specific communication-computation estimation model. Our scheduling approach is tested on control-regular applications, a Stencil kernel (based on a Jacobi algorithm) and Sparse Matrix-Vector Multiplication (SpMV), in an event-driven runtime system. Experimental results show that PDAWL either performs on par with or far outperforms whichever device (CPU or GPU) yields the best results.
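The abstract does not spell out how DAWL adjusts the workload, so the following is only a minimal sketch of the general idea of iterative, measurement-driven load balancing between a CPU and a GPU. It is not the authors' algorithm. The function names (`rebalance`, `simulated_times`, `run_balancing`), the device rates, and the fixed GPU transfer overhead are all hypothetical values chosen for illustration.

```python
def rebalance(total_items, cpu_items, cpu_time, gpu_time):
    """Return a new CPU share proportional to the measured device throughput.

    Hypothetical helper: the rule used here (split work in proportion to
    items processed per second) is an assumption, not the PDAWL policy.
    """
    gpu_items = total_items - cpu_items
    cpu_rate = cpu_items / cpu_time   # items per second observed on the CPU
    gpu_rate = gpu_items / gpu_time   # items per second observed on the GPU
    new_cpu = round(total_items * cpu_rate / (cpu_rate + gpu_rate))
    # Keep at least one item on each device so both rates stay measurable.
    return min(max(new_cpu, 1), total_items - 1)


def simulated_times(cpu_items, gpu_items):
    """Stand-in for real timing measurements (assumed device model).

    The CPU processes 100 items/s; the GPU processes 300 items/s but pays
    a fixed 0.5 s communication cost per iteration.
    """
    cpu_time = cpu_items / 100.0
    gpu_time = 0.5 + gpu_items / 300.0
    return cpu_time, gpu_time


def run_balancing(total_items=1000, iterations=5):
    """Start from an even split and rebalance after every iteration."""
    cpu_items = total_items // 2
    for _ in range(iterations):
        cpu_time, gpu_time = simulated_times(cpu_items, total_items - cpu_items)
        cpu_items = rebalance(total_items, cpu_items, cpu_time, gpu_time)
    return cpu_items


if __name__ == "__main__":
    cpu_share = run_balancing()
    cpu_t, gpu_t = simulated_times(cpu_share, 1000 - cpu_share)
    print(f"CPU items: {cpu_share}, CPU time: {cpu_t:.3f}s, GPU time: {gpu_t:.3f}s")
```

Under this assumed device model, the split moves toward the point where both devices finish an iteration at nearly the same time. A profile-based estimator, as described in the abstract, would presumably replace the even initial split with a better starting guess.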
- Award ID(s): 1763793
- NSF-PAR ID: 10154660
- Date Published:
- Journal Name: 23rd Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2020)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Radio access network (RAN) in 5G is expected to satisfy the stringent delay requirements of a variety of applications. The packet scheduler plays an important role by allocating spectrum resources to user equipments (UEs) at each transmit time interval (TTI). In this paper, we show that optimal scheduling is a challenging combinatorial optimization problem, which is hard to solve within the channel coherence time with conventional optimization methods. Rule-based scheduling methods, on the other hand, are hard to adapt to the time-varying wireless channel conditions and various data request patterns of UEs. Recently, integrating artificial intelligence (AI) into wireless networks has drawn great interest from both academia and industry. In this paper, we incorporate deep reinforcement learning (DRL) into the design of cellular packet scheduling. A delay-aware cell traffic scheduling algorithm is developed to map the observed system state to a scheduling decision. Due to the huge state space, a recurrent neural network (RNN) is utilized to approximate the optimal action-policy function. Unlike conventional rule-based scheduling methods, the proposed scheme can learn from its interactions with the environment and adaptively choose the best scheduling decision at each TTI. Simulation results show that the DRL-based packet scheduling achieves the lowest average delay compared with several conventional approaches. Meanwhile, the UEs' average queue lengths are also significantly reduced. The developed method also exhibits great potential for real-time scheduling in delay-sensitive scenarios.
-
Mobile applications have become increasingly sophisticated. Emerging cognitive assistance applications can involve multiple computationally intensive modules working continuously and concurrently, further straining the already limited resources on these mobile devices. While computation offloading to the edge or the cloud is still the de facto solution, existing approaches are limited to intra-application operations or to edge-/cloud-centric scheduling. Instead, we argue that operating system level coordination is needed on the mobile side to adequately support the prospects of multi-application offloading. Specifically, both the local mobile system resources and the network bandwidth to reach the cloud need to be allocated intelligently among concurrent offloading jobs. In this paper, we build a system-level scheduler service, LinkShare, that wraps over the operating system scheduler to coordinate among multiple offloading requests. We further study the scheduling requirements and suitable metrics, and find that the most intuitive approaches, minimizing the end-to-end processing time or earliest-deadline-first scheduling, do not work well. Instead, LinkShare adopts earliest-deadline-first with limited sharing (EDF-LS), which balances real-time requirements and fairness. Extensive evaluation of an Android implementation of LinkShare shows that adding this additional scheduler is essential, and that EDF-LS reduces deadline miss events by up to 30% compared to the baseline.
-
This paper proposes long-term reliability management for spatial multitasking GPU architectures. Specifically, we focus on electromigration (EM)-induced long-term failure of the GPU's power delivery network. A distributed power delivery network model at functional unit granularity is developed and used for our EM analysis of GPU architectures. We use a recently proposed physics-based EM reliability model and consider the EM-induced time-to-failure at the GPU system level as a reliability resource. For GPU scheduling, we mainly focus on spatial multitasking, which allows GPU computing resources to be partitioned among multiple applications. We find that the existing reliability-agnostic thread block scheduler for spatial multitasking is effective in achieving high GPU utilization, but yields poor reliability. We develop and implement a long-term reliability-aware thread block scheduler in GPGPU-Sim, and compare it against the existing reliability-agnostic scheduler. We evaluate several use cases of spatial multitasking and find that our proposed scheduler achieves up to 30% improvement in long-term reliability.
-
With the wide adoption of deep neural network (DNN) models for various applications, enterprises and cloud providers have built deep learning clusters and increasingly deployed specialized accelerators, such as GPUs and TPUs, for DNN training jobs. To arbitrate cluster resources among multi-user jobs, existing schedulers fall short, either lacking fine-grained heterogeneity awareness or being hardly generalizable to various scheduling policies. To fill this gap, we propose a novel design of a task-level heterogeneity-aware scheduler, Hadar, based on an online optimization framework that can express other scheduling algorithms. Hadar leverages the performance traits of DNN jobs on a heterogeneous cluster, characterizes the task-level performance heterogeneity in the optimization problem, and makes scheduling decisions across both spatial and temporal dimensions. The primal-dual framework is employed, with our design of a dual subroutine, to solve the optimization problem and guide the scheduling design. Extensive trace-driven simulations with representative DNN models have been conducted to demonstrate that Hadar improves the average job completion time (JCT) by 3× over an Apache YARN-based resource manager used in production. Moreover, Hadar outperforms Gavel [1], the state-of-the-art heterogeneity-aware scheduler, by 2.5× for the average JCT, shortens the queuing delay by 13%, and improves FTF (Finish-Time-Fairness) by 1.5%.