Union: An Automatic Workload Manager for Accelerating Network Simulation

Wang, Xin; Mubarak, Misbah; Kang, Yao; Ross, Robert B.; Lan, Zhiling

doi:10.1109/IPDPS47924.2020.00089

With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a valuable research vehicle for understanding the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In this paper, we present Union, a workload manager that provides an automatic framework to facilitate hybrid workload simulation in CODES. Furthermore, we use Union, along with CODES, to investigate various hybrid workloads composed of traditional simulation applications and emerging learning applications on two dragonfly systems. The experimental results show that both message latency and communication time are important performance metrics for evaluating network interference. Network interference on HPC applications is reflected more in message latency variation, whereas ML application performance depends more on communication time.

Award ID(s): 1717763 1618776

PAR ID: 10183521

Author(s) / Creator(s): Wang, Xin; Mubarak, Misbah; Kang, Yao; Ross, Robert B.; Lan, Zhiling

Date Published: 2020-05-01

Journal Name: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Page Range / eLocation ID: 821 to 830

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/IPDPS47924.2020.00089
