The Effect of System Utilization on Application Performance Variability

Li, Boyang; Chunduri, Sudheer; Harms, Kevin; Fan, Yuping; Lan, Zhiling

Citation Details

Application performance variability caused by network contention is a major issue on dragonfly based systems. This work-in-progress study makes two contributions. First, we analyze real workload logs and conduct application experiments on the production system Theta at Argonne to evaluate application performance variability. We find a strong correlation between system utilization and performance variability where a high system utilization (e.g., above 95%) can cause up to 21% degradation in application performance. Next, driven by this key finding, we investigate a scheduling policy to mitigate workload interference by leveraging the fact that production systems often exhibit diurnal utilization behavior and not all users are in a hurry for job completion. Preliminary results show that this scheduling design is capable of improving system productivity (measured by scheduling makespan) as well as improving user-level scheduling metrics such as user wait time and job slowdown. more »

Award ID(s):: 1717763 1422009

NSF-PAR ID:: 10097525

Author(s) / Creator(s):: Li, Boyang; Chunduri, Sudheer; Harms, Kevin; Fan, Yuping; Lan, Zhiling

Date Published:: 2019-06-01

Journal Name:: ACM Digital Library

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this