ESCHER: expressive scheduling with ephemeral resources

Bhardwaj, Romil; Tumanov, Alexey; Wang, Stephanie; Liaw, Richard; Moritz, Philipp; Nishihara, Robert; Stoica, Ion

doi:10.1145/3542929.3563498

As distributed applications become increasingly complex, so do their scheduling requirements. This development calls for cluster schedulers that are not only general, but also evolvable. Unfortunately, most existing cluster schedulers are not evolvable: when confronted with new requirements, they need major rewrites to support these requirements. Examples include gang-scheduling support in Kubernetes [6, 39] or task-affinity in Spark [39]. Some cluster schedulers [14, 30] expose physical resources to applications to address this. While these approaches are evolvable, they push the burden of implementing scheduling mechanisms in addition to the policies entirely to the application. ESCHER is a cluster scheduler design that achieves both evolvability and application-level simplicity. ESCHER uses an abstraction exposed by several recent frameworks (which we call ephemeral resources) that lets the application express scheduling constraints as resource requirements. These requirements are then satisfied by a simple mechanism matching resource demands to available resources. We implement ESCHER on Kubernetes and Ray, and show that this abstraction can be used to express common policies offered by monolithic schedulers while allowing applications to easily create new custom policies hitherto unsupported.
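The idea of expressing a scheduling constraint as a resource requirement can be illustrated with a small toy model. The sketch below is not ESCHER's implementation or the API of Kubernetes or Ray; the names `Node`, `ToyScheduler`, `create_ephemeral`, `schedule`, and the label `group_a` are all hypothetical and chosen only for illustration. It assumes a scheduler that does nothing beyond matching demands to available resources, and shows how an application could obtain an affinity policy by creating a label resource on one node and requesting it from its tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Node:
    """A cluster node with a table of available resources (hypothetical model)."""
    name: str
    available: Dict[str, float] = field(default_factory=dict)


class ToyScheduler:
    """Matches resource demands to available resources; it knows no policies."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}

    def create_ephemeral(self, node_name: str, label: str, capacity: float) -> None:
        # The application adds a logical resource to a node at runtime.
        node = self.nodes[node_name]
        node.available[label] = node.available.get(label, 0.0) + capacity

    def schedule(self, demand: Dict[str, float]) -> Optional[str]:
        # First-fit matching: pick the first node that satisfies every demand.
        for node in self.nodes.values():
            if all(node.available.get(r, 0.0) >= q for r, q in demand.items()):
                for r, q in demand.items():
                    node.available[r] -= q
                return node.name
        return None  # No node can satisfy the demand right now.


def demo():
    sched = ToyScheduler([Node("n1", {"CPU": 4}), Node("n2", {"CPU": 4})])
    # Affinity policy written by the application: two slots of "group_a" on n2.
    sched.create_ephemeral("n2", "group_a", 2)
    first = sched.schedule({"CPU": 1, "group_a": 1})
    second = sched.schedule({"CPU": 1, "group_a": 1})
    third = sched.schedule({"CPU": 1, "group_a": 1})  # label capacity exhausted
    plain = sched.schedule({"CPU": 1})  # unconstrained task, first fit
    return first, second, third, plain
```

In this toy model, the two tasks that request `group_a` are co-located on `n2`, the third such request cannot be placed once the label's capacity is used up, and a task with no label requirement is placed by ordinary first-fit matching. The policy lives entirely in the application's choice of labels and capacities, while the matching loop stays unchanged.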

Award ID(s): 1730628

PAR ID: 10399382

Author(s) / Creator(s): Bhardwaj, Romil; Tumanov, Alexey; Wang, Stephanie; Liaw, Richard; Moritz, Philipp; Nishihara, Robert; Stoica, Ion

Date Published: 2022-11-07

Journal Name: SoCC '22: Proceedings of the 13th Symposium on Cloud Computing

Page Range / eLocation ID: 47 to 62

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3542929.3563498
