Reinforcement learning for resource management in multi-tenant serverless platforms

Qiu, Haoran; Mao, Weichao; Patke, Archit; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew T.; Başar, Tamer; Iyer, Ravishankar K.

doi:10.1145/3517207.3526971

Citation Details

Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve resource utilization, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to existing heuristics-based resource management approaches, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. In this paper, we show that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-agent RL algorithm based on Proximal Policy Optimization, i.e., multi-agent PPO (MA-PPO). We show that in multi-tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to S-RL in single-tenant cases, with less than 10% degradation. In addition, MA-PPO provides a 4.4x improvement over S-RL performance (in terms of function tail latency) in multi-tenant cases.

Award ID(s): 2029049

PAR ID: 10358759

Author(s) / Creator(s): Qiu, Haoran; Mao, Weichao; Patke, Archit; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew T.; Başar, Tamer; Iyer, Ravishankar K.

Date Published: 2022-04-05

Journal Name: EuroMLSys 2022 - Proceedings of the 2nd European Workshop on Machine Learning and Systems

Page Range / eLocation ID: 20 to 28

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3517207.3526971
