SIMPPO: a scalable and incremental online learning framework for serverless resource management

Qiu, Haoran; Mao, Weichao; Patke, Archit; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew T.; Başar, Tamer; Iyer, Ravishankar K.

doi:10.1145/3542929.3563475

Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-“less” and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain function service-level objectives (SLOs) and improve resource utilization efficiency, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to rule-based solutions with heuristics, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments on widely used serverless benchmarks demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to efficiently converge during training and provides online function latency performance comparable to that of S-RL trained in isolation (which we refer to as the baseline for assessing RL performance) with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.

Award ID(s): 2029049

PAR ID: 10465146

Author(s) / Creator(s): Qiu, Haoran; Mao, Weichao; Patke, Archit; Wang, Chen; Franke, Hubertus; Kalbarczyk, Zbigniew T.; Başar, Tamer; Iyer, Ravishankar K.

Date Published: 2022-11-07

Journal Name: Proceedings of the 13th ACM Symposium on Cloud Computing (SoCC 2022)

Page Range / eLocation ID: 306 to 322

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3542929.3563475
