Characterizing Power Management Opportunities for LLMs in the Cloud

Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Goiri, Íñigo; Warrier, Brijesh; Mahalingam, Nithish; Bianchini, Ricardo

doi:10.1145/3620666.3651329

Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and other enterprises plan to substantially grow their datacenter capacity to support these new workloads. A key bottleneck resource in datacenters is power, which LLMs are quickly saturating due to their rapidly increasing model sizes. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the training and inference power consumption patterns. Based on our analysis, we claim that the average and peak power utilization in LLM inference clusters should not be very high. Our deductions align with data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to build a reliable and robust power management framework. We leverage the insights from our characterization to identify opportunities for better power management. As a detailed use case, we propose a new framework called POLCA, which enables power oversubscription in LLM inference clouds. POLCA is robust, reliable, and readily deployable. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in existing clusters with minimal performance loss.

Award ID(s): 2104548

PAR ID: 10505206

Author(s) / Creator(s): Patel, Pratyush; Choukse, Esha; Zhang, Chaojie; Goiri, Íñigo; Warrier, Brijesh; Mahalingam, Nithish; Bianchini, Ricardo

Publisher / Repository: ACM

Date Published: 2024-04-27

Journal Name: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3

ISBN: 9798400703867

Page Range / eLocation ID: 207 to 222

Format(s): Medium: X

Location: La Jolla CA USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3620666.3651329
