ECLIP: Energy-efficient and Practical Co-Location of ML Inference on Spatially Partitioned GPUs

Quach, Ryan; Wang, Yidi; Jahanshahi, Ali; Wong, Daniel; Kim, Hyoseung

This content will become publicly available on August 6, 2026

As AI inference becomes mainstream, research has begun to focus on reducing the energy consumption of inference servers. Inference kernels commonly underutilize a GPU’s compute resources, while idle components continue to draw power. To improve utilization and energy efficiency, multiple models can be co-located on, and share, a single GPU. However, typical GPU spatial partitioning techniques incur significant overheads when reconfiguring partitions, so energy is wasted either on the repartitioning itself or on running with non-optimal partition configurations. In this paper, we present ECLIP, a framework that enables low-overhead, energy-efficient, kernel-wise resource partitioning between co-located inference kernels. ECLIP minimizes repartitioning overhead by pre-allocating pools of compute-unit (CU) masked streams, and its resource allocation optimizer computes optimal CU assignments for groups of kernels. Overall, ECLIP achieves an average improvement of 13% in throughput and 25% in energy efficiency.
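The core idea in the abstract, creating CU-masked streams ahead of time so that repartitioning becomes a stream switch instead of a reconfiguration, can be illustrated with a small pure-Python model. This is a hypothetical sketch, not ECLIP's implementation: the 60-CU device, the 4-CU partition step, contiguous partitions, the kernel names, and the proportional split rule (a stand-in for ECLIP's optimizer, whose objective the abstract does not give) are all assumptions. On AMD GPUs, HIP exposes `hipExtStreamCreateWithCUMask` for creating such streams; here each stream is modeled only by its CU bitmask so the sketch runs without a GPU.

```python
"""Toy model of a pre-allocated pool of CU-masked streams.

Hypothetical sketch: a "stream" is a plain Python object holding a CU bitmask,
so it runs without a GPU. The 60-CU device, 4-CU granularity, contiguous
partitions, and proportional split rule are all illustrative assumptions.
"""
from dataclasses import dataclass

TOTAL_CUS = 60   # assumed CU count of the hypothetical GPU
GRANULARITY = 4  # assumed partition step, in CUs


@dataclass(frozen=True)
class MaskedStream:
    """Stand-in for a GPU stream restricted to a contiguous range of CUs."""
    first_cu: int
    num_cus: int

    @property
    def mask(self) -> int:
        # Bit i set means CU i may run kernels launched on this stream.
        return ((1 << self.num_cus) - 1) << self.first_cu


def build_pool(total_cus: int, granularity: int) -> dict:
    """Create every (first_cu, num_cus) stream once, up front.

    Afterwards, moving a kernel to a different partition is a dictionary
    lookup rather than a stream reconfiguration.
    """
    pool = {}
    for num in range(granularity, total_cus + 1, granularity):
        for first in range(0, total_cus - num + 1, granularity):
            pool[(first, num)] = MaskedStream(first, num)
    return pool


def assign_partitions(pool: dict, demands: dict, total_cus: int,
                      granularity: int) -> dict:
    """Give each co-located kernel a disjoint stream sized by its demand.

    Placeholder policy (NOT ECLIP's optimizer): one CU group per kernel,
    the remaining groups split in proportion to demand, and any rounding
    leftovers handed to the highest-demand kernels.
    """
    groups = total_cus // granularity
    spare = groups - len(demands)
    weight = sum(demands.values())
    if spare < 0 or weight <= 0:
        raise ValueError("need positive demands and at most one kernel per CU group")
    alloc = {k: 1 + (spare * d) // weight for k, d in demands.items()}
    leftover = groups - sum(alloc.values())
    for k in sorted(demands, key=demands.get, reverse=True)[:leftover]:
        alloc[k] += 1

    # Lay partitions out back to back so their CU masks never overlap.
    streams, first = {}, 0
    for k in demands:
        num = alloc[k] * granularity
        streams[k] = pool[(first, num)]
        first += num
    return streams


if __name__ == "__main__":
    pool = build_pool(TOTAL_CUS, GRANULARITY)
    # Hypothetical kernel groups with relative compute demands.
    streams = assign_partitions(pool, {"resnet": 3, "bert": 1},
                                TOTAL_CUS, GRANULARITY)
    for name, s in streams.items():
        print(name, s.first_cu, s.num_cus)  # resnet 0 44, then bert 44 16
```

The pool trades a fixed, up-front allocation for fast switching: with 60 CUs and a 4-CU step it holds 120 streams, and every partition the allocator can choose is already one of them. A coarser step shrinks the pool but limits how finely CUs can be divided between co-located kernels.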

Award ID(s): 2312395, 1943265, 1955650

PAR ID: 10650344

Author(s) / Creator(s): Quach, Ryan; Wang, Yidi; Jahanshahi, Ali; Wong, Daniel; Kim, Hyoseung

Publisher / Repository: IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)

Date Published: 2025-08-06

Sponsoring Org: National Science Foundation

Conference paper. The DOI is not currently available.