CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

Dalmia, Preyesh; Shashi_Kumar, Rajesh; Sinclair, Matthew D


Chiplets are transforming computer system designs, allowing system designers to combine heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic chips into smaller, connected chiplets helps performance continue scaling, avoids die size limitations, improves yield, and reduces design and integration costs. However, chiplet-based designs introduce an additional level of hierarchy, which causes indirection and non-uniformity. This clashes with typical heterogeneous systems: unlike CPU-based multi-chiplet systems, heterogeneous systems do not have significant OS support or complex coherence protocols to mitigate the impact of this indirection. Thus, exploiting locality across application phases is harder in multi-chiplet heterogeneous systems. We propose CPElide, which utilizes information already available in heterogeneous systems' embedded microprocessor (the command processor) to track inter-chiplet data dependencies and aggressively perform implicit synchronization only when necessary, instead of conservatively like the state-of-the-art HMG. Across 24 workloads, CPElide improves average performance (13%, 19%), energy (14%, 11%), and network traffic (14%, 17%), respectively, over current approaches and HMG.
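
The abstract describes the command processor tracking inter-chiplet data dependencies so that implicit synchronization at kernel boundaries (writing back dirty data and invalidating possibly stale cached data) happens only when another chiplet actually produced the data being consumed. The Python sketch below illustrates that idea under our own simplifying assumptions; it is not the CPElide implementation. The names `Kernel`, `DependencyTracker`, `launch`, and the region labels are hypothetical, each named region is assumed to have at most one writing chiplet per kernel, and a flush or invalidate is modeled as acting on a chiplet's whole cache. For contrast, a fully conservative scheme would flush and invalidate every chiplet at every kernel boundary.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical kernel descriptor: per-chiplet read and write sets of named regions."""
    name: str
    reads: dict[int, set[str]] = field(default_factory=dict)
    writes: dict[int, set[str]] = field(default_factory=dict)


@dataclass
class DependencyTracker:
    """Illustrative command-processor-side tracker (an assumption, not the paper's design)."""
    num_chiplets: int
    # region -> chiplet whose cache holds dirty, not-yet-written-back data for it
    dirty_owner: dict[str, int] = field(default_factory=dict)
    # region -> chiplets whose caches may hold an out-of-date copy
    stale_at: dict[str, set[int]] = field(default_factory=dict)

    def launch(self, k: Kernel) -> tuple[set[int], set[int]]:
        """Return (chiplets to flush, chiplets to invalidate) before kernel k runs."""
        flush: set[int] = set()
        invalidate: set[int] = set()
        for c in range(self.num_chiplets):
            for region in k.reads.get(c, set()) | k.writes.get(c, set()):
                owner = self.dirty_owner.get(region)
                if owner is not None and owner != c:
                    flush.add(owner)  # release: producer writes back its dirty data
                if c in self.stale_at.get(region, set()):
                    invalidate.add(c)  # acquire: consumer drops possibly stale lines

        # A flushed chiplet no longer holds dirty data for any region.
        self.dirty_owner = {r: o for r, o in self.dirty_owner.items() if o not in flush}
        # An invalidated chiplet no longer holds stale copies of any region.
        for chiplets in self.stale_at.values():
            chiplets -= invalidate

        # Record this kernel's writes (assumes one writer per region per kernel).
        for c, regions in k.writes.items():
            for region in regions:
                self.dirty_owner[region] = c
                # Conservatively assume every other chiplet may have cached the region.
                self.stale_at[region] = set(range(self.num_chiplets)) - {c}
        return flush, invalidate


if __name__ == "__main__":
    t = DependencyTracker(num_chiplets=2)
    print(t.launch(Kernel("k1", writes={0: {"A"}, 1: {"B"}})))  # (set(), set())
    # Each chiplet re-reads its own output: no synchronization is needed.
    print(t.launch(Kernel("k2", reads={0: {"A"}, 1: {"B"}})))  # (set(), set())
    # Chiplets swap data: both producers flush and both consumers invalidate.
    print(t.launch(Kernel("k3", reads={0: {"B"}, 1: {"A"}})))  # ({0, 1}, {0, 1})
```

In this toy run the second kernel reuses data produced on the same chiplet, so the tracker elides all synchronization, while the third kernel consumes data produced on the other chiplet and therefore still triggers the flush and invalidate that a conservative scheme would have issued at every boundary.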

Award ID(s): 2238608

PAR ID: 10542855

Publisher / Repository: IEEE

Date Published: 2024-11-04

Subject(s) / Keyword(s): GPGPU, Chiplets, Synchronization, Coherence

Location: IEEE/ACM International Symposium on Microarchitecture (MICRO)

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.