High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

Moses, William S; Ivanov, Ivan R; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr

doi:10.1145/3572848.3577475

Citation Details

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA) into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 58% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7×.

Award ID(s): 2103942

PAR ID: 10635281

Author(s) / Creator(s): Moses, William S; Ivanov, Ivan R; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr

Publisher / Repository: ACM

Date Published: 2023-02-21

ISBN: 9798400700156

Page Range / eLocation ID: 119 to 134

Subject(s) / Keyword(s): Polygeist, MLIR, CUDA, Barrier Synchronization

Location: Montreal, QC, Canada

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper: https://doi.org/10.1145/3572848.3577475
