Dynamically Fusing Python HPC Kernels

Al_Awar, Nader; Naeem, Muhammad Hannan; Almgren-Bell, James; Biros, George; Gligoric, Milos

doi:10.1145/3728959


This content will become publicly available on June 22, 2026


Recent trends in high-performance computing show an increase in the adoption of performance-portable frameworks such as Kokkos and of interpreted languages such as Python. PyKokkos follows these trends and enables programmers to write performance-portable kernels in Python, which greatly increases productivity. One issue that programmers still face is how to organize parallel code: splitting code into separate kernels simplifies testing and debugging but may result in suboptimal performance. To let programmers organize kernels however they prefer while still ensuring good performance, we present PyFuser, a program analysis framework for automatic fusion of performance-portable PyKokkos kernels. PyFuser dynamically traces kernel calls and lazily fuses them once the application requests a result. The fused kernels that PyFuser generates execute faster because of better data reuse, improved compiler optimizations, and reduced kernel launch overhead, and they require no changes to existing PyKokkos code. We also introduce automated code transformations that further optimize the fused kernels generated by PyFuser. Our experiments show that, on average, PyFuser achieves a 3.8x speedup over unfused kernels on NVIDIA and AMD GPUs, as well as on Intel and AMD CPUs.
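
The trace-then-fuse idea in the abstract can be illustrated with a small, simplified sketch. It is not PyFuser's implementation and does not use the PyKokkos API: the class `LazyFuser`, its methods `parallel_for` and `read`, and the `launches` counter are hypothetical names chosen for illustration, and kernels are plain Python callables rather than generated code. The sketch assumes each kernel body, at index i, touches only index i of any array written by an earlier kernel, so interleaving the bodies per index is legal. A real fuser would have to verify such dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical kernel body type: (index, arrays) -> None. Not the PyKokkos API.
KernelBody = Callable[[int, Dict[str, List[float]]], None]


@dataclass
class TracedCall:
    """One recorded (not yet executed) kernel call."""
    name: str
    extent: int
    body: KernelBody


@dataclass
class LazyFuser:
    """Hypothetical tracer: records kernel calls, runs them only on read."""
    views: Dict[str, List[float]]
    pending: List[TracedCall] = field(default_factory=list)
    launches: int = 0

    def parallel_for(self, name: str, extent: int, body: KernelBody) -> None:
        # Trace the call instead of executing it immediately.
        self.pending.append(TracedCall(name, extent, body))

    def _flush(self) -> None:
        # Group consecutive calls with the same iteration range; each group
        # becomes one fused loop ("launch") that runs every body per index.
        # ASSUMPTION: a body at index i only reads index i of arrays written
        # by earlier bodies, so per-index interleaving preserves results.
        i = 0
        while i < len(self.pending):
            extent = self.pending[i].extent
            j = i
            while j < len(self.pending) and self.pending[j].extent == extent:
                j += 1
            group = self.pending[i:j]
            self.launches += 1
            for idx in range(extent):
                for call in group:
                    call.body(idx, self.views)
            i = j
        self.pending.clear()

    def read(self, view_name: str) -> List[float]:
        # Requesting a result is what triggers fusion and execution.
        self._flush()
        return self.views[view_name]


if __name__ == "__main__":
    n = 4
    fuser = LazyFuser(views={"a": [1.0, 2.0, 3.0, 4.0],
                             "b": [0.0] * n, "c": [0.0] * n})

    def scale(i, v):
        v["b"][i] = 2.0 * v["a"][i]

    def shift(i, v):
        v["c"][i] = v["b"][i] + 1.0

    fuser.parallel_for("scale", n, scale)
    fuser.parallel_for("shift", n, shift)
    print(fuser.launches)    # 0: nothing has run yet
    print(fuser.read("c"))   # [3.0, 5.0, 7.0, 9.0]
    print(fuser.launches)    # 1: both kernels ran in one fused loop
```

In the fused loop, `b[i]` is written by `scale` and consumed by `shift` in the same iteration, which hints at the data-reuse benefit the abstract mentions. Two separate loops would instead stream all of `b` through memory twice and pay two launches.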

Award ID(s): 2107291, 2217696, 2313027, 2403036

PAR ID: 10628337

Author(s) / Creator(s): Al_Awar, Nader; Naeem, Muhammad Hannan; Almgren-Bell, James; Biros, George; Gligoric, Milos

Publisher / Repository: ACM

Date Published: 2025-06-22

Journal Name: Proceedings of the ACM on Software Engineering

Volume: 2

Issue: ISSTA

ISSN: 2994-970X

Page Range / eLocation ID: 1865 to 1886


Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1145/3728959
