Title: Fuzzing MLIR Compilers with Custom Mutation Synthesis
Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SynthFuzz that combines grammar-based fuzzing with custom mutation synthesis. The key idea behind SynthFuzz is twofold: (1) it automatically infers parameterized, context-dependent custom mutations from existing test cases, and (2) it concretizes each mutation's content depending on the target context, reducing the chance of inserting invalid edits by performing k-ancestor and prefix/postfix matching. This obviates the need to manually define custom mutation operators for each dialect. We compare SynthFuzz to three baselines: Grammarinator, a grammar-based fuzzer without custom mutations; MLIRSmith, a custom test generator for MLIR core dialects; and NeuRI, a custom test generator for ML models with parameterization of tensor shapes. We conduct this comprehensive comparison on four different MLIR projects, each of which defines a new set of MLIR dialects where manually writing a custom test generator would take weeks of effort. Our evaluation shows that SynthFuzz on average improves MLIR dialect pair coverage by 1.75×, which increases branch coverage by 1.22×. Further, we show that our context-dependent custom mutation increases the proportion of valid tests by up to 1.11×, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57×, increasing the time spent fuzzing dialect-specific code.
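The k-ancestor and prefix/postfix matching described in the abstract can be pictured with a small sketch. The following Python is illustrative only, not SynthFuzz's actual implementation: the node and mutation structures and all names are hypothetical, showing how a parameterized mutation might be applied at a parse-tree node only when the node's k nearest ancestors and its sibling context match the context from which the mutation was inferred.

```python
# Hypothetical sketch of context-dependent mutation matching: a mutation
# inferred from existing tests records its k ancestor kinds and sibling
# prefix/postfix, and is applied only at nodes with a matching context.
from dataclasses import dataclass

@dataclass
class Node:
    kind: str                # grammar rule / dialect op, e.g. "linalg.matmul"
    parent: "Node | None"
    siblings: list[str]      # kinds of this node's siblings, in order
    index: int               # this node's position among its siblings

@dataclass
class Mutation:
    ancestors: list[str]     # k ancestor kinds recorded at inference time
    prefix: list[str]        # sibling kinds just before the edit site
    postfix: list[str]       # sibling kinds just after the edit site
    template: str            # parameterized replacement text

def k_ancestors(node: Node, k: int) -> list[str]:
    out, cur = [], node.parent
    while cur is not None and len(out) < k:
        out.append(cur.kind)
        cur = cur.parent
    return out

def matches(node: Node, m: Mutation, k: int) -> bool:
    # Reject candidate sites whose surrounding context differs from the
    # context the mutation was mined from, to avoid invalid edits.
    if k_ancestors(node, k) != m.ancestors:
        return False
    before = node.siblings[:node.index]
    after = node.siblings[node.index + 1:]
    return (before[-len(m.prefix):] == m.prefix if m.prefix else True) and \
           (after[:len(m.postfix)] == m.postfix if m.postfix else True)
```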
Award ID(s):
2106404 2106838 2426162
PAR ID:
10592190
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3315-0569-1
Page Range / eLocation ID:
217 to 229
Format(s):
Medium: X
Location:
Ottawa, ON, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Kernel fuzzers rely heavily on program mutation to automatically generate new test programs from existing ones. In particular, program mutation can alter a test's control and data flow inside the kernel by inserting new system calls, changing the values of call arguments, or performing other program mutations. However, due to the complexity of the kernel code and its user-space interface, finding an effective mutation that leads to a desired outcome, such as increasing coverage or reaching a target code location, is extremely difficult, even with the widespread use of manually crafted heuristics. This work proposes Snowplow, a kernel fuzzer that uses a learned white-box test mutator to enhance test mutation. The core of Snowplow is an efficient machine learning model that learns to predict promising mutations given the test program to mutate, its kernel code coverage, and the desired coverage. Snowplow is demonstrated on argument mutations of kernel tests and evaluated on recent Linux kernel releases. When fuzzing the kernels for 24 hours, Snowplow shows a significant speedup in discovering new coverage (4.8x-5.2x) and achieves higher overall coverage (7.0%-8.6%). In a 7-day fuzzing campaign, Snowplow discovers 86 previously unknown crashes. Furthermore, the learned mutator is shown to accelerate directed kernel fuzzing by reaching 19 target code locations 8.5x faster, plus two additional locations that are missed by the state-of-the-art directed kernel fuzzer.
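The interface of such a learned mutator can be sketched briefly. The Python below is a hypothetical stand-in, not Snowplow's actual model or features: it only illustrates the abstract's idea of scoring candidate argument mutations given the test program, its observed coverage, and the desired coverage.

```python
# Hypothetical sketch of a learned argument-mutation predictor. All names,
# features, and the scoring function are illustrative placeholders.
import random

def featurize(program: list[str], observed: set[int], desired: set[int]) -> list[float]:
    # Toy features: program length, coverage achieved, coverage still missing.
    return [len(program), len(observed), len(desired - observed)]

def score_mutation(features: list[float], candidate: tuple[int, int]) -> float:
    # Stand-in for a trained model's forward pass; here a random baseline.
    return random.random()

def pick_mutation(program, observed, desired, candidates):
    feats = featurize(program, observed, desired)
    return max(candidates, key=lambda c: score_mutation(feats, c))

# Usage: choose which call argument to rewrite in a two-call test program.
prog = ["open(path, flags)", "read(fd, buf, n)"]
best = pick_mutation(prog, observed={1, 2}, desired={1, 2, 3},
                     candidates=[(0, 0o644), (1, 4096)])
```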
  2. Cathie Olschanowsky (Ed.)
The Sparse Polyhedral Framework (SPF) provides vital support to scientific applications but is limited in portability. SPF extends the Polyhedral Model to non-affine codes. Scientific applications need the optimizations SPF enables, but current SPF tools do not support GPUs or other heterogeneous hardware targets. As clock speeds continue to stagnate, scientific applications need the performance enhancements enabled by both SPF and newer heterogeneous hardware. The MLIR (Multi-Level Intermediate Representation) ecosystem offers a large, extensible, and cooperating set of intermediate representations (called dialects). A typical compiler has one main intermediate representation, whereas an MLIR-based compiler will have many. Because of this flexibility, the MLIR ecosystem has many dialects designed with heterogeneous hardware platforms in mind. This work creates an MLIR SPF dialect. The dialect enables SPF optimizations and is capable of generating GPU code as well as CPU code from SPF representations. Previous C-based SPF front ends are not capable of generating GPU code. The SPF dialect representations of common sparse scientific kernels generate CPU code competitive with the existing C-based front end, and GPU code competitive with standard benchmarks.
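The contrast between one main IR and many cooperating dialects can be made concrete with a toy model. The Python below is illustrative only and comes from neither the paper nor MLIR's APIs: it shows how a lowering pass rewrites ops from a higher-level dialect into lower-level ones, so a single module mixes dialects until fully lowered.

```python
# Toy model of MLIR-style multi-dialect lowering. Dialect and op names are
# illustrative; a real pass would operate on MLIR's in-memory IR.
Op = tuple  # (dialect, op name), e.g. ("spf", "for_each")

def lower_spf_to_gpu(ops: list[Op]) -> list[Op]:
    out = []
    for dialect, name in ops:
        if dialect == "spf":
            # One SPF loop op becomes a GPU kernel launch plus body ops.
            out += [("gpu", "launch"), ("arith", "addf")]
        else:
            out.append((dialect, name))  # other dialects pass through
    return out

module = [("spf", "for_each"), ("func", "return")]
print(lower_spf_to_gpu(module))
# [('gpu', 'launch'), ('arith', 'addf'), ('func', 'return')]
```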
  3. In the past decade, Deep Learning (DL) systems have been widely deployed in various application domains to facilitate our daily life, e.g., natural language processing, healthcare, activity recognition, and autonomous driving. Meanwhile, it is extremely challenging to ensure the correctness of DL systems (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can cause serious consequences and may even threaten human lives. In the literature, researchers have explored various techniques to test, analyze, and verify DL models, since their quality directly affects the corresponding system behaviors. Recently, researchers have also proposed novel techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations for each high-level DL operator and are the foundation for running DL models on different hardware platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers (also known as DL compilers), which aim to automatically compile high-level tensor computation graphs directly into high-performance binaries for better efficiency, portability, and scalability than traditional operator-level libraries. Therefore, in this paper, we target the important problem of tensor compiler testing and propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating the low-level Intermediate Representation (IR) for TVM due to the limited mutation space of the high-level IR. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators guided by coverage feedback for diverse and evolutionary IR mutation; furthermore, since tensor compilers provide various passes (i.e., transformations) for IR optimization, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing. Our experimental results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, achieving 75% higher coverage and 50% more valuable tests than the second-best technique. The different components of Tzer have also been validated via an ablation study. To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).
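The joint IR-and-pass mutation loop the abstract describes can be sketched in a few lines. The Python below is a minimal, hypothetical illustration; the mutators and the compile_and_measure helper are stand-ins, not TVM's or Tzer's actual APIs.

```python
# Coverage-guided loop mutating both the IR and the optimization pass
# sequence, keeping any candidate that discovers new coverage.
import random

def mutate_ir(ir: str) -> str:
    return ir + f"\n; mutated {random.randint(0, 9)}"   # placeholder edit

def mutate_passes(passes: list[str]) -> list[str]:
    return passes + random.sample(["licm", "cse", "inline", "vectorize"], k=1)

def compile_and_measure(ir: str, passes: list[str]) -> set[int]:
    return {hash((ir, tuple(passes))) % 100}            # fake coverage bitmap

seen: set[int] = set()
corpus = [("prim_func main() {}", ["cse"])]
for _ in range(100):
    ir, passes = random.choice(corpus)
    cand = (mutate_ir(ir), mutate_passes(passes))
    cov = compile_and_measure(*cand)
    if not cov <= seen:                                 # found new coverage
        seen |= cov
        corpus.append(cand)
```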
  4. Greybox fuzzing and mutation testing are two popular but mostly independent fields of software testing research that have so far had limited overlap. Greybox fuzzing, generally geared towards searching for new bugs, predominantly uses code coverage for selecting inputs to save. Mutation testing is primarily used as a stronger alternative to code coverage in assessing the quality of regression tests; the idea is to evaluate tests for their ability to identify artificially injected faults in the target program. But what if we wanted to use greybox fuzzing to synthesize high-quality regression tests? In this paper, we develop and evaluate Mu2, a Java-based framework for incorporating mutation analysis in the greybox fuzzing loop, with the goal of producing a test-input corpus with a high mutation score. Mu2 makes use of a differential oracle for identifying inputs that exercise interesting program behavior without causing crashes. This paper describes several dynamic optimizations implemented in Mu2 to overcome the high cost of performing mutation analysis with every fuzzer-generated input. These optimizations introduce trade-offs in fuzzing throughput and mutation killing ability, which we evaluate empirically on five real-world Java benchmarks. Overall, variants of Mu2 are able to synthesize test-input corpora with a higher mutation score than state-of-the-art Java fuzzer Zest. 
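The core selection criterion, saving inputs that kill previously surviving mutants under a differential oracle, can be sketched briefly. The Python below is a hypothetical illustration of the idea, not Mu2's Java implementation; the target function and mutants are toy placeholders.

```python
# Mutation-analysis-guided input selection: an input is saved if some
# surviving mutant's output differs from the original's without crashing.
def original(x: int) -> int:
    return x * 2

MUTANTS = [lambda x: x * 3, lambda x: x + 2]   # artificially injected faults

def kills_new_mutant(inp: int, killed: set[int]) -> bool:
    for i, mutant in enumerate(MUTANTS):
        if i in killed:
            continue
        try:
            if mutant(inp) != original(inp):   # differential oracle
                killed.add(i)
                return True
        except Exception:
            pass                               # crashes handled separately
    return False

killed: set[int] = set()
corpus = [inp for inp in range(5) if kills_new_mutant(inp, killed)]
# Here inputs 0 and 1 each kill a distinct mutant, so corpus == [0, 1].
```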
  5. Online data collection allows for access to diverse populations. In the current study, we used online recruitment and data collection methods to obtain a corpus of read speech from adult talkers representing three authentic regional dialects of American English and one novel dialect created for the corpus. The authentic dialects (New England, Northern, and Southern American English) are each represented by 8–10 talkers, ranging in age from 22 to 75 years old. The novel dialect was produced by five Spanish-English bilinguals with training in linguistics, who were asked to produce Spanish /o/ in an otherwise English segmental context. One vowel contrast was selected for each dialect, in which the vowels within the contrast are acoustically more similar in the target dialect than in the other dialects. Each talker produced one familiar short story with 40 tokens of each vowel within the target contrast for their dialect, as well as a set of real words and nonwords that represent both the target vowel contrast for their dialect and the other three vowel contrasts for comparison across dialects. Preliminary acoustic analysis reveals both cross-dialect and within-dialect variability in the target vowel contrasts. The corpus materials are available to the scholarly community. 