NSF PAR Search | NSF Public Access Repository


EqMap: FPGA LUT Remapping using E-Graphs

https://doi.org/10.1109/ICCAD66269.2025.11240672

Hofmann, Matthew; Gokmen, Berk; Zhang, Zhiru (October 2025, IEEE)

Full Text Available

SmoothE: Differentiable E-Graph Extraction

https://doi.org/10.1145/3669940.3707262

Cai, Yaohui; Yang, Kaixin; Deng, Chenhui; Yu, Cunxi; Zhang, Zhiru (March 2025, ACM)

Full Text Available

Differentiable Combinatorial Scheduling at Scale

Liu, Mingju; Li, Yingjie; Yin, Jiaqi; Zhang, Zhiru; Yu, Cunxi (July 2024, International Conference on Machine Learning (ICML))

Full Text Available

Allo: A Programming Model for Composable Accelerator Design

https://doi.org/10.1145/3656401

Chen, Hongzheng; Zhang, Niansong; Xiang, Shaojie; Zeng, Zhichen; Dai, Mengjia; Zhang, Zhiru (June 2024, Proceedings of the ACM on Programming Languages)

Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures productively. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Several new accelerator design languages (ADLs) aim to enhance or replace HLS, but their advantages are most evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened. In this paper, we introduce Allo, a composable programming model for efficient spatial accelerator design. Allo decouples hardware customizations, including compute, memory, communication, and data type, from algorithm specification, and encapsulates them as a set of customization primitives. Allo preserves the hierarchical structure of an input program by combining customizations from different functions in a bottom-up, type-safe manner. This approach facilitates holistic optimizations that span function boundaries. We conduct comprehensive experiments on commonly used HLS benchmarks and several realistic deep learning models. Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in PolyBench. For the GPT2 model, inference on the Allo-generated accelerator is 1.7x faster than on the NVIDIA A100 GPU, with 5.4x higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.
Full Text Available

UniSparse: An Intermediate Language for General Sparse Format Customization

https://doi.org/10.1145/3649816

Liu, Jie; Zhao, Zhongyuan; Ding, Zijian; Brock, Benjamin; Rong, Hongbo; Zhang, Zhiru (April 2024, Proceedings of the ACM on Programming Languages)

The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by utilizing sparsity pattern- or target-aware data structures and layouts to improve memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively customizing the sparse formats. Additionally, because these frameworks represent formats using a limited set of per-dimension attributes, they lack the flexibility to accommodate numerous new variations of custom sparse data structures and layouts. To overcome this deficiency, we propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives. We also develop a compiler leveraging the MLIR infrastructure, which supports adaptive customization of formats and automatic code generation of format conversion and compute operations for heterogeneous architectures. We demonstrate the efficacy of our approach through experiments running commonly used sparse linear algebra operations with specialized formats on multiple hardware targets, including an Intel CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory (PIM) device.
Full Text Available

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

https://doi.org/10.1145/3620665.3640399

Chen, Hongzheng; Yu, Cody Hao; Zheng, Shuai; Zhang, Zhen; Zhang, Zhiru; Wang, Yida (April 2024, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'2024))

Full Text Available

Formal Verification of Source-to-Source Transformations for HLS

https://doi.org/10.1145/3626202.3637563

Pouchet, Louis-Noël; Tucker, Emily; Zhang, Niansong; Chen, Hongzheng; Pal, Debjit; Rodríguez, Gabriel; Zhang, Zhiru (March 2024, International Symposium on Field Programmable Gate Arrays (FPGA'2024))

Equality Saturation for Datapath Synthesis: A Pathway to Pareto Optimality

https://doi.org/10.1109/DAC56929.2023.10247948

Ustun, Ecenur; Yu, Cunxi; Zhang, Zhiru (July 2023, Proceedings ACM IEEE Design Automation Conference)

An Intermediate Language for General Sparse Format Customization

https://doi.org/10.1109/LCA.2023.3262610

Liu, Jie; Zhao, Zhongyuan; Ding, Zijian; Brock, Benjamin; Rong, Hongbo; Zhang, Zhiru (January 2023, IEEE Computer Architecture Letters)

Full Text Available

Exact Memory- and Communication-aware Scheduling of DNNs on Pipelined Edge TPUs

https://doi.org/10.1109/SEC54971.2022.00023

Yin, Jiaqi; Zhang, Zhiru; Yu, Cunxi (December 2022, 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC))

Full Text Available
