NSF PAR Search | NSF Public Access Repository


PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration

https://doi.org/10.1145/3656643

Abouelhamayed, Ahmed; Cui, Angela; Fernandez-Marques, Javier; Lane, Nicholas; Abdelfattah, Mohamed (March 2025, ACM Transactions on Reconfigurable Technology and Systems)

Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs), especially convolutional neural networks (CNNs). Recently, product quantization (PQ) has been applied to these workloads, replacing MACs with memory lookups of pre-computed dot products. To better understand the efficiency tradeoffs of product-quantized DNNs (PQ-DNNs), we create a custom hardware accelerator to parallelize and accelerate nearest-neighbor search and dot-product lookups. Additionally, we perform an empirical study to investigate the efficiency–accuracy tradeoffs of different PQ parameterizations and training methods. We identify PQ configurations that improve performance-per-area for ResNet20 by up to 3.1×, even when compared to a highly optimized conventional DNN accelerator, with similar improvements on two additional compact DNNs. Compared with recent PQ solutions, we outperform prior work by 4× in terms of performance-per-area with a 0.6% accuracy degradation. Finally, we reduce the bitwidth of PQ operations to investigate the impact on both hardware efficiency and accuracy. With only 2–6-bit precision on three compact DNNs, we are able to maintain DNN accuracy, eliminating the need for DSPs.
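The lookup-based computation described in the abstract above can be illustrated with a small software sketch. This is not the authors' accelerator or training method; it is a minimal pure-Python illustration of the general product-quantization idea, assuming fixed (not learned) codebooks and squared Euclidean distance for the nearest-neighbor search. The names `nearest`, `build_tables`, `pq_dot`, and `d_sub` are hypothetical and chosen only for this example.

```python
def nearest(sub, codebook):
    """Return the index of the centroid closest to `sub` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((s - c) ** 2 for s, c in zip(sub, codebook[k])),
    )


def build_tables(weights, codebooks, d_sub):
    """Pre-compute, per subspace, the dot product of every centroid with the
    matching slice of the weight vector. This is done once, ahead of inference."""
    tables = []
    for j, codebook in enumerate(codebooks):
        w = weights[j * d_sub:(j + 1) * d_sub]
        tables.append([sum(c * wi for c, wi in zip(cent, w)) for cent in codebook])
    return tables


def pq_dot(x, codebooks, tables, d_sub):
    """Approximate dot(x, weights): one nearest-centroid search and one table
    lookup per subspace, with no multiplications against the weights at run time."""
    total = 0.0
    for j, codebook in enumerate(codebooks):
        sub = x[j * d_sub:(j + 1) * d_sub]
        total += tables[j][nearest(sub, codebook)]
    return total


# Toy setup (assumed values): 4-dimensional vectors split into two 2-dimensional
# subspaces, each with a 2-entry codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [2.0, 0.0]],
]
weights = [0.5, -1.0, 2.0, 3.0]
tables = build_tables(weights, codebooks, d_sub=2)

x = [0.9, 1.2, 1.8, 0.1]
approx = pq_dot(x, codebooks, tables, d_sub=2)       # 3.5
exact = sum(xi * wi for xi, wi in zip(x, weights))   # about 3.15
```

The gap between `approx` and `exact` in this toy case reflects the quantization error of snapping each sub-vector to its nearest centroid; larger codebooks or shorter sub-vectors reduce that error at the cost of more table storage and search work.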
Free, publicly-accessible full text available March 31, 2026
BBS: Bi-Directional Bit-Level Sparsity for Deep Learning Acceleration

https://doi.org/10.1109/MICRO61859.2024.00048

Chen, Yuzong; Meng, Jian; Seo, Jae-sun; Abdelfattah, Mohamed S (November 2024, IEEE)

Free, publicly-accessible full text available November 2, 2025
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

Dotzel, Jordan; Wu, Gang; Li, Andrew; Umar, Muhammad; Ni, Yun; Abdelfattah, Mohamed S; Zhang, Zhiru; Cheng, Liqun; Dixon, Martin G; Jouppi, Norman P; et al. (September 2024, OpenReview)

Full Text Available
Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision

https://doi.org/10.1109/FPL64840.2024.00030

Dai, Xilai; Chen, Yuzong; Abdelfattah, Mohamed S (September 2024, IEEE)

Full Text Available
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs

https://doi.org/10.1109/ICFPT59805.2023.00013

Chen, Yuzong; Dotzel, Jordan; Abdelfattah, Mohamed S. (December 2023, 2023 International Conference on Field Programmable Technology (ICFPT))

Full Text Available
BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

https://doi.org/10.1109/FCCM57271.2023.00015

Chen, Yuzong; Abdelfattah, Mohamed S. (May 2023, Proceedings Annual IEEE Symposium on Field Programmable Custom Computing Machines)

Full Text Available
