NSF PAR Search | NSF Public Access Repository

Mithril: A Scalable System for Deep GNN Training

https://doi.org/10.1109/HPCA61900.2025.00082

Chen, Jingji; Chen, Zhuoming; Qian, Xuehai (March 2025, IEEE)

Free, publicly-accessible full text available March 1, 2026
NAPA: Intermediate-Level Variational Native-Pulse Ansatz for Variational Quantum Algorithms

https://doi.org/10.1109/TCAD.2024.3355277

Liang, Zhiding; Cheng, Jinglei; Ren, Hang; Wang, Hanrui; Hua, Fei; Song, Zhixin; Ding, Yongshan; Chong, Frederic T.; Han, Song; Qian, Xuehai; et al (January 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
DyNNamic: Dynamically Reshaping, High Data-Reuse Accelerator for Compact DNNs

https://doi.org/10.1109/TC.2022.3184272

Hanson, Edward; Li, Shiyu; Qian, Xuehai; Li, Hai Helen; Chen, Yiran (March 2023, IEEE Transactions on Computers)

Full Text Available
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

https://doi.org/10.1109/TPAMI.2021.3089687

Niu, Wei; Li, Zhengang; Ma, Xiaolong; Dong, Peiyan; Zhou, Gang; Qian, Xuehai; Lin, Xue; Wang, Yanzhi; Ren, Bin (October 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Achieving real-time deep neural network (DNN) inference on mobile devices is appealing but challenging, because even powerful modern mobile devices are “resource-constrained” when executing large-scale DNNs. This necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and makes it desirable to design a new DNN weight sparsity scheme that facilitates real-time inference on mobile devices while preserving high sparse-model accuracy. This paper designs a novel mobile inference acceleration framework, GRIM, that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and that achieves Real-time execution and high accuracy, leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme based on Block-based Column-Row (BCR) pruning. Building on this scheme, the GRIM framework consists of two parts: (a) compiler optimization and code generation for real-time mobile inference; and (b) BCR pruning optimizations for determining pruning hyperparameters and performing weight pruning. We compare GRIM with Alibaba MNN, TVM, TensorFlow-Lite, a sparse implementation based on CSR, PatDNN, and ESE (a representative FPGA inference acceleration framework for RNNs), and achieve up to 14.08× speedup.
Full Text Available
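
The GRIM abstract above names Block-based Column-Row (BCR) pruning without giving its details. The sketch below shows one plausible reading of the name, and it is an assumption rather than the paper's algorithm: the weight matrix is split into fixed-size blocks, and within each block only the rows and columns with the largest L2 norm are kept. The function name `bcr_prune` and all of its parameters are hypothetical.

```python
import math


def bcr_prune(weights, block_rows, block_cols, keep_rows, keep_cols):
    """Return a pruned copy of `weights` (a list of lists).

    Assumed scheme (not taken from the paper): within each block, keep the
    `keep_rows` rows and `keep_cols` columns with the largest L2 norm and
    zero every other entry of that block.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]  # leave the input untouched
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            rows = range(r0, min(r0 + block_rows, n_rows))
            cols = range(c0, min(c0 + block_cols, n_cols))
            # L2 norm of each row and column, restricted to this block.
            row_norm = {r: math.sqrt(sum(weights[r][c] ** 2 for c in cols))
                        for r in rows}
            col_norm = {c: math.sqrt(sum(weights[r][c] ** 2 for r in rows))
                        for c in cols}
            kept_rows = set(sorted(rows, key=lambda r: row_norm[r],
                                   reverse=True)[:keep_rows])
            kept_cols = set(sorted(cols, key=lambda c: col_norm[c],
                                   reverse=True)[:keep_cols])
            # An entry survives only if both its row and column are kept.
            for r in rows:
                for c in cols:
                    if r not in kept_rows or c not in kept_cols:
                        pruned[r][c] = 0.0
    return pruned
```

Because whole rows and columns are removed inside each block, the surviving weights form small dense sub-blocks, which is the kind of regularity a compiler can exploit; how GRIM actually chooses block sizes and pruning rates is not stated in the abstract.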
ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition

https://doi.org/10.1145/3466752.3480043

Li, Shiyu; Hanson, Edward; Qian, Xuehai; Li, Hai "Helen"; Chen, Yiran (October 2021, IEEE/ACM International Symposium on Microarchitecture)
The ever-growing parameter size and computation cost of Convolutional Neural Network (CNN) models hinder their deployment onto resource-constrained platforms. Network pruning techniques have been proposed to remove the redundancy in CNN parameters and produce a sparse model. Sparse-aware accelerators have also been proposed to reduce the computation cost and memory bandwidth requirements of inference by leveraging the model sparsity. The irregularity of sparse patterns, however, limits the efficiency of those designs. Researchers have proposed to address this issue by creating a regular sparsity pattern through hardware-aware pruning algorithms. However, the pruning rate of these solutions is largely limited by the enforced sparsity patterns. This limitation motivates us to explore other compression methods beyond pruning. With two decoupled computation stages, we found that kernel decomposition could potentially take the processing of the sparse pattern off the critical path of inference and achieve a high compression ratio without enforcing the sparse patterns. To exploit these advantages, we propose ESCALATE, an algorithm-hardware co-design approach based on kernel decomposition. At the algorithm level, ESCALATE reorganizes the two computation stages of the decomposed convolution to enable stream processing of the intermediate feature map. We propose a hybrid quantization scheme to exploit the different reuse frequency of each part of the decomposed weight. At the architecture level, ESCALATE proposes a novel ‘Basis-First’ dataflow and its corresponding microarchitecture design to maximize the benefits brought by the decomposed convolution.
Full Text Available
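
The two decoupled stages mentioned in the ESCALATE abstract above can be illustrated with a generic property of kernel decomposition: if every kernel is a weighted sum of a few shared basis kernels, convolution is linear, so the input can be convolved with the basis first and the results combined with the coefficients afterwards. The sketch below is a 1-D toy under that assumption; `conv1d`, `decomposed_conv1d`, and `reconstruct_kernels` are hypothetical names, and nothing here reproduces ESCALATE's dataflow or quantization.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D correlation of `signal` with `kernel`."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def reconstruct_kernels(basis, coeffs):
    """Rebuild full kernels: kernel_n = sum_m coeffs[n][m] * basis[m]."""
    return [[sum(c * b[j] for c, b in zip(row, basis))
             for j in range(len(basis[0]))]
            for row in coeffs]


def decomposed_conv1d(signal, basis, coeffs):
    """Two-stage convolution using shared basis kernels."""
    # Stage 1: convolve the input once with each basis kernel.
    basis_maps = [conv1d(signal, b) for b in basis]
    # Stage 2: each output channel is a weighted sum of the basis maps.
    length = len(basis_maps[0])
    return [[sum(c * m[i] for c, m in zip(row, basis_maps))
             for i in range(length)]
            for row in coeffs]


signal = [1, 2, 3, 4, 5]
basis = [[1, 0, -1], [1, 1, 1]]   # two shared basis kernels
coeffs = [[2, 0], [1, 3]]         # two output channels

direct = [conv1d(signal, k) for k in reconstruct_kernels(basis, coeffs)]
two_stage = decomposed_conv1d(signal, basis, coeffs)
print(direct == two_stage)
```

Stage 1 is independent of the per-channel coefficients, which is why sparsity in the coefficients does not have to be handled while the basis maps are computed.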
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Chang, Sung-En; Li, Yanyu; Sun, Mengshu; Shi, Runbin; So, Hayden K.-H.; Qian, Xuehai; Wang, Yanzhi; Lin, Xue (February 2021, IEEE International Symposium on High Performance Computer Architecture (HPCA))
Full Text Available
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

https://doi.org/10.1109/HPCA51647.2021.00027

Chang, Sung-En; Li, Yanyu; Sun, Mengshu; Shi, Runbin; So, Hayden K.-H.; Qian, Xuehai; Wang, Yanzhi; Lin, Xue (February 2021, Proc. of the IEEE International Symposium on High Performance Computer Architecture (HPCA))
Full Text Available
AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators

https://doi.org/10.1109/HPCA47549.2020.00036

Song, Linghao; Chen, Fan; Zhuo, Youwei; Qian, Xuehai; Li, Hai; Chen, Yiran (February 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA))

Deep neural network (DNN) accelerators, as an example of domain-specific architecture, have demonstrated great success in DNN inference. However, architectural acceleration for the equally important DNN training has not yet been fully studied. With forward data propagation, backward error propagation, and gradient calculation, DNN training is a more complicated process with higher computation and communication intensity. Because recent research demonstrates a diminishing return on specialization, the so-called “accelerator wall”, we believe that a promising approach is to explore coarse-grained parallelism among multiple performance-bounded accelerators to support DNN training. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. We present ACCPAR, a principled and systematic method of determining the tensor partition among heterogeneous accelerator arrays. Compared to prior empirical or unsystematic methods, ACCPAR considers the complete tensor partition space and can reveal previously unknown parallelism configurations. ACCPAR optimizes performance based on a cost model that takes into account both the computation and communication costs of a heterogeneous execution environment. Hence, our method avoids the drawbacks of existing approaches that use communication as a proxy for performance. The enhanced flexibility of tensor partitioning in ACCPAR allows computations to be distributed in a flexible ratio among accelerators with different performance. The proposed search algorithm is also applicable to the emerging multi-path patterns in modern DNNs such as ResNet. We simulate ACCPAR on a heterogeneous accelerator array composed of both TPU-v2 and TPU-v3 accelerators for the training of large-scale DNN models such as AlexNet, the VGG series, and the ResNet series. Normalized to the baseline data parallelism scheme, in which each accelerator replicates the model and processes different input data in parallel, the average performance improvements of the state-of-the-art “one weird trick” (OWT) and HYPAR, and of ACCPAR, are 2.98×, 3.78×, and 6.30×, respectively.
Full Text Available
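
The ACCPAR abstract above mentions distributing computation in a flexible ratio between accelerators of different performance, guided by a cost model that covers both computation and communication. The sketch below is a deliberately simplified toy of that idea, not the paper's cost model or search algorithm: each accelerator's time is assumed to be linear in its share of the work plus a fixed communication term, and the split is chosen so both finish together. All names and numbers are hypothetical.

```python
def finish_time(work, alpha, speed_a, speed_b, comm_a=0.0, comm_b=0.0):
    """Completion time when accelerator A gets fraction `alpha` of the work.

    Toy linear model (an assumption): time = share / speed + fixed comm cost.
    """
    time_a = alpha * work / speed_a + comm_a
    time_b = (1.0 - alpha) * work / speed_b + comm_b
    return max(time_a, time_b)


def balanced_ratio(work, speed_a, speed_b, comm_a=0.0, comm_b=0.0):
    """Fraction of work for A that makes both accelerators finish together."""
    # Solve alpha*work/speed_a + comm_a == (1 - alpha)*work/speed_b + comm_b.
    alpha = (work / speed_b + comm_b - comm_a) / (work / speed_a + work / speed_b)
    # Clamp in case communication costs alone make one side the bottleneck.
    return min(1.0, max(0.0, alpha))


work = 120.0                   # hypothetical units of computation
speed_a, speed_b = 1.0, 3.0    # hypothetical throughputs; B is 3x faster

alpha = balanced_ratio(work, speed_a, speed_b)
print(alpha, finish_time(work, alpha, speed_a, speed_b))
print(finish_time(work, 0.5, speed_a, speed_b))  # naive equal split
```

In this toy setting the balanced split gives the slower accelerator a quarter of the work and halves the completion time relative to an equal split; a real cost model would also have to account for how the partition itself changes the communication volume.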
Non-Structured DNN Weight Pruning--Is It Beneficial in Any Platform?

https://doi.org/10.1109/TNNLS.2021.3063265

Ma, Xiaolong; Lin, Sheng; Ye, Shaokai; He, Zhezhi; Zhang, Linfeng; Yuan, Geng; Tan, Sia Huat; Li, Zhengang; Fan, Deliang; Qian, Xuehai; et al (March 2021, IEEE Transactions on Neural Networks and Learning Systems)
Full Text Available
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Niu, Wei; Ma, Xiaolong; Lin, Sheng; Wang, Shihao; Qian, Xuehai; Lin, Xue; Wang, Yanzhi; Ren, Bin (March 2020, The Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS))

Full Text Available