NSF PAR Search | NSF Public Access Repository

DiTile-DGNN: An Efficient Accelerator for Distributed Dynamic Graph Neural Network Inference

https://doi.org/10.1145/3695053.3731017

Yang, Jiaqi; Zheng, Hao; Louri, Ahmed (June 2025, ACM)

Free, publicly-accessible full text available June 20, 2026
A High-Performance and Flexible Accelerator for Dynamic Graph Convolutional Networks

https://doi.org/10.23919/DATE64628.2025.10992726

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
Chiplet-GAN: Chiplet-based Accelerator Design for Scalable Generative Adversarial Network Inference

Chen, Yuechen; Louri, Ahmed; Lombardi, Fabrizio; Liu, Shanshan (August 2024, IEEE Circuits and System)

Generative adversarial networks (GANs) have emerged as a powerful solution for generating synthetic data when the availability of large, labeled training datasets is limited or costly in large-scale machine learning systems. Recent advancements in GAN models have extended their applications across diverse domains, including medicine, robotics, and content synthesis. These advanced GAN models have gained recognition for the excellent accuracy they achieve by scaling up the model. However, existing accelerators face scalability challenges when dealing with large-scale GAN models. As the size of GAN models increases, the demand for computation and communication resources during inference continues to grow. To address this scalability issue, this article proposes Chiplet-GAN, a chiplet-based accelerator design for GAN inference. Chiplet-GAN enables scalability by adding more chiplets to the system, thereby supporting the scaling of computation capabilities. To handle the increasing communication demand as the system and model scale, a novel interconnection network with adaptive topology and passive/active network links is developed to provide adequate communication support for Chiplet-GAN. Coupled with workload partition and allocation algorithms, Chiplet-GAN reduces execution time and energy consumption for GAN inference workloads as both the model and the chiplet system scale. Evaluation results using various GAN models show the effectiveness of Chiplet-GAN. On average, compared to GANAX, SpAtten, and Simba, Chiplet-GAN reduces execution time and energy consumption by 34% and 21%, respectively. Furthermore, as the system scales for large-scale GAN model inference, Chiplet-GAN achieves reductions in execution time of up to 63% compared to Simba, a chiplet-based accelerator.
Full Text Available
An Efficient Hardware Accelerator Design for Dynamic Graph Convolutional Network (DGCN) Inference

https://doi.org/10.1145/3649329.3658254

Zhao, Yingnan; Wang, Ke; Yang, Jiaqi; Louri, Ahmed (June 2024, ACM)

Full Text Available
OPT-GCN: A Unified and Scalable Chiplet-based Accelerator for High-Performance and Energy-Efficient GCN Computation

https://doi.org/10.1109/TCAD.2024.3401543

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (May 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

As the size of real-world graphs continues to grow at an exponential rate, performing Graph Convolutional Network (GCN) inference efficiently is becoming increasingly challenging. Prior works that employ a unified computing engine with a predefined computation order lack the necessary flexibility and scalability to handle diverse input graph datasets. In this paper, we introduce OPT-GCN, a chiplet-based accelerator design that performs GCN inference efficiently while providing flexibility and scalability through an architecture-algorithm co-design. On the architecture side, the proposed design integrates a unified computing engine in each chiplet and an active interposer, both of which are adaptable to efficiently perform GCN inference and facilitate data communication. On the algorithm side, we propose dynamic scheduling and mapping algorithms to optimize memory access and on-chip computations for diverse GCN applications. Experimental results show that the proposed design provides memory access reductions of 11.3×, 3.4×, and 1.4× and energy savings of 15.2×, 3.7×, and 1.6× on average compared to HyGCN, AWB-GCN, and GCNAX, respectively.
Full Text Available
Morph-GCNX: A Universal Architecture for High-Performance and Energy-Efficient Graph Convolutional Network Acceleration

https://doi.org/10.1109/TSUSC.2023.3313880

Wang, Ke; Zheng, Hao; Li, Jiajun; Louri, Ahmed (September 2023, IEEE Transactions on Sustainable Computing)

Full Text Available
Venus: A Versatile Deep Neural Network Accelerator Architecture Design for Multiple Applications

https://doi.org/10.1109/DAC56929.2023.10247897

Yang, Jiaqi; Zheng, Hao; Louri, Ahmed (July 2023, IEEE)
Nanoscale Accelerators for Artificial Neural Networks

https://doi.org/10.1109/MNANO.2022.3208757

Niknia, Farzad; Wang, Ziheng; Liu, Shanshan; Louri, Ahmed; Lombardi, Fabrizio (December 2022, IEEE Nanotechnology Magazine)

Full Text Available
FSA: An Efficient Fault-tolerant Systolic Array-based DNN Accelerator Architecture

https://doi.org/10.1109/ICCD56317.2022.00086

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (October 2022, IEEE International Conference on Computer Design (ICCD))

Full Text Available
Adapt-Flow: A Flexible DNN Accelerator Architecture for Heterogeneous Dataflow Implementation

https://doi.org/10.1145/3526241.3530311

Yang, Jiaqi; Zheng, Hao; Louri, Ahmed (June 2022, Great Lakes Symposium on VLSI)

Full Text Available
