NSF PAR Search | NSF Public Access Repository


A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

https://doi.org/10.1109/TCSI.2024.3430831

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (October 2024, IEEE Transactions on Circuits and Systems I: Regular Papers)

Full Text Available
Chiplet-GAN: Chiplet-based Accelerator Design for Scalable Generative Adversarial Network Inference

Chen, Yuechen; Louri, Ahmed; Lombardi, Fabrizio; Liu, Shanshan (August 2024, IEEE Circuits and System)

Generative adversarial networks (GANs) have emerged as a powerful solution for generating synthetic data when large, labeled training datasets are limited or costly to obtain in large-scale machine learning systems. Recent advancements in GAN models have extended their applications across diverse domains, including medicine, robotics, and content synthesis. These advanced GAN models have gained recognition for the excellent accuracy they achieve by scaling the model. However, existing accelerators face scalability challenges when dealing with large-scale GAN models. As the size of GAN models increases, the demand for computation and communication resources during inference continues to grow. To address this scalability issue, this article proposes Chiplet-GAN, a chiplet-based accelerator design for GAN inference. Chiplet-GAN enables scalability by adding more chiplets to the system, thereby supporting the scaling of computation capabilities. To handle the increasing communication demand as the system and model scale, a novel interconnection network with adaptive topology and passive/active network links is developed to provide adequate communication support for Chiplet-GAN. Coupled with workload partition and allocation algorithms, Chiplet-GAN reduces execution time and energy consumption for GAN inference workloads as both the model and the chiplet system scale. Evaluation results using various GAN models show the effectiveness of Chiplet-GAN. On average, compared to GANAX, SpAtten, and Simba, Chiplet-GAN reduces execution time and energy consumption by 34% and 21%, respectively. Furthermore, as the system scales for large-scale GAN model inference, Chiplet-GAN achieves reductions in execution time of up to 63% compared to Simba, a chiplet-based accelerator.
Full Text Available
ASIC Design of Nanoscale Artificial Neural Networks for Inference/Training by Floating-Point Arithmetic

https://doi.org/10.1109/TNANO.2024.3367916

Niknia, Farzad; Wang, Ziheng; Liu, Shanshan; Reviriego, Pedro; Louri, Ahmed; Lombardi, Fabrizio (January 2024, IEEE Transactions on Nanotechnology)

Full Text Available
Fault Tolerance in Triplet Network Training: Analysis, Evaluation and Protection Methods

https://doi.org/10.1109/TETC.2024.3481962

Wang, Ziheng; Niknia, Farzad; Liu, Shanshan; Reviriego, Pedro; Louri, Ahmed; Lombardi, Fabrizio (January 2024, IEEE Transactions on Emerging Topics in Computing)

Full Text Available
Floating-Point Formats and Arithmetic for Highly Accurate Multi-Layer Perceptrons

https://doi.org/10.1109/NANO58406.2023.10231201

Niknia, Farzad; Wang, Ziheng; Liu, Shanshan; Reviriego, Pedro; Louri, Ahmed; Lombardi, Fabrizio (July 2023, IEEE)

Data precision can significantly affect the accuracy and overhead metrics of hardware accelerators for applications such as artificial neural networks (ANNs). This paper evaluates the inference and training of multi-layer perceptrons (MLPs): the IEEE standard floating-point (FP) precisions (half, single, and double) are first used separately and then compared with mixed-precision FP formats. The mixed-precision calculations are investigated for three critical propagation modules (activation functions, weight updates, and accumulation units). Compared with applying a simple low-precision format, the mixed-precision format prevents accuracy loss and the occurrence of overflow/underflow in the MLPs while potentially incurring less hardware overhead in terms of area/power. As multiply-accumulation is the dominant operation in current ANNs, a fully pipelined hardware implementation of the fused multiply-add units is proposed for different IEEE FP formats to achieve a very high operating frequency.
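The effect of accumulator precision described in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes NumPy, and the helper name `accumulate`, the input value 0.01, and the length 4096 are all hypothetical choices made for illustration. It sums the same half-precision inputs once with a half-precision accumulator and once with a single-precision accumulator.

```python
import numpy as np

def accumulate(values, acc_dtype):
    """Sum the inputs one by one, rounding the running total to acc_dtype."""
    acc = acc_dtype(0.0)
    for v in values:
        # Convert the input to the accumulator format, add, and round back.
        acc = acc_dtype(acc + acc_dtype(v))
    return acc

# Hypothetical workload: 4096 identical half-precision inputs.
values = np.full(4096, 0.01, dtype=np.float16)

half_sum = accumulate(values, np.float16)   # low precision throughout
mixed_sum = accumulate(values, np.float32)  # half inputs, single accumulator

print("half accumulator: ", float(half_sum))
print("mixed accumulator:", float(mixed_sum))
```

With these inputs the half-precision total should stop growing once the gap between adjacent representable values exceeds twice the addend, so it ends well below the single-precision total. This is only one of the three modules the abstract names (accumulation units); activation functions and weight updates are not modeled here.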
Full Text Available
Error-Resilient Data Compression With Tunstall Codes

https://doi.org/10.1109/TCSI.2023.3245022

Liu, Shanshan; Reviriego, Pedro; Ullah, Anees; Louri, Ahmed; Lombardi, Fabrizio (May 2023, IEEE Transactions on Circuits and Systems I: Regular Papers)

Full Text Available
Slack-Aware Packet Approximation for Energy-Efficient Network-on-Chips

https://doi.org/10.1109/TSUSC.2022.3213469

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (January 2023, IEEE Transactions on Sustainable Computing)

Full Text Available
A Technique for Approximate Communication in Network-on-Chips for Image Classification

https://doi.org/10.1109/TETC.2022.3162165

Chen, Yuechen; Liu, Shanshan; Lombardi, Fabrizio; Louri, Ahmed (January 2023, IEEE Transactions on Emerging Topics in Computing)

Full Text Available
Nanoscale Accelerators for Artificial Neural Networks

https://doi.org/10.1109/MNANO.2022.3208757

Niknia, Farzad; Wang, Ziheng; Liu, Shanshan; Louri, Ahmed; Lombardi, Fabrizio (December 2022, IEEE Nanotechnology Magazine)

Full Text Available
Approximate Network-on-Chips with Application to Image Classification

https://doi.org/10.1109/NAS55553.2022.9925540

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (October 2022, IEEE International Conference on Networking, Architecture, and Storage (NAS))

Full Text Available
