NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

https://doi.org/10.1109/TCSI.2024.3430831

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (October 2024, IEEE Transactions on Circuits and Systems I: Regular Papers)

Full Text Available
Chiplet-GAN: Chiplet-based Accelerator Design for Scalable Generative Adversarial Network Inference

Chen, Yuechen; Louri, Ahmed; Lombardi, Fabrizio; Liu, Shanshan (August 2024, IEEE Circuits and System)

Generative adversarial networks (GANs) have emerged as a powerful solution for generating synthetic data when the availability of large, labeled training datasets is limited or costly in large-scale machine learning systems. Recent advancements in GAN models have extended their applications across diverse domains, including medicine, robotics, and content synthesis. These advanced GAN models have gained recognition for their excellent accuracy by scaling the model. However, existing accelerators face scalability challenges when dealing with large-scale GAN models. As the size of GAN models increases, the demand for computation and communication resources during inference continues to grow. To address this scalability issue, this article proposes Chiplet-GAN, a chiplet-based accelerator design for GAN inference. Chiplet-GAN enables scalability by adding more chiplets to the system, thereby supporting the scaling of computation capabilities. To handle the increasing communication demand as the system and model scale, a novel interconnection network with adaptive topology and passive/active network links is developed to provide adequate communication support for Chiplet-GAN. Coupled with workload partition and allocation algorithms, Chiplet-GAN reduces execution time and energy consumption for GAN inference workloads as both model and chiplet-system scales. Evaluation results using various GAN models show the effectiveness of Chiplet-GAN. On average, compared to GANAX, SpAtten, and Simba, the Chiplet-GAN reduces execution time and energy consumption by 34% and 21%, respectively. Furthermore, as the system scales for large-scale GAN model inference, Chiplet-GAN achieves reductions in execution time of up to 63% compared to the Simba, a chiplet-based accelerator.
more » « less
Full Text Available
Slack-Aware Packet Approximation for Energy-Efficient Network-on-Chips

https://doi.org/10.1109/TSUSC.2022.3213469

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (January 2023, IEEE Transactions on Sustainable Computing)

Full Text Available
A Technique for Approximate Communication in Network-on-Chips for Image Classification

https://doi.org/10.1109/TETC.2022.3162165

Chen, Yuechen; Liu, Shanshan; Lombardi, Fabrizio; Louri, Ahmed (January 2023, IEEE Transactions on Emerging Topics in Computing)

Full Text Available
Approximate Network-on-Chips with Application to Image Classification

https://doi.org/10.1109/NAS55553.2022.9925540

Chen, Yuechen; Louri, Ahmed; Liu, Shanshan; Lombardi, Fabrizio (October 2022, IEEE International Conference on Networking, Architecture, and Storage (NAS))

Full Text Available
Learning-Based Quality Management for Approximate Communication in Network-on-Chips

https://doi.org/10.1109/TCAD.2020.3012235

Chen, Yuechen; Louri, Ahmed (November 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)
null (Ed.)
Full Text Available
An Approximate Communication Framework for Network-on-Chips

https://doi.org/10.1109/TPDS.2020.2968068

Chen, Yuechen; Louri, Ahmed (June 2020, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
An online quality management framework for approximate communication in network-on-chips

https://doi.org/10.1145/3330345.3330365

Chen, Yuechen; Louri, Ahmed (June 2019, ACM International Conference on Supercomputing)

Approximate communication is being seriously considered as an effective technique for reducing power consumption and improving the communication efficiency of network-on-chips (NoCs). A major problem faced by these techniques is quality control: how do we ensure that the network will transmit data with sufficient accuracy for applications to produce acceptable results? Previous methods that addressed this issue require each application to calculate the approximation level for every piece of approximable data, which takes hundreds of cycles. So the approximation information is often not available when a request packet is transmitted. Therefore, the reply packet with the approximable data is transmitted with unnecessarily absolute accuracy, reducing the effectiveness of approximate communication. In this paper, we propose a hardware-based quality management framework for approximate communication to minimize the time needed for the approximation level calculation. The proposed framework employs a configuration algorithm to continuously adjust the quality of every piece of data based on the difference between the output quality and the application's quality requirement. When the proposed framework is implemented in a network, every request packet can be transmitted with the updated approximation level. This framework results in fewer flits in each data packet and reduces traffic in NoCs while meeting the quality requirements of applications. Our cycle-accurate simulation using the AxBench benchmark suite shows that the proposed online quality management framework can reduce network latency by up to 52% and dynamic power consumption by 59% compared to previous approximate communication techniques while ensuring 95% output quality. This hardware-software codesign incurs 1% area overhead over previous techniques.
more » « less
Full Text Available
DEC-NoC: An Approximate Framework Based on Dynamic Error Control with Applications to Energy-Efficient NoCs

https://doi.org/10.1109/ICCD.2018.00078

Chen, Yuechen; Reza, Md Farhadur; Louri, Ahmed (October 2018, 2018 IEEE 36th International Conference on Computer Design)

Network-on-Chips (NoCs) have emerged as the standard on-chip communication fabrics for multi/many core systems and system on chips. However, as the number of cores on chip increases, so does power consumption. Recent studies have shown that NoC power consumption can reach up to 40% of the overall chip power. Considerable research efforts have been deployed to significantly reduce NoC power consumption. In this paper, we build on approximate computing techniques and propose an approximate communication methodology called DEC-NoC for reducing NoC power consumption. The proposed DEC-NoC leverages applications' error tolerance and dynamically reduces the amount of error checking and correction in packet transmission, which results in a significant reduction in the number of retransmitted packets. The reduction in packet retransmission results in reduced power consumption. Our cycle accurate simulation using PARSEC benchmark suites shows that DEC-NoC achieves up to 56% latency reduction and up to 58% dynamic power reduction compared to NoC architectures with conventional error control techniques.
more » « less
Full Text Available

Search for: All records