NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Learning-Enabled Denial-of-Service (DoS) Attack Detection and Mitigation for Chiplet-Based Hybrid Interconnection Network

https://doi.org/10.1145/3716368.3735226

Mahmud, Md Tareq; Wang, Ke (June 2025, ACM)

Free, publicly-accessible full text available June 29, 2026
Decision-driven fault-tolerant architecture for vision transformers with real-time error mitigation

https://doi.org/10.52953/ZBPE9349

Gudluru, Indhuja; Shen, Chunyuan; Wang, Ke (June 2025, ITU Journal on Future and Evolving Technologies)

Vision Transformers (ViTs) have advanced the field of computer vision by shifting from traditional Convolutional Neural Networks (CNNs) to attention-based architectures that process input images as sequences of patches. ViTs achieve enhanced performance in many tasks, such as image classification and object detection, because they capture global dependencies within the input data. While software implementations of ViTs are widely adopted, deploying ViTs on hardware introduces several challenges, including fault tolerance in the presence of hardware failures, real-time reliability, and high computational requirements. Permanent faults in processing elements, interconnections, or memory subsystems lead to incorrect computations and degrade system performance. This paper proposes a fault-tolerant hardware implementation of ViTs that addresses these challenges by integrating real-time fault detection and recovery mechanisms. The architecture includes four primary units: patch embedding, encoder, decoder, and Multi-Layer Perceptron (MLP). These units are supported by fault-tolerant components: lightweight recompute units, a centralized Built-In Self-Test (BIST), and a learning-based decision-making system that uses a decision-tree machine learning model. The units are interconnected through a centralized global buffer for efficient data transfer, ensuring seamless operation even under fault conditions.
Free, publicly-accessible full text available June 25, 2026
A High-Performance and Flexible Accelerator for Dynamic Graph Convolutional Networks

https://doi.org/10.23919/DATE64628.2025.10992726

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (March 2025, IEEE)

Free, publicly-accessible full text available March 31, 2026
A Chiplet-Based High-Performance and Secure Hybrid Interconnection Network Design Against DoS and Sniffing Attacks

https://doi.org/10.1109/SoutheastCon56624.2025.10971688

Mahmud, Md Tareq; Wang, Ke (March 2025, IEEE)

Free, publicly-accessible full text available March 22, 2026
HS-GCN: a High-performance, Sustainable, and Scalable Chiplet-based Accelerator for Graph Convolutional Network Inference

https://doi.org/10.1109/TSUSC.2025.3575285

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (January 2025, IEEE Transactions on Sustainable Computing)

Full Text Available
MERIT: A Sustainable DNN Accelerator Design with Photonic Phase-Change Memory

https://doi.org/10.1109/TSUSC.2024.3521847

Li, Yuan; Louri, Ahmed; Karanth, Avinash (January 2025, IEEE Transactions on Sustainable Computing)

The growing computational demands of deep learning have driven interest in analog neural networks using resistive memory and silicon photonics. However, these technologies face inherent limitations in computing parallelism when used independently. Photonic phase-change memory (PCM), which integrates photonics with PCM, overcomes these constraints by enabling simultaneous processing of multiple inputs encoded on different wavelengths, significantly enhancing parallel computation for deep neural network (DNN) inference and training. This paper presents MERIT, a sustainable DNN accelerator that capitalizes on the non-volatility of resistive memory and the high operating speed of photonic devices. MERIT enables seamless inference and training by loading weight kernels into photonic PCM arrays and selectively supplying light encoded with input features for the forward pass and loss gradients for the backward pass. We compare MERIT with state-of-the-art digital and analog DNN accelerators including TPU, DEAP, and PTC. Simulation results demonstrate that MERIT reduces execution time by 68% and energy consumption by 64% for inference, and reduces execution time by 79% and energy consumption by 84% for training.
Full Text Available
A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems

https://doi.org/10.1109/LCA.2024.3477253

Mahmud, Md Tareq; Wang, Ke (July 2024, IEEE Computer Architecture Letters)

Full Text Available
An Efficient Hardware Accelerator Design for Dynamic Graph Convolutional Network (DGCN) Inference

https://doi.org/10.1145/3649329.3658254

Zhao, Yingnan; Wang, Ke; Yang, Jiaqi; Louri, Ahmed (June 2024, ACM)

Full Text Available
OPT-GCN: A Unified and Scalable Chiplet-based Accelerator for High-Performance and Energy-Efficient GCN Computation

https://doi.org/10.1109/TCAD.2024.3401543

Zhao, Yingnan; Wang, Ke; Louri, Ahmed (May 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

As the size of real-world graphs continues to grow at an exponential rate, performing Graph Convolutional Network (GCN) inference efficiently is becoming increasingly challenging. Prior works that employ a unified computing engine with a predefined computation order lack the flexibility and scalability needed to handle diverse input graph datasets. In this paper, we introduce OPT-GCN, a chiplet-based accelerator design that performs GCN inference efficiently while providing flexibility and scalability through an architecture-algorithm co-design. On the architecture side, the proposed design integrates a unified computing engine in each chiplet and an active interposer, both of which are adaptable to efficiently perform GCN inference and facilitate data communication. On the algorithm side, we propose dynamic scheduling and mapping algorithms to optimize memory access and on-chip computations for diverse GCN applications. Experimental results show that, on average, the proposed design reduces memory accesses by 11.3×, 3.4×, and 1.4× and achieves energy savings of 15.2×, 3.7×, and 1.6× compared to HyGCN, AWB-GCN, and GCNAX, respectively.
Full Text Available
Fault Tolerance in Triplet Network Training: Analysis, Evaluation and Protection Methods

https://doi.org/10.1109/TETC.2024.3481962

Wang, Ziheng; Niknia, Farzad; Liu, Shanshan; Reviriego, Pedro; Louri, Ahmed; Lombardi, Fabrizio (January 2024, IEEE Transactions on Emerging Topics in Computing)

Full Text Available