NSF PAR Search | NSF Public Access Repository

MERIT: A Sustainable DNN Accelerator Design with Photonic Phase-Change Memory

https://doi.org/10.1109/TSUSC.2024.3521847

Li, Yuan; Louri, Ahmed; Karanth, Avinash (January 2025, IEEE Transactions on Sustainable Computing)

The growing computational demands of deep learning have driven interest in analog neural networks using resistive memory and silicon photonics. However, these technologies face inherent limitations in computing parallelism when used independently. Photonic phase-change memory (PCM), which integrates photonics with PCM, overcomes these constraints by enabling simultaneous processing of multiple inputs encoded on different wavelengths, significantly enhancing parallel computation for deep neural network (DNN) inference and training. This paper presents MERIT, a sustainable DNN accelerator that capitalizes on the non-volatility of resistive memory and the high operating speed of photonic devices. MERIT enables seamless inference and training by loading weight kernels into photonic PCM arrays and selectively supplying light encoded with input features for the forward pass and loss gradients for the backward pass. We compare MERIT with state-of-the-art digital and analog DNN accelerators including TPU, DEAP, and PTC. Simulation results demonstrate that MERIT reduces execution time by 68% and energy consumption by 64% for inference, and reduces execution time by 79% and energy consumption by 84% for training.
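The forward and backward passes described in the abstract above can be illustrated with a minimal numerical sketch. It is not MERIT's implementation: it uses ordinary floating-point arithmetic in place of an analog photonic PCM array, assumes the backward pass multiplies loss gradients by the transpose of the stored weights (as in standard backpropagation), and models each wavelength as a dictionary key. All function names, wavelength labels, and values are hypothetical.

```python
# Illustrative sketch only: plain Python arithmetic standing in for an analog
# photonic PCM array. Every name and value below is a hypothetical placeholder.

def matvec(weights, vector):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def transpose(weights):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*weights)]

def forward_pass(weights, inputs_by_wavelength):
    """Apply the same stored weights to the input vector on every wavelength.

    Hardware would process the wavelength channels at the same time;
    this sketch simply loops over them.
    """
    return {wl: matvec(weights, x) for wl, x in inputs_by_wavelength.items()}

def backward_pass(weights, grads_by_wavelength):
    """Propagate loss gradients through the transposed weights (assumption)."""
    weights_t = transpose(weights)
    return {wl: matvec(weights_t, g) for wl, g in grads_by_wavelength.items()}

# Weights are loaded once (3 outputs x 2 inputs) and reused for both passes.
weights = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]

# Two input feature vectors, each carried on its own wavelength.
inputs = {"lambda_1": [1.0, 0.0], "lambda_2": [0.5, 2.0]}
outputs = forward_pass(weights, inputs)

# Loss gradients with respect to the outputs, again one per wavelength.
grads = {"lambda_1": [1.0, 1.0, 1.0], "lambda_2": [0.0, 2.0, 0.0]}
input_grads = backward_pass(weights, grads)

print(outputs)
print(input_grads)
```

The point of the sketch is that the weights stay in place while only the data supplied to the array changes between the forward and backward passes.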
Free, publicly-accessible full text available January 1, 2026
Extending Energy-Efficient and Scalable DNN Training and Inference with 3D Photonic Accelerator

https://doi.org/10.1109/JETCAS.2025.3591812

Curry, Juliana; Li, Yuan; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (January 2025, IEEE Journal on Emerging and Selected Topics in Circuits and Systems)

Free, publicly-accessible full text available January 1, 2026
Training Photonic Mach Zehnder Meshes for Neural Network Acceleration

https://doi.org/10.1109/HiPC62374.2024.00020

Wolff, Andy; Karanth, Avinash (December 2024, IEEE)

Free, publicly-accessible full text available December 18, 2025
PCM Enabled Low-Power Photonic Accelerator for Inference and Training on Edge Devices

https://doi.org/10.1109/IPDPSW63119.2024.00118

Curry, Juliana; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (July 2024, IEEE)

The convergence of edge computing and artificial intelligence requires that inference be performed on-device to provide rapid response with low latency and high accuracy without transferring large amounts of data to the cloud. However, power and size limitations make it challenging for electrical accelerators to support both inference and training for large neural network models. To this end, we propose Trident, a low-power photonic accelerator that combines the benefits of phase change material (PCM) and photonics to implement both inference and training in one unified architecture. Emerging silicon photonics has the potential to exploit the parallelism of neural network models, reduce power consumption, and provide high bandwidth density via wavelength division multiplexing, making photonics an ideal candidate for on-device training and inference. As PCM is reconfigurable and non-volatile, we utilize it for two distinct purposes: (i) to maintain resonant wavelength without expensive electrical or thermal heaters, and (ii) to implement a non-linear activation function, which eliminates the need to move data between memory and compute units. This multi-purpose use of PCM is shown to lead to a significant reduction in energy consumption and execution time. Compared to photonic accelerators DEAP-CNN, CrossLight, and PIXEL, Trident improves energy efficiency by up to 43% and latency by up to 150% on average. Compared to the electronic edge AI accelerators Google Coral (which utilizes the Google Edge TPU) and Bearkey TB96-AI, Trident improves energy efficiency by 11% and 93%, respectively. While NVIDIA AGX Xavier is more energy efficient, the reduced data movement and GST activation of Trident reduce latency by 107% on average compared to the NVIDIA accelerator. When compared to the Google Coral and the Bearkey TB96-AI, Trident reduces latency by 1413% and 595% on average.
Full Text Available
HSCONN: Hardware-Software Co-Optimization of Self-Attention Neural Networks for Large Language Models

https://doi.org/10.1145/3649476.3658709

Liu, Siqin; Kuve, Prakash Chand; Karanth, Avinash (June 2024, ACM)

Full Text Available
SCORCH: Neural Architecture Search and Hardware Accelerator Co-design with Reinforcement Learning

https://doi.org/10.1109/ISQED60706.2024.10528756

Liu, Siqin; Karanth, Avinash (April 2024, IEEE)

The ability to automatically generate a neural network architecture and the corresponding hardware implementation to optimize both accuracy and performance characteristics (latency, power) simultaneously for edge-based Artificial Intelligence (AI) applications is becoming prevalent. As both neural architecture search (NAS) and hardware implementation have ample design space, it is very challenging to integrate with resource-constrained edge computing hardware since the current co-search frameworks take several hundreds of GPU hours to converge. In this paper, we propose SCORCH, a novel neural architecture search and hardware accelerator co-design framework with reinforcement learning to maximize accuracy and increase energy efficiency and throughput while converging faster. By predicting hyperparameters of neural networks together with hardware resources, we use a reinforcement-based multi-phased controller to explore neural architecture to achieve higher accuracy and hardware performance simultaneously by applying customized dataflows, voltage/frequency scaling, and tunable Network-on-Chip (NoC) hardware parameters. Our simulation results on the CIFAR-10/100 and ImageNet datasets show that SCORCH achieves identical neural network accuracy while achieving 2.6% higher accuracy, and 35.6%, 26.2%, and 65.8% reductions in latency, energy, and area compared with state-of-the-art co-search frameworks such as DANCE, NANDS, and NASAIC.
Full Text Available
Versa-DNN: A Versatile Architecture Enabling High-Performance and Energy-Efficient Multi-DNN Acceleration

https://doi.org/10.1109/TPDS.2023.3340953

Yang, Jiaqi; Zheng, Hao; Louri, Ahmed (February 2024, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration

https://doi.org/10.1109/TPDS.2023.3327535

Li, Yuan; Louri, Ahmed; Karanth, Avinash (January 2024, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
Polyform: A Versatile Architecture for Multi-DNN Execution via Spatial and Temporal Acceleration

https://doi.org/10.1109/ICCD58817.2023.00033

Yin, Lingxiang; Ghazizadeh, Amir; Tian, Shilin; Louri, Ahmed; Zheng, Hao (November 2023, IEEE)

Full Text Available
ARIES: Accelerating Distributed Training in Chiplet-Based Systems via Flexible Interconnects

https://doi.org/10.1109/ICCAD57390.2023.10323955

Yin, Lingxiang; Ghazizadeh, Amir; Louri, Ahmed; Zheng, Hao (October 2023, IEEE)

Full Text Available
