NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

An ASIC Accelerator for QNN With Variable Precision and Tunable Energy Efficiency

https://doi.org/10.1109/TCAD.2024.3357597

Wagle, Ankit; Singh, Gian; Khatri, Sunil; Vrudhula, Sarma (July 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

This article presents TULIP, a new architecture for a variable precision quantized neural network (QNN) inference. It is designed with the goal of maximizing energy efficiency per classification. TULIP is constructed by arranging a collection of unique processing elements (TULIP-PEs) in a single-instruction–multiple-data (SIMD) fashion. Each TULIP-PE contains binary neurons that are interconnected using multiplexers. Each neuron also has a small dedicated local register connected to it. The binary neurons are implemented as standard cells and used for implementing threshold functions, i.e., an inner-product and thresholding operation on its binary inputs. The neurons can be reconfigured with a single change in the control signals to implement all the standard operations used in a QNN. This article presents novel algorithms for implementing the operations of a QNN on the TULIP-PEs in the form of a schedule of threshold functions. TULIP was implemented as an ASIC in TSMC 40nm-LP technology. A QNN accelerator that employs a conventional multiply and accumulate-based arithmetic processor was also implemented in the same technology to provide a fair comparison. The results show that TULIP is 30X−50X more energy-efficient than an equivalent design, without any penalty in performance, area, or accuracy. Furthermore, TULIP achieves these improvements without using traditional techniques such as voltage scaling or approximate computing. Finally, this article also demonstrates how the run-time tradeoff between accuracy and energy efficiency is done on the TULIP architecture.
more » « less
Full Text Available
A DRAM-based Near-Memory Architecture for Accelerated and Energy-Efficient Execution of Transformers

https://doi.org/10.1145/3649476.3658732

Singh, Gian; Vrudhula, Sarma (June 2024, ACM)

Transformers-based language models have achieved remarkable accuracy in various NLP tasks, employing self-attention mecha- nisms primarily based on matrix multiplication. However, their significant size leads to data movement issues, causing latency and energy efficiency challenges in conventional Von-Neumann systems. To mitigate these issues, several in-memory and near- memory architectures have been proposed. This paper introduces PACT-3D, a near-memory architecture featuring novel computing units integrated with DRAM banks. PACT-3D significantly reduces latency by 1.7× and improves energy efficiency by 18.7× compared to state-of-the-art near-memory architectures.
more » « less
Full Text Available
PARAG: PIM Architecture for Real-Time Acceleration of GCNs

https://doi.org/10.1109/HiPC58850.2023.00016

Singh, Gian; Kuppannagari, Sanmukh R; Vrudhula, Sarma (December 2023, IEEE)

Graph Convolutional Networks (GCNs) have successfully incorporated deep learning to graph structures for social network analysis, bio-informatics, etc. The execution pattern of GCNs is a hybrid of graph processing and neural networks which poses unique and significant challenges for hardware implementation. Graph processing involves a large amount of irregular memory access with little computation whereas processing of neural networks involves a large number of operations with regular memory access. Existing graph processing and neural network accelerators are therefore inefficient for computing GCNs. This paper presents Parag, processing in memory (PIM) architecture for GCN computation. It consists of customized logic with minuscule computing units called Neural Processing Elements (NPEs) interfaced to each bank of the DRAM to support parallel graph processing and neural network computation. It utilizes the massive internal parallelism of DRAM to accelerate the GCN execution with high energy efficiency. Simulation results for inference of GCN over standard datasets show a latency and energy reduction by three orders of magnitude over a CPU implementation. When compared to a state-of-the-art PIM architecture, PARAG achieves on an average 4x reduction in latency and 4.23x reduction in the energy-delay-product (EDP).
more » « less
Full Text Available
CAMDNN: Content-Aware Mapping of a Network of Deep Neural Networks on Edge MPSoCs

https://doi.org/10.1109/TC.2022.3207137

Heidari, Soroush; Ghasemi, Mehdi; Kim, Young Geun; Wu, Carole-Jean; Vrudhula, Sarma (December 2022, IEEE Transactions on Computers)

Full Text Available
A Novel ASIC Design Flow Using Weight-Tunable Binary Neurons as Standard Cells

https://doi.org/10.1109/TCSI.2022.3164995

Wagle, Ankit; Singh, Gian; Khatri, Sunil; Vrudhula, Sarma (July 2022, IEEE Transactions on Circuits and Systems I: Regular Papers)

In this paper, we describe a design of a mixed-signal circuit for an binary neuron (a.k.a perceptron, threshold logic gate) and a methodology for automatically embedding such cells in ASICs. The binary neuron, referred to as an FTL (flash threshold logic) uses floating gate or flash transistors whose threshold voltages serve as a proxy for the weights of the neuron. Algorithms for mapping the weights to the flash transistor threshold voltages are presented. The threshold voltages are determined to maximize both the robustness of the cell and its speed. The performance, power, and area of a single FTL cell are shown to be significantly smaller (79.4%), consume less power (61.6%), and operate faster (40.3%) compared to conventional CMOS logic equivalents. Also included are the architecture and the algorithms to program the flash devices of an FTL. The FTL cells are implemented as standard cells, and are designed to allow commercial synthesis and P&R tools to automatically use them in synthesis of ASICs. Substantial reductions in area and power without sacrificing performance are demonstrated on several ASIC benchmarks by the automatic embedding of FTL cells. The paper also demonstrates how FTL cells can be used for fixing timing errors after fabrication.
more » « less
Full Text Available
EdgeWise: Energy-Efficient CNN Computation on Edge Devices under Stochastic Communication Delays

https://doi.org/10.1145/3530908

Ghasemi, Mehdi; Rakhmatov, Daler; Wu, Carole-Jean; Vrudhula, Sarma (April 2022, ACM Transactions on Embedded Computing Systems)

This paper presents a framework to enable the energy-efficient execution of convolutional neural networks (CNNs) on edge devices. The framework consists of a pair of edge devices connected via a wireless network: a performance and energy-constrained device D as the first recipient of data, and an energy-unconstrained device N as an accelerator for D. Device D decides on-the-fly how to distribute the workload with the objective of minimizing its energy consumption while accounting for the inherent uncertainty in network delay and the overheads involved in data transfer. These challenges are tackled by adopting the data-driven modeling framework of Markov Decision Processes (MDP), whereby an optimal policy is consulted by D in O(1) time to make layer-by-layer assignment decisions. As a special case, a linear-time dynamic programming algorithm is also presented for finding optimal layer assignment at once, under the assumption that the network delay is constant throughout the execution of the application. The proposed framework is demonstrated on a platform comprised of a Raspberry PI 3 as D and an NVIDIA Jetson TX2 as N. An average improvement of 31% and 23% in energy consumption is achieved compared to the alternatives of executing the CNNs entirely on D and N. Two state-of-the-art methods were also implemented, and compared with the proposed methods.
more » « less
Full Text Available
Energy-Efficient Mapping for a Network of DNN Models at the Edge

Mehdi Ghasemi; Soroush Heidari; Young Geun Kim; Aaron Lamb; Carole-Jean Wu; Sarma Vrudhula (August 2021, IEEE International Conference on Smart Computing)

This paper describes a novel framework for executing a network of trained deep neural network (DNN) models on commercial-off-the-shelf devices that are deployed in an IoT environment. The scenario consists of two devices connected by a wireless network: a user-end device (U), which is a low-end, energy and performance-limited processor, and a cloudlet (C), which is a substantially higher performance and energy-unconstrained processor. The goal is to distribute the computation of the DNN models between U and C to minimize the energy consumption of U while taking into account the variability in the wireless channel delay and the performance overhead of executing models in parallel. The proposed framework was implemented using an NVIDIA Jetson Nano for U and a Dell workstation with Titan Xp GPU as C. Experiments demonstrate significant improvements both in terms of energy consumption of U and processing delay.
more » « less
Full Text Available

Search for: All records