NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

STRIVE: Empowering a Low Power Tensor Processing Unit with Fault Detection and Error Resilience

https://doi.org/10.1145/3705003

Gundi, Noel Daniel; Roy, Sanghamitra; Chakraborty, Koushik (March 2025, ACM Transactions on Design Automation of Electronic Systems)

Rapid growth in Deep Neural Network (DNN) workloads has increased the energy footprint of the Artificial Intelligence (AI) computing realm. For optimum energy efficiency, we propose operating a DNN hardware in the Low-Power Computing (LPC) region. However, operating at LPC causes increased delay sensitivity to Process Variation (PV). Delay faults are an intriguing consequence of PV. In this article, we demonstrate the vulnerability of DNNs to delay variations, substantially lowering the prediction accuracy. To overcome delay faults, we present STRIVE—a post-fabrication fault detection and reactive error reduction technique. We also introduce a time-borrow correction technique to ensure error-free DNN computation.
more » « less
Free, publicly-accessible full text available March 31, 2026
Understanding Timing Error Characteristics from Overclocked Systolic Multiply–Accumulate Arrays in FPGAs

https://doi.org/10.3390/jlpea14010004

Chamberlin, Andrew; Gerber, Andrew; Palmer, Mason; Goodale, Tim; Gundi, Noel Daniel; Chakraborty, Koushik; Roy, Sanghamitra (March 2024, Journal of Low Power Electronics and Applications)

Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators comprise a Systolic Multiply–Accumulate Array (SMA) as its computational brain. In this paper, we investigate the faulty output characterization of an SMA in a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation at nominal voltage and in small batches to control for temperature. The FPGA is rated up to 800 MHz in the data sheet due to the max frequency of the PLL, but the design is written using Verilog for the FPGA and C++ for the processor and synthesized with a chosen constraint of a 125 MHz clock. We then operate the system at a frequency range of 125 MHz to 450 MHz for the FPGA and the nominal 667 MHz for the processor core to produce timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior under an overclocked environment. While one may intuitively expect that timing errors resulting from overclocked hardware may produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern where error output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA.
more » « less
Full Text Available
STRIVE: Enabling Choke Point Detection and Timing Error Resilience in a Low-Power Tensor Processing Unit

Gundi, Noel Daniel; Mowri, Zinnia Muntaha; Chamberlin, Andrew; Roy, Sanghamitra; Chakraborty, Koushik (July 2023, Proceedings ACM IEEE Design Automation Conference)

Rapid growth in Deep Neural Network (DNN) workloads has increased the energy footprint of the Artificial Intelligence (AI) computing realm. For optimum energy efficiency, we propose operating a DNN hardware in the Low-Power Computing (LPC) region. However, operating at LPC causes increased delay sensitivity to Process Variation (PV). Delay faults are an intriguing consequence of PV. In this paper, we demonstrate the vulnerability of DNNs to delay variations, substantially lowering the prediction accuracy. To overcome delay faults, we present STRIVE—a post-fabrication fault detection and reactive error reduction technique. We also introduce a time-borrow correction technique to ensure error-free DNN computation.
more » « less
Full Text Available
Implementing a Timing Error-Resilient and Energy-Efficient Near-Threshold Hardware Accelerator for Deep Neural Network Inference

https://doi.org/10.3390/jlpea12020032

Gundi, Noel Daniel; Pandey, Pramesh; Roy, Sanghamitra; Chakraborty, Koushik (June 2022, Journal of Low Power Electronics and Applications)

Increasing processing requirements in the Artificial Intelligence (AI) realm has led to the emergence of domain-specific architectures for Deep Neural Network (DNN) applications. Tensor Processing Unit (TPU), a DNN accelerator by Google, has emerged as a front runner outclassing its contemporaries, CPUs and GPUs, in performance by 15×–30×. TPUs have been deployed in Google data centers to cater to the performance demands. However, a TPU’s performance enhancement is accompanied by a mammoth power consumption. In the pursuit of lowering the energy utilization, this paper proposes PREDITOR—a low-power TPU operating in the Near-Threshold Computing (NTC) realm. PREDITOR uses mathematical analysis to mitigate the undetectable timing errors by boosting the voltage of the selective multiplier-and-accumulator units at specific intervals to enhance the performance of the NTC TPU, thereby ensuring a high inference accuracy at low voltage. PREDITOR offers up to 3×–5× improved performance in comparison to the leading-edge error mitigation schemes with a minor loss in accuracy.
more » « less
Full Text Available
UPTPU: Improving Energy Efficiency of a Tensor Processing Unit through Underutilization Based Power-Gating

https://doi.org/10.1109/DAC18074.2021.9586224

Pandey, Pramesh; Gundi, Noel Daniel; Chakraborty, Koushik; Roy, Sanghamitra (December 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC))

The AI boom is bringing a plethora of domain-specific architectures for Neural Network computations. Google’s Tensor Processing Unit (TPU), a Deep Neural Network (DNN) accelerator, has replaced the CPUs/GPUs in its data centers, claiming more than 15X rate of inference. However, the unprecedented growth in DNN workloads with the widespread use of AI services projects an increasing energy consumption of TPU based data centers. In this work, we parametrize the extreme hardware underutilization in TPU systolic array and propose UPTPU: an intelligent, dataflow adaptive power-gating paradigm to provide a staggering 3.5X - 6.5X energy efficiency to TPU for different input batch sizes.
more » « less
Full Text Available

Search for: All records