-
Rapid growth in Deep Neural Network (DNN) workloads has increased the energy footprint of the Artificial Intelligence (AI) computing realm. For optimum energy efficiency, we propose operating DNN hardware in the Low-Power Computing (LPC) region. However, operating at LPC increases delay sensitivity to Process Variation (PV). Delay faults are an intriguing consequence of PV. In this article, we demonstrate that DNNs are vulnerable to delay variations, which substantially lower prediction accuracy. To overcome delay faults, we present STRIVE, a post-fabrication fault detection and reactive error reduction technique. We also introduce a time-borrow correction technique to ensure error-free DNN computation.
Free, publicly accessible full text available March 31, 2026.
-
Artificial Intelligence (AI) hardware accelerators have seen tremendous developments in recent years due to the rapid growth of AI in multiple fields. Many such accelerators use a Systolic Multiply–Accumulate Array (SMA) as their computational brain. In this paper, we characterize the faulty outputs of an SMA on a real silicon FPGA board. Experiments were run on a single Zybo Z7-20 board to control for process variation, at nominal voltage, and in small batches to control for temperature. The data sheet rates the FPGA up to 800 MHz, a limit set by the maximum frequency of the PLL, but the design, written in Verilog for the FPGA and C++ for the processor, is synthesized with a chosen constraint of a 125 MHz clock. We then operate the FPGA at frequencies from 125 MHz to 450 MHz while keeping the processor core at its nominal 667 MHz, producing timing errors in the FPGA without affecting the processor. Our extensive experimental platform with a hardware–software ecosystem provides a methodological pathway that reveals fascinating characteristics of SMA behavior in an overclocked environment. While one may intuitively expect timing errors from overclocked hardware to produce a wide variation in output values, our post-silicon evaluation reveals a lack of variation in erroneous output values. We found an intriguing pattern in which erroneous output values are stable for a given input across a range of operating frequencies far exceeding the rated frequency of the FPGA.
-
Rapid growth in Deep Neural Network (DNN) workloads has increased the energy footprint of the Artificial Intelligence (AI) computing realm. For optimum energy efficiency, we propose operating DNN hardware in the Low-Power Computing (LPC) region. However, operating at LPC increases delay sensitivity to Process Variation (PV). Delay faults are an intriguing consequence of PV. In this paper, we demonstrate that DNNs are vulnerable to delay variations, which substantially lower prediction accuracy. To overcome delay faults, we present STRIVE, a post-fabrication fault detection and reactive error reduction technique. We also introduce a time-borrow correction technique to ensure error-free DNN computation.
-
Increasing processing requirements in the Artificial Intelligence (AI) realm have led to the emergence of domain-specific architectures for Deep Neural Network (DNN) applications. The Tensor Processing Unit (TPU), a DNN accelerator by Google, has emerged as a front runner, outclassing its contemporaries, CPUs and GPUs, in performance by 15×–30×. TPUs have been deployed in Google data centers to meet performance demands. However, a TPU’s performance enhancement is accompanied by mammoth power consumption. In pursuit of lower energy utilization, this paper proposes PREDITOR, a low-power TPU operating in the Near-Threshold Computing (NTC) realm. PREDITOR uses mathematical analysis to mitigate undetectable timing errors by boosting the voltage of selected multiplier-and-accumulator units at specific intervals, enhancing the performance of the NTC TPU and thereby ensuring high inference accuracy at low voltage. PREDITOR offers up to 3×–5× improved performance in comparison to leading-edge error mitigation schemes, with a minor loss in accuracy.
-
The AI boom is bringing a plethora of domain-specific architectures for Neural Network computations. Google’s Tensor Processing Unit (TPU), a Deep Neural Network (DNN) accelerator, has replaced the CPUs/GPUs in its data centers, claiming a more than 15X higher rate of inference. However, the unprecedented growth in DNN workloads with the widespread use of AI services projects increasing energy consumption for TPU-based data centers. In this work, we parametrize the extreme hardware underutilization in the TPU systolic array and propose UPTPU: an intelligent, dataflow-adaptive power-gating paradigm that provides a staggering 3.5X–6.5X gain in energy efficiency for the TPU across different input batch sizes.