skip to main content

Title: Dynamic Heterogeneous Voltage Regulation for Systolic Array-Based DNN Accelerators
With the growing performance and wide application of deep neural networks (DNNs), recent years have seen enormous efforts on DNN accelerator hardware design for platforms from mobile devices to data centers. The systolic array has been a popular architectural choice for many proposed DNN accelerators with hundreds to thousands of processing elements (PEs) for parallel computing. Systolic array-based DNN accelerators for datacenter applications have high power consumption and nonuniform workload distribution, which makes power delivery network (PDN) design challenging. Server-class multicore processors have benefited from distributed on-chip voltage regulation and heterogeneous voltage regulation (HVR) for improving energy efficiency while guaranteeing power delivery integrity. This paper presents the first work on HVR-based PDN architecture and control for systolic array-based DNN accelerators. We propose to employ a PDN architecture comprising heterogeneous on-chip and off-chip voltage regulators and multiple power domains. By analyzing patterns of typical DNN workloads via a modeling framework, we propose a DNN workload-aware dynamic PDN control policy to maximize system energy efficiency while ensuring power integrity. We demonstrate significant energy efficiency improvements brought by the proposed PDN architecture, dynamic control, and power gating, which lead to a more than five-fold reduction of leakage energy and PDN energy overhead for more » systolic array DNN accelerators. « less
; ; ;
Award ID(s):
2000851 1741173
Publication Date:
Journal Name:
2020 IEEE 38th International Conference on Computer Design (ICCD)
Page Range or eLocation-ID:
486 to 493
Sponsoring Org:
National Science Foundation
More Like this
  1. Spiking neural networks (SNNs) have emerged as a new generation of neural networks, presenting a brain-inspired event-driven model with advantages in spatiotemporal information processing. Due to the need for high power consumption of compute-intensive neural accelerators, adequate power delivery network (PDN) design is a key requirement to ensure power efficiency and integrity. However, PDN design for SNN accelerators has not been extensively studied despite its great potential benefit in energy efficiency. In this paper, we present the first study on dynamic heterogeneous voltage regulation (HVR) for spiking neural accelerators to maximize system energy efficiency while ensuring power integrity. We proposemore »a novel sparse-workload-aware dynamic PDN control policy, which enables high energy efficiency of sparse spiking computation on a systolic array. By exploring sparse inputs and all-or-none nature of spiking computations for PDN control, we explore different types of PDNs to accelerate spiking convolutional neural networks (S-CNNs) trained with the dynamic vision sensor (DVS) gesture dataset. Furthermore, we demonstrate various power gating schemes to further optimize the proposed PDN architecture, which leads to a more than a three-fold reduction in total energy overhead for spiking neural computations on systolic array-based accelerators.« less
  2. Spiking neural networks (SNNs) offer a promising biologically-plausible computing model and lend themselves to ultra-low-power event-driven processing on neuromorphic processors. Compared with the conventional artificial neural networks, SNNs are well-suited for processing complex spatiotemporal data. Despite its significance, dataflow optimization of spiking neural accelerator architectures has not been extensively studied. Recognizing the need for efficient processing of complex spatiotemporal data while considering the all-or-none nature of spiking activities, we propose holistic reconfigurable dataflow optimization for systolic array acceleration of spiking convolutional networks (S-CNNs). A novel scheme is introduced for parallel acceleration of computation across multiple time points, which further allowsmore »for systemic optimization of variable tiling for a large performance and efficiency gains. We show how variable tiling, in particular, the positioning of the temporal dimension, can be targeted to optimize data movement, throughput, and energy efficiency. Furthermore, we explore joint layer-dependent dataflow and accelerator hardware optimization to further boost performance and energy efficiency. To support systemic design space exploration, we develop an SNN dataflow simulator capable of analyzing the throughput and energy dissipation of systolic array accelerators for any targeted S-CNN while considering the inherent spatiotemporal characteristics of spiking neural computation. The proposed techniques deliver orders of magnitude of improvements on throughput, energy efficiency, and delay-energy product for accelerating deep Alexnet and VGG-16 SNNs.« less
  3. Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligent applications. However, the large sizes of DNNs make themselves both computation and memory intensive, thereby limiting the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient high-performance highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reduction in both computational cost (from O(n2) to O(n log n)) and storage requirement (from O(n2) to O(n)) of DNNs. Second, we dedicatedly optimize the hardware architecture, especially on themore »key fast Fourier transform (FFT) module, to improve the overall performance in terms of energy efficiency, computation performance and resource cost. Third, we propose a design flow to perform hardware-software co-optimization with the purpose of achieving good balance between test accuracy and hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on FPGA. The fixed-point quantization and the proposed block-circulant matrix-based inference scheme enables the network to achieve as high as 3.5 TOPS computation performance and 3.69 TOPS/W energy efficiency while the memory is saved by 108X ~ 116X with negligible accuracy degradation.« less
  4. With proliferation of DNN-based applications, the confidentiality of DNN model is an important commercial goal. Spatial accelerators, which parallelize matrix/vector operations, are utilized for enhancing energy efficiency of DNN computation. Recently, model extraction attacks on simple accelerators, either with a single processing element or running a binarized network, were demonstrated using the methodology derived from differential power analysis (DPA) attack on cryptographic devices. This article investigates the vulnerability of realistic spatial accelerators using general, 8-bit, number representation. We investigate two systolic array architectures with weight-stationary dataflow: (1) a 3 × 1 array for a dot-product operation and (2) a 3more »× 3 array for matrix-vector multiplication. Both are implemented on the SAKURA-G FPGA board. We show that both architectures are ultimately vulnerable. A conventional DPA succeeds fully on the 1D array, requiring 20K power measurements. However, the 2D array exhibits higher security even with 460K traces. We show that this is because the 2D array intrinsically entails multiple MACs simultaneously dependent on the same input. However, we find that a novel template-based DPA with multiple profiling phases is able to fully break the 2D array with only 40K traces. Corresponding countermeasures need to be investigated for spatial DNN accelerators.« less
  5. With the deployment of artificial intelligent (AI) algorithms in a large variety of applications, there creates an increasing need for high-performance computing capabilities. As a result, different hardware platforms have been utilized for acceleration purposes. Among these hardware-based accelerators, the field-programmable gate arrays (FPGAs) have gained a lot of attention due to their re-programmable characteristics, which provide customized control logic and computing operators. For example, FPGAs have recently been adopted for on-demand cloud services by the leading cloud providers like Amazon and Microsoft, providing acceleration for various compute-intensive tasks. While the co-residency of multiple tenants on a cloud FPGA chipmore »increases the efficiency of resource utilization, it also creates unique attack surfaces that are under-explored. In this paper, we exploit the vulnerability associated with the shared power distribution network on cloud FPGAs. We present a stealthy power attack that can be remotely launched by a malicious tenant, shutting down the entire chip and resulting in denial-of-service for other co-located benign tenants. Specifically, we propose stealthy-shutdown: a well-timed power attack that can be implemented in two steps: (1) an attacker monitors the realtime FPGA power-consumption detected by ring-oscillator-based voltage sensors, and (2) when capturing high power-consuming moments, i.e., the power consumption by other tenants is above a certain threshold, she/he injects a well-timed power load to shut down the FPGA system. Note that in the proposed attack strategy, the power load injected by the attacker only accounts for a small portion of the overall power consumption; therefore, such attack strategy remains stealthy to the cloud FPGA operator. We successfully implement and validate the proposed attack on three FPGA evaluation kits with running real-world applications. The proposed attack results in a stealthy-shutdown, demonstrating severe security concerns of co-tenancy on cloud FPGAs. We also offer two countermeasures that can mitigate such power attacks.« less