skip to main content


Title: MaxTracker: Continuously Tracking the Maximum Computation Progress for Energy Harvesting ReRAM-based CNN Accelerators
There is an ongoing trend to increasingly offload inference tasks, such as CNNs, to edge devices in many IoT scenarios. As energy harvesting is an attractive IoT power source, recent ReRAM-based CNN accelerators have been designed for operation on harvested energy. When addressing the instability problems of harvested energy, prior optimization techniques often assume that the load is fixed, overlooking the close interactions among input power, computational load, and circuit efficiency, or adapt the dynamic load to match the just-in-time incoming power under a simple harvesting architecture with no intermediate energy storage. Targeting a more efficient harvesting architecture equipped with both energy storage and energy delivery modules, this paper is the first effort to target whole system, end-to-end efficiency for an energy harvesting ReRAM-based accelerator. First, we model the relationships among ReRAM load power, DC-DC converter efficiency, and power failure overhead. Then, a maximum computation progress tracking scheme ( MaxTracker ) is proposed to achieve a joint optimization of the whole system by tuning the load power of the ReRAM-based accelerator. Specifically, MaxTracker accommodates both continuous and intermittent computing schemes and provides dynamic ReRAM load according to harvesting scenarios. We evaluate MaxTracker over four input power scenarios, and the experimental results show average speedups of 38.4%/40.3% (up to 51.3%/84.4%), over a full activation scheme (with energy storage) and order-of-magnitude speedups over the recently proposed (energy storage-less) ResiRCA technique. Furthermore, we also explore MaxTracker in combination with the Capybara reconfigurable capacitor approach to offer more flexible tuners and thus further boost the system performance.  more » « less
Award ID(s):
1822923
NSF-PAR ID:
10296301
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Embedded Computing Systems
Volume:
20
Issue:
5s
ISSN:
1539-9087
Page Range / eLocation ID:
1 to 23
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many recent works have shown substantial efficiency boosts from performing inference tasks on Internet of Things (IoT) nodes rather than merely transmitting raw sensor data. However, such tasks, e.g., convolutional neural networks (CNNs), are very compute intensive. They are therefore challenging to complete at sensing-matched latencies in ultra-low-power and energy-harvesting IoT nodes. ReRAM crossbar-based accelerators (RCAs) are an ideal candidate to perform the dominant multiplication-and-accumulation (MAC) operations in CNNs efficiently, but conventional, performance-oriented RCAs, while energy-efficient, are power hungry and ill-optimized for the intermittent and unstable power supply of energy-harvesting IoT nodes. This paper presents the ResiRCA architecture that integrates a new, lightweight, and configurable RCA suitable for energy harvesting environments as an opportunistically executing augmentation to a baseline sense-and-transmit battery-powered IoT node. To maximize ResiRCA throughput under different power levels, we develop the ResiSchedule approach for dynamic RCA reconfiguration. The proposed approach uses loop tiling-based computation decomposition, model duplication within the RCA, and inter-layer pipelining to reduce RCA activation thresholds and more closely track execution costs with dynamic power income. Experimental results show that ResiRCA together with ResiSchedule achieve average speedups and energy efficiency improvements of 8× and 14× respectively compared to a baseline RCA with intermittency-unaware scheduling. 
    more » « less
  2. null (Ed.)
    This paper presents the integration of an AC-DC rectifier and a DC-DC boost converter circuit designed in 180 nm CMOS process for ultra-low frequency (<; 10 Hz) energy harvesting applications. The proposed rectifier is a very low voltage CMOS rectifier circuit that rectifies the low-frequency signal of 100-250 mV amplitude and 1-10 Hz frequency into DC voltage. In this work, the energy is harvested from the REWOD (reverse electrowetting-on-dielectric) generator, which is a reverse electrowetting technique that converts mechanical vibrations to electrical energy. The objective is to develop a REWOD-based self-powered motion (such as walking, running, jogging, etc.) tracking sensors that can be worn, thus harvesting energy from regular activities. To this end, the proposed circuits are designed in such a way that the output from the REWOD is rectified and regulated using a DC-DC converter which is a 5-stage cross-coupled switching circuit. Simulation results show a voltage range of 1.1 V-2.1 V, i.e., 850-1200% voltage conversion efficiency (VCE) and 30% power conversion efficiency (PCE) for low input signal in the range 100-250 mV in the low-frequency range. This performance verifies the integration of the rectifier and DC-DC boost converter which makes it highly suitable for various motion-based energy harvesting applications. 
    more » « less
  3. Deconvolution is a key component in contemporary neural networks, especially generative adversarial networks (GANs) and fully convolutional networks (FCNs). Due to extra operations of deconvolution compared to convolution, considerable degradation of performance as well as energy efficiency is incurred when implementing deconvolution on the existing resistive random access memory (ReRAM)-based processing-in-memory (PIM) accelerators. In this work, we propose a ReRAM-based accelerator design, RED, for providing high-performance and low-energy deconvolution. We analyze the deconvolution execution on the existing ReRAM-based PIMs and utilize its interior computation pattern for design optimization. RED includes two major contributions: pixel-wise mapping scheme and zero-skipping data flow. Pixel-wise mapping scheme removes the zero insertion and performs convolutions over several ReRAM arrays and thus enables parallel computations with non-zero inputs. Zero-skipping data flow, assisted with customized input buffers design, enhances the computation parallelism and input data reuse. In evaluation, we compare RED against the existing ReRAM-based PIMs and CMOS-based counterpart with a variety of GAN and FCN models, each of which contains multiple deconvolution layers. The experimental results show that RED achieves a 4.0×-56.16× speedup and a 1.05×-18.17× energy efficiency improvement over previous related accelerator designs. 
    more » « less
  4. Power electronic inverters for photovoltaic (PV) systems over the years have trended towards high efficiency and power density. However, reliability improvements of inverters have received less attention. Inverters are one of the lifetime-limiting elements in most PV systems. Their failures increase system operation and maintenance costs, contributing to an increased lifetime energy cost of the PV system. Opportunities exist to increase inverter reliability through design for reliability techniques and the use of new modular topologies, semiconductor devices, and energy buffering schemes. This paper presents the implementation and design for reliability for a GaN-based single-phase residential string inverter using a new topological and control scheme that allows dynamic hardware allocation (DHA). In the proposed inverter architecture, a range of identical modules and control schemes are used to dispatch hardware resources within the inverter to variably deliver power to the load or filter the second harmonic current on the DC side. This new approach more than triples the lifetime of GaN-based inverters, reducing system repair/replacement costs, and increasing the PV system lifetime energy production. 
    more » « less
  5. The ending of Moore’s Law makes domain-specific architecture as the future of computing. The most representative is the emergence of various deep learning accelerators. Among the proposed solutions, resistive random access memory (ReRAM) based process-inmemory (PIM) architecture is anticipated as a promising candidate because ReRAM has the capability of both data storage and in-situ computation. However, we found that existing solutions are unable to efficiently support the computational needs required by the training of unsupervised generative adversarial networks (GANs), due to the lack of the following two features: 1) Computation efficiency: GAN utilizes a new operator, called transposed convolution. It inserts massive zeros in its input before a convolution operation, resulting in significant resource under-utilization; 2) Data traffic: The data intensive training process of GANs often incurs structural heavy data traffic as well as frequent massive data swaps. Our research follows the PIM strategy by leveraging the energy-efficiency of ReRAM arrays for vector-matrix multiplication to enhance the performance and energy efficiency. Specifically, we propose a novel computation deformation technique that can skip zero-insertions in transposed convolution for computation efficiency improvement. Moreover, we explore an efficient pipelined training procedure to reduce on-chip memory access. The implementation of related circuits and architecture is also discussed. At the end, we present our perspective on the future trend and opportunities of deep learning accelerators. 
    more » « less