Title: Protean: An Energy-Efficient and Heterogeneous Platform for Adaptive and Hardware-Accelerated Battery-Free Computing
Battery-free and intermittently powered devices offer long lifetimes and enable deployment in new applications and environments. Unfortunately, developing sophisticated inference-capable applications is still challenging due to the lack of platform support for more advanced (32-bit) microprocessors and specialized accelerators---which can execute data-intensive machine learning tasks, but add complexity across the stack when dealing with intermittent power. We present Protean to bridge the platform gap for inference-capable battery-free sensors. Designed for runtime scalability, meeting the dynamic range of energy harvesters with matching heterogeneous processing elements like neural network accelerators. We develop a modular "plug-and-play" hardware platform, SuperSensor, with a reconfigurable energy storage circuit that powers a 32-bit ARM-based microcontroller with a convolutional neural network accelerator. An adaptive task-based runtime system, Chameleon, provides intermittency-proof execution of machine learning tasks across heterogeneous processing elements. The runtime automatically scales and dispatches these tasks based on incoming energy, current state, and programmer annotations. A code generator, Metamorph, automates conversion of ML models to intermittent safe execution across heterogeneous compute elements. We evaluate Protean with audio and image workloads and demonstrate up to 666x improvement in inference energy efficiency by enabling usage of modern computational elements within intermittent computing. Further, Protean provides up to 166% higher throughput compared to non-adaptive baselines. more »« less
Bakar, Abu; Goel, Rishabh; de Winkel, Jasper; Huang, Jason; Ahmed, Saad; Islam, Bashima; Pawelczak, Przemyslaw; Yildirim, Kasim Sinan; Hester, Josiah
(, GetMobile: Mobile Computing and Communications)
Ko, Steve
(Ed.)
Today's smart devices have short battery lifetimes, high installation and maintenance costs, and rapid obsolescence - all leading to the explosion of electronic waste in the past two decades. These problems will worsen as the number of connected devices grows to one trillion by 2035. Energy harvesting, battery-free devices offer an alternative. Getting rid of the battery reduces e-waste, promises long lifetimes, and enables deployment in new applications and environments. Unfortunately, developing sophisticated inference-capable applications is still challenging. The lack of platform support for advanced (32-bit) microprocessors and specialized accelerators, which can execute dataintensive machine-learning tasks, has held back batteryless devices.
Basaklar, Toygun; Goksoy, A Alper; Krishnakumar, Anish; Gumussoy, Suat; Ogras, Umit Y
(, ACM Transactions on Embedded Computing Systems)
Domain-specific systems-on-chip (DSSoCs) combine general-purpose processors and specialized hardware accelerators to improve performance and energy efficiency for a specific domain. The optimal allocation of tasks to processing elements (PEs) with minimal runtime overheads is crucial to achieving this potential. However, this problem remains challenging as prior approaches suffer from non-optimal scheduling decisions or significant runtime overheads. Moreover, existing techniques focus on a single optimization objective, such as maximizing performance. This work proposes DTRL, a decision-tree-based multi-objective reinforcement learning technique for runtime task scheduling in DSSoCs. DTRL trains a single global differentiable decision tree (DDT) policy that covers the entire objective space quantified by a preference vector. Our extensive experimental evaluations using our novel reinforcement learning environment demonstrate that DTRL captures the trade-off between execution time and power consumption, thereby generating a Pareto set of solutions using a single policy. Furthermore, comparison with state-of-the-art heuristic–, optimization–, and machine learning-based schedulers shows that DTRL achieves up to 9× higher performance and up to 3.08× reduction in energy consumption. The trained DDT policy achieves 120 ns inference latency on Xilinx Zynq ZCU102 FPGA at 1.2 GHz, resulting in negligible runtime overheads. Evaluation on the same hardware shows that DTRL achieves up to 16% higher performance than a state-of-the-art heuristic scheduler.
Bakar, Abu; Ross, Alexander G.; Yildirim, Kasim Sinan; Hester, Josiah
(, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)
Battery-free sensing devices harvest energy from their surrounding environment to perform sensing, computation, and communication. This enables previously impossible applications in the Internet-of-Things. A core challenge for these devices is maintaining usefulness despite erratic, random or irregular energy availability; which causes inconsistent execution, loss of service and power failures. Adapting execution (degrading or upgrading) seems promising as a way to stave off power failures, meet deadlines, or increase throughput. However, because of constrained resources and limited local information, it is a challenge to decide when would be the best time to adapt, and how exactly to adapt execution. In this paper, we systematically explore the fundamental mechanisms of energy-aware adaptation, and propose heuristic adaptation as a method for modulating the performance of tasks to enable higher sensor coverage, completion rates, or throughput, depending on the application. We build a task based adaptive runtime system for intermittently powered sensors embodying this concept. We complement this runtime with a user facing simulator that enables programmers to conceptualize the tradeoffs they make when choosing what tasks to adapt, and how, relative to real world energy harvesting environment traces. While we target battery-free, intermittently powered sensors, we see general application to all energy harvesting devices. We explore heuristic adaptation with varied energy harvesting modalities and diverse applications: machine learning, activity recognition, and greenhouse monitoring, and find that the adaptive version of our ML app performs up to 46% more classifications with only a 5% drop in accuracy; the activity recognition app captures 76% more classifications with only nominal down-sampling; and find that heuristic adaptation leads to higher throughput versus non-adaptive in all cases.
As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QUIDAM , a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy varies more than 5 × and 35 ×, respectively. With the proposed framework, we show that lightweight processing elements achieve on par accuracy results and up to 5.7 × more performance per area and energy improvement when compared to the best INT16 based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by 3-4 orders of magnitude as it removes the need for expensive synthesis and characterization of each design.
Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN , which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4 × lower than electronic BNN accelerators and ∼933 × lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3 × and ∼25 × better performance than electronic and photonic BNN accelerators, respectively.
Bakar, Abu, Goel, Rishabh, de Winkel, Jasper, Huang, Jason, Ahmed, Saad, Islam, Bashima, Pawełczak, Przemysław, Yıldırım, Kasım Sinan, and Hester, Josiah. Protean: An Energy-Efficient and Heterogeneous Platform for Adaptive and Hardware-Accelerated Battery-Free Computing. Retrieved from https://par.nsf.gov/biblio/10398838. Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems (SenSys'22) . Web. doi:10.1145/3560905.3568561.
Bakar, Abu, Goel, Rishabh, de Winkel, Jasper, Huang, Jason, Ahmed, Saad, Islam, Bashima, Pawełczak, Przemysław, Yıldırım, Kasım Sinan, & Hester, Josiah. Protean: An Energy-Efficient and Heterogeneous Platform for Adaptive and Hardware-Accelerated Battery-Free Computing. Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems (SenSys'22), (). Retrieved from https://par.nsf.gov/biblio/10398838. https://doi.org/10.1145/3560905.3568561
Bakar, Abu, Goel, Rishabh, de Winkel, Jasper, Huang, Jason, Ahmed, Saad, Islam, Bashima, Pawełczak, Przemysław, Yıldırım, Kasım Sinan, and Hester, Josiah.
"Protean: An Energy-Efficient and Heterogeneous Platform for Adaptive and Hardware-Accelerated Battery-Free Computing". Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems (SenSys'22) (). Country unknown/Code not available. https://doi.org/10.1145/3560905.3568561.https://par.nsf.gov/biblio/10398838.
@article{osti_10398838,
place = {Country unknown/Code not available},
title = {Protean: An Energy-Efficient and Heterogeneous Platform for Adaptive and Hardware-Accelerated Battery-Free Computing},
url = {https://par.nsf.gov/biblio/10398838},
DOI = {10.1145/3560905.3568561},
abstractNote = {Battery-free and intermittently powered devices offer long lifetimes and enable deployment in new applications and environments. Unfortunately, developing sophisticated inference-capable applications is still challenging due to the lack of platform support for more advanced (32-bit) microprocessors and specialized accelerators---which can execute data-intensive machine learning tasks, but add complexity across the stack when dealing with intermittent power. We present Protean to bridge the platform gap for inference-capable battery-free sensors. Designed for runtime scalability, meeting the dynamic range of energy harvesters with matching heterogeneous processing elements like neural network accelerators. We develop a modular "plug-and-play" hardware platform, SuperSensor, with a reconfigurable energy storage circuit that powers a 32-bit ARM-based microcontroller with a convolutional neural network accelerator. An adaptive task-based runtime system, Chameleon, provides intermittency-proof execution of machine learning tasks across heterogeneous processing elements. The runtime automatically scales and dispatches these tasks based on incoming energy, current state, and programmer annotations. A code generator, Metamorph, automates conversion of ML models to intermittent safe execution across heterogeneous compute elements. We evaluate Protean with audio and image workloads and demonstrate up to 666x improvement in inference energy efficiency by enabling usage of modern computational elements within intermittent computing. Further, Protean provides up to 166% higher throughput compared to non-adaptive baselines.},
journal = {Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems (SenSys'22)},
author = {Bakar, Abu and Goel, Rishabh and de Winkel, Jasper and Huang, Jason and Ahmed, Saad and Islam, Bashima and Pawełczak, Przemysław and Yıldırım, Kasım Sinan and Hester, Josiah},
}
Warning: Leaving National Science Foundation Website
You are now leaving the National Science Foundation website to go to a non-government website.
Website:
NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.