“Extreme edge” devices, such as smart sensors, are a uniquely challenging environment for deploying machine learning. Conventional deep neural networks cannot operate within the tiny energy budgets of these devices, particularly in high-throughput scenarios, which requires us to rethink how we approach edge inference. In this work, we propose ULEEN, a model and FPGA-based accelerator architecture based on weightless neural networks (WNNs). WNNs eliminate energy-intensive arithmetic operations, instead using table lookups to perform computation, which makes them theoretically well-suited for edge inference. However, WNNs have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by binary neural networks (BNNs) to make significant strides in addressing these issues. We compare ULEEN against BNNs in software and hardware using the four MLPerf Tiny datasets and MNIST. Our FPGA implementations of ULEEN accomplish classification at 4.0–14.3 million inferences per second, improving area-normalized throughput by an average of 3.6× and steady-state energy efficiency by an average of 7.1× compared to the FPGA-based Xilinx FINN BNN inference platform. While ULEEN is not a universally applicable machine learning model, we demonstrate that it can be an excellent choice for certain applications in energy- and latency-critical edge environments.
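To make "table lookups to perform computation" concrete, here is a minimal sketch of a classic WiSARD-style weightless classifier, a well-known WNN formulation. It is not ULEEN itself. ULEEN's algorithmic improvements and its BNN-inspired training are not modeled here. The class name, the parameters (`tuple_size`, `seed`), and the use of Python sets as sparse RAM nodes are illustrative assumptions. Each class owns one lookup table (RAM node) per group of input bits. Training writes the addressed entries, and inference only counts table hits, with no multiplications.

```python
import random
from typing import List, Sequence


class WiSARD:
    """Classic WiSARD-style weightless classifier (illustrative sketch, not ULEEN)."""

    def __init__(self, input_bits: int, tuple_size: int, num_classes: int, seed: int = 0) -> None:
        if input_bits % tuple_size != 0:
            raise ValueError("input_bits must be a multiple of tuple_size")
        rng = random.Random(seed)
        # Fixed pseudo-random assignment of input bits to RAM-node address lines.
        self.order = list(range(input_bits))
        rng.shuffle(self.order)
        self.tuple_size = tuple_size
        self.num_rams = input_bits // tuple_size
        # For each class, one sparse lookup table per tuple: the set of written addresses.
        self.rams = [[set() for _ in range(self.num_rams)] for _ in range(num_classes)]

    def _addresses(self, bits: Sequence[int]) -> List[int]:
        """Turn each tuple of (permuted) input bits into an integer table address."""
        addrs = []
        for r in range(self.num_rams):
            addr = 0
            for k in range(self.tuple_size):
                addr = (addr << 1) | bits[self.order[r * self.tuple_size + k]]
            addrs.append(addr)
        return addrs

    def train(self, bits: Sequence[int], label: int) -> None:
        # Training is a single write per RAM node; no gradients, no weights.
        for ram, addr in zip(self.rams[label], self._addresses(bits)):
            ram.add(addr)

    def scores(self, bits: Sequence[int]) -> List[int]:
        # Response of each class = number of its RAM nodes whose addressed entry is set.
        addrs = self._addresses(bits)
        return [sum(addr in ram for ram, addr in zip(class_rams, addrs))
                for class_rams in self.rams]

    def predict(self, bits: Sequence[int]) -> int:
        s = self.scores(bits)
        return max(range(len(s)), key=s.__getitem__)
```

Consider training `[1]*8 + [0]*8` as class 0 and its complement as class 1, with `tuple_size=4`. This gives four RAM nodes per class. An input with one flipped bit still hits three of class 0's four tables, so it is classified correctly.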
Demo Abstract: Online Training and Inference for On-Device Monocular Depth Estimation
A central challenge in machine learning deployment is maintaining accurate and updated models as the deployment environment changes over time. We present a hardware/software framework for simultaneous training and inference for monocular depth estimation on edge devices. Our proposed framework can be used as a hardware/software co-design tool that enables continual and online federated learning on edge devices. Our results show real-time training and inference performance, demonstrating the feasibility of online learning on edge devices.
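The abstract does not say how client updates are combined in the federated setting. As an illustrative assumption only, the sketch below uses federated averaging (FedAvg), a common aggregation rule. Under FedAvg, a server averages the client weight vectors in proportion to each client's local sample count. The function name `fed_avg` and the flat-list weight representation are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Return the sample-weighted average of flattened client model weights."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        share = n / total  # clients with more local data get more influence
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged
```

For example, take two clients holding 1 and 3 samples. Then `fed_avg([[1.0, 2.0], [3.0, 6.0]], [1, 3])` returns `[2.5, 5.0]`.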
- Award ID(s): 2428656
- PAR ID: 10599998
- Publisher / Repository: IEEE
- ISBN: 979-8-3503-7025-6
- Page Range / eLocation ID: 221–222
- Subject(s) / Keyword(s): Federated Learning, Continual Learning, Hardware/Software Co-Design, Edge Devices, Internet-of-Things
- Format(s): Medium: X
- Location: Hong Kong, Hong Kong
- Sponsoring Org: National Science Foundation
More Like this
Various hardware accelerators have been developed for energy-efficient, real-time neural network inference on edge devices. However, most training is done on high-performance GPUs or servers, because the large memory and computing costs of training prevent it from running on edge devices. This paper proposes a novel tensor-based training framework that offers orders-of-magnitude memory reduction in the training process. We propose a novel rank-adaptive tensorized neural network model and design a hardware-friendly, low-precision algorithm to train it. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves a 59× speedup and a 123× energy reduction compared to an embedded CPU. It also achieves a 292× memory reduction compared to standard full-size training.
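The abstract does not specify its tensor format. The sketch below assumes the tensor-train (TT) matrix format, a common choice for tensorized layers, to show where the memory savings come from. A dense layer stores every entry of its weight matrix. A TT layer stores only small cores, whose sizes are set by the chosen mode factorization and the TT ranks. The function names and the example shapes and ranks are illustrative assumptions, not the paper's configuration. A rank-adaptive model would reduce the ranks during training, shrinking the count further.

```python
from math import prod
from typing import Sequence


def dense_params(in_modes: Sequence[int], out_modes: Sequence[int]) -> int:
    """Weights in a dense layer of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_matrix_params(in_modes: Sequence[int], out_modes: Sequence[int],
                     ranks: Sequence[int]) -> int:
    """Weights stored by the TT cores of the same layer.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]),
    with boundary ranks ranks[0] == ranks[-1] == 1.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))
```

As an example, take a 1024×1024 layer factorized as 4×4×4×4×4 by 4×4×4×4×4, with all internal ranks equal to 8. The dense layer holds 1,048,576 weights, while the TT cores hold 3,328, roughly 315× fewer. This counts weights only. Training memory also includes activations, gradients, and optimizer state, so this toy ratio is not directly comparable to the 292× figure reported in the abstract.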
The ever-increasing size of deep neural network (DNN) models once implied that they were limited to cloud data centers for runtime inference. However, the recent plethora of DNN model compression techniques has overcome this limit. DNN-based inference can now run on numerous resource-constrained edge devices, including mobile phones, drones, robots, medical devices, wearables, and Internet of Things devices, among many others. Naturally, edge devices are highly heterogeneous in hardware specifications and usage scenarios. Compressed DNN models are likewise diverse: they exhibit different tradeoffs in a multidimensional space, and no single model is optimal across all important metrics, such as accuracy, latency, and energy consumption. Consequently, a new challenge emerges: automatically selecting a compressed DNN model that lets an edge device run inference with optimal quality of experience (QoE). State-of-the-art approaches take one of two routes, and both fall short. Some choose a common model for all or most devices, which is optimal for a small fraction of edge devices at best. Others apply device-specific DNN model compression, which does not scale. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop, we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate DNN model selection as a contextual multi-armed bandit problem. Features of edge devices and DNN models are contexts, and pre-trained DNN models are arms, selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference.
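The abstract does not name its online learning algorithm. As a stand-in, the sketch below implements disjoint LinUCB, a standard contextual-bandit algorithm. Each arm (a pre-trained DNN model) keeps a ridge-regression estimate of QoE from context features. The engine picks the arm with the highest optimistic estimate, and a confidence bonus scaled by `alpha` drives exploration. The class name, the feature encoding, and the use of a scalar QoE score as the reward are assumptions.

```python
from typing import Sequence

import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression QoE model per arm (candidate DNN)."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        # Per-arm regularized Gram matrix and reward-weighted feature sum.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, contexts: Sequence[np.ndarray]) -> int:
        """contexts[a] encodes (device features, features of DNN model a)."""
        scores = []
        for A, b, x in zip(self.A, self.b, contexts):
            theta = np.linalg.solve(A, b)                            # estimated QoE weights
            bonus = self.alpha * np.sqrt(x @ np.linalg.solve(A, x))  # uncertainty width
            scores.append(theta @ x + bonus)                         # optimistic QoE estimate
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed QoE feedback for the chosen arm into its model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Consider a toy run where both arms see the same context, model 1 always yields a QoE of 0.8, and model 0 yields 0.2. The engine explores briefly and then concentrates its choices on model 1. Raising `alpha` widens the confidence bonus and makes the engine explore more.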
AI applications powered by deep learning inference increasingly run natively on edge devices to provide a better interactive user experience. This often requires fitting a model originally designed and trained in the cloud to edge devices with a range of hardware capabilities, a process that has so far relied on time-consuming manual effort. In this paper, we quantify the challenges of manually generating a large number of compressed models. We then build a system framework, Mistify, that automatically ports a cloud-based model to a suite of models for edge devices, targeting various points in the design space. Mistify adds an intermediate “layer” that decouples the model design and deployment phases. It exposes configuration APIs that obviate code changes deep inside the original model. As a result, Mistify hides run-time issues from model designers and hides model internals from model users, reducing the expertise needed by either. For better scalability, Mistify consolidates multiple model tailoring requests to minimize repeated computation. Further, Mistify leverages locally available edge data in a privacy-aware manner and performs run-time model adaptation to provide scalable edge support and accurate inference results. Extensive evaluation shows that Mistify reduces the DNN porting time needed to serve a wide spectrum of edge deployment scenarios by over 10×, requiring orders of magnitude less manual effort.
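The abstract does not describe how Mistify consolidates tailoring requests. As a hypothetical illustration of the general idea only, the sketch below serves a batch of requests but runs the expensive tailoring step once per distinct (model, configuration) pair. The names `consolidate` and `tailor` and the configuration keys are invented; they are not Mistify's API.

```python
from typing import Any, Callable, Dict, Hashable, List, Mapping, Sequence, Tuple

# Hypothetical request shape: (model_id, tailoring configuration).
Request = Tuple[str, Mapping[str, Hashable]]


def consolidate(requests: Sequence[Request],
                tailor: Callable[[str, Mapping[str, Hashable]], Any]) -> List[Any]:
    """Answer every request, calling the expensive `tailor` once per distinct request."""
    cache: Dict[Tuple[str, Tuple[Tuple[str, Hashable], ...]], Any] = {}
    results = []
    for model_id, config in requests:
        # Order-insensitive key, so {"a": 1, "b": 2} and {"b": 2, "a": 1} coincide.
        key = (model_id, tuple(sorted(config.items())))
        if key not in cache:
            cache[key] = tailor(model_id, config)
        results.append(cache[key])
    return results
```

In a batch with three requests where two are identical up to key order, `tailor` runs twice rather than three times.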
Deploying monocular depth estimation on resource-constrained edge devices is a significant challenge, particularly when training and inference must run concurrently. Current lightweight, self-supervised approaches typically rely on complex frameworks that are hard to implement and deploy in real-world settings. To address this gap, we introduce the first framework for Lightweight Training and Inference (LITI). LITI combines ready-to-deploy models with streamlined code and fully functional, parallel training and inference pipelines. Our experiments show various models deployed for inference, training, or both, using input from a real-time RGB camera sensor. Thus, our framework enables training and inference on resource-constrained edge devices for complex applications such as depth estimation.
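The abstract does not detail how LITI runs its parallel training and inference pipelines. The sketch below shows one generic, hypothetical pattern. A training thread consumes a stream of labeled samples and periodically publishes a weight snapshot. Meanwhile, inference reads the latest snapshot and never sees a half-updated model. A one-variable linear model stands in for a depth network, and every name (`SharedModel`, `trainer`, `infer`, `run_demo`) is illustrative.

```python
import queue
import threading
from typing import Optional, Tuple


class SharedModel:
    """Latest published weights; readers always get a complete (w, b) pair."""

    def __init__(self, w: float = 0.0, b: float = 0.0) -> None:
        self._lock = threading.Lock()
        self._weights: Tuple[float, float] = (w, b)

    def publish(self, w: float, b: float) -> None:
        with self._lock:
            self._weights = (w, b)

    def snapshot(self) -> Tuple[float, float]:
        with self._lock:
            return self._weights


def trainer(model: SharedModel, samples: "queue.Queue[Optional[Tuple[float, float]]]",
            lr: float = 0.05, publish_every: int = 10) -> None:
    w, b = model.snapshot()  # train on a private copy of the weights
    step = 0
    while True:
        item = samples.get()
        if item is None:  # sentinel: the sample stream is closed
            break
        x, y = item
        err = (w * x + b) - y  # residual; SGD step on 0.5 * err**2
        w -= lr * err * x
        b -= lr * err
        step += 1
        if step % publish_every == 0:
            model.publish(w, b)  # make the newer model visible to inference
    model.publish(w, b)


def infer(model: SharedModel, x: float) -> float:
    w, b = model.snapshot()
    return w * x + b


def run_demo(steps: int = 3000) -> Tuple[float, float]:
    model = SharedModel()
    samples: "queue.Queue[Optional[Tuple[float, float]]]" = queue.Queue()
    worker = threading.Thread(target=trainer, args=(model, samples))
    worker.start()
    for i in range(steps):
        x = (i % 10) / 10.0
        samples.put((x, 2.0 * x + 1.0))  # stand-in for a freshly labeled camera frame
        infer(model, x)  # inference keeps serving while training runs
    samples.put(None)
    worker.join()
    return model.snapshot()


if __name__ == "__main__":
    print(run_demo())
```

The `publish_every` parameter trades the freshness of the served model against lock traffic. On a real device, the two loops might instead run as separate processes or on separate accelerators rather than as Python threads.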

