"Extreme edge" devices, such as smart sensors, are a uniquely challenging environment for the deployment of machine learning. The tiny energy budgets of these devices lie beyond what is feasible for conventional deep neural networks, particularly in high-throughput scenarios, requiring us to rethink how we approach edge inference. In this work, we propose ULEEN, a model and FPGA-based accelerator architecture based on weightless neural networks (WNNs). WNNs eliminate energy-intensive arithmetic operations, instead using table lookups to perform computation, which makes them theoretically well-suited for edge inference. However, WNNs have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by binary neural networks (BNNs) to make significant strides in addressing these issues. We compare ULEEN against BNNs in software and hardware using the four MLPerf Tiny datasets and MNIST. Our FPGA implementations of ULEEN accomplish classification at 4.0–14.3 million inferences per second, improving area-normalized throughput by an average of 3.6× and steady-state energy efficiency by an average of 7.1× compared to the FPGA-based Xilinx FINN BNN inference platform. While ULEEN is not a universally applicable machine learning model, we demonstrate that it can be an excellent choice for certain applications in energy- and latency-critical edge environments.
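The table-lookup computation that WNNs use in place of multiply-accumulates can be illustrated with a minimal WiSARD-style discriminator. The sketch below is a generic WNN in Python, not ULEEN itself (whose additional optimizations and BNN-inspired training go well beyond this); all class and parameter names are illustrative.

```python
import numpy as np

class WisardWNN:
    """Minimal WiSARD-style weightless neural network.

    Each class has one discriminator: a set of RAM nodes addressed by fixed
    tuples of input bits. Training sets the addressed entries to 1; inference
    counts how many RAM nodes respond with 1 for each class.
    """

    def __init__(self, n_inputs, tuple_size, n_classes, seed=0):
        assert n_inputs % tuple_size == 0
        self.tuple_size = tuple_size
        self.n_rams = n_inputs // tuple_size
        # Random but fixed mapping of input bits to RAM address lines.
        self.mapping = np.random.default_rng(seed).permutation(n_inputs)
        # One boolean lookup table per (class, RAM node), 2**tuple_size entries each.
        self.tables = np.zeros((n_classes, self.n_rams, 2 ** tuple_size), dtype=bool)

    def _addresses(self, x_bits):
        # Reorder bits, group them into tuples, and read each tuple as an address.
        bits = np.asarray(x_bits, dtype=np.int64)[self.mapping]
        grouped = bits.reshape(self.n_rams, self.tuple_size)
        return grouped.dot(1 << np.arange(self.tuple_size))

    def train(self, x_bits, label):
        # Set the addressed location in every RAM node of this class's discriminator.
        self.tables[label, np.arange(self.n_rams), self._addresses(x_bits)] = True

    def predict(self, x_bits):
        # Score each class by how many of its RAM nodes respond; no arithmetic
        # beyond the popcount-style sum of lookup results.
        addr = self._addresses(x_bits)
        scores = self.tables[:, np.arange(self.n_rams), addr].sum(axis=1)
        return int(scores.argmax())

# Tiny illustrative usage on random 64-bit patterns.
wnn = WisardWNN(n_inputs=64, tuple_size=8, n_classes=2)
wnn.train(np.random.randint(0, 2, 64), label=0)
print(wnn.predict(np.random.randint(0, 2, 64)))
```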
Radio Frequency Fingerprinting on the Edge
Deep learning methods have been very successful at radio frequency fingerprinting tasks, predicting the identity of transmitting devices with high accuracy. We study radio frequency fingerprinting deployments at resource-constrained edge devices. We use structured pruning to jointly train and sparsify neural networks tailored to edge hardware implementations. We compress convolutional layers by a 27.2× factor while incurring a negligible prediction accuracy decrease (less than 1%). We demonstrate the efficacy of our approach over multiple edge hardware platforms, including a Samsung Galaxy S10 phone and a Xilinx-ZCU104 FPGA. Our method yields significant inference speedups, 11.5× on the FPGA and 3× on the smartphone, as well as high efficiency: the FPGA processing time is 17× smaller than on a V100 GPU. To the best of our knowledge, we are the first to explore the possibility of compressing networks for radio frequency fingerprinting; as such, our experiments can be seen as a means of characterizing the informational capacity associated with this specific learning task.
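As a rough illustration of the approach described above, the following sketch applies structured (filter-level) pruning to a stand-in convolutional model using PyTorch's pruning utilities; the real fingerprinting architecture, the 27.2× compression schedule, and the joint train-and-sparsify loop from the paper are not reproduced here.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy convolutional block standing in for a fingerprinting model's early layers;
# I/Q samples are modeled as 2 input channels (illustrative sizes only).
model = nn.Sequential(
    nn.Conv1d(2, 64, kernel_size=7),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=5),
    nn.ReLU(),
)

# Structured pruning: remove whole output filters (dim=0) ranked by L2 norm, so
# the resulting sparsity maps directly onto smaller dense kernels in hardware.
for module in model.modules():
    if isinstance(module, nn.Conv1d):
        prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)

# In a joint train-and-sparsify setup, the pruning mask stays attached while
# fine-tuning recovers accuracy; afterwards the zeroed filters are baked in.
for module in model.modules():
    if isinstance(module, nn.Conv1d):
        prune.remove(module, "weight")
```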
- PAR ID: 10293165
- Journal Name: IEEE Transactions on Mobile Computing
- ISSN: 1536-1233
- Page Range / eLocation ID: 1 to 1
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent training neural networks on edge devices. This paper proposes a novel tensor-based training framework, which offers orders-of-magnitude memory reduction in the training process. We propose a novel rank-adaptive tensorized neural network model and design a hardware-friendly low-precision algorithm to train this model. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves 59× speedup and 123× energy reduction compared to an embedded CPU, and 292× memory reduction compared to standard full-size training.
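A fixed low-rank factorized layer, sketched below in PyTorch, shows where the memory savings during training come from; the actual framework is rank-adaptive and tensorized, so this fixed-rank matrix factorization is only the simplest illustrative case, and all sizes are made up.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer whose weight is stored as two low-rank factors.

    A dense (out x in) weight needs out*in parameters; the factored form needs
    only rank*(out + in), which is what makes on-device training tractable.
    """
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.u = nn.Parameter(torch.randn(out_features, rank) * 0.1)
        self.v = nn.Parameter(torch.randn(rank, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Never materialize the full weight: apply V then U.
        return (x @ self.v.t()) @ self.u.t() + self.bias

# Example: a 1024x1024 dense layer has ~1.05M parameters; rank-16 factors use ~33K.
layer = LowRankLinear(1024, 1024, rank=16)
print(sum(p.numel() for p in layer.parameters()))
```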
-
5G and open radio access networks (Open RANs) will result in vendor-neutral hardware deployment that will require additional diligence toward managing security risks. This new paradigm will allow the same network infrastructure to support virtual network slices for transmitting different waveforms, such as 5G New Radio, LTE, and WiFi, at different times. In this multi-vendor, multi-protocol/waveform setting, we propose an additional physical-layer authentication method that detects a specific emitter through a technique called RF fingerprinting. Our deep learning approach uses convolutional neural networks augmented with triplet loss, where examples of similar/dissimilar signal samples are shown to the classifier over the training duration. We demonstrate the feasibility of RF fingerprinting base stations over the large-scale over-the-air experimental POWDER platform in Salt Lake City, Utah, USA. Using real-world datasets, we show how our approach overcomes the challenges posed by changing channel conditions and protocol choices, with 99.86% detection accuracy for different training and testing days.
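The triplet-loss training idea can be sketched in a few lines of PyTorch; the toy embedding network, input shapes, and margin below are placeholders rather than the paper's base-station fingerprinting model.

```python
import torch
import torch.nn as nn

# Toy embedding network for raw I/Q bursts (2 channels); the actual CNN trained
# on POWDER captures is larger and deeper.
embed = nn.Sequential(
    nn.Conv1d(2, 32, kernel_size=7), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 16),
)

triplet = nn.TripletMarginLoss(margin=1.0)

# Anchor and positive come from the same emitter; negative from a different one.
anchor   = embed(torch.randn(8, 2, 256))
positive = embed(torch.randn(8, 2, 256))
negative = embed(torch.randn(8, 2, 256))

# The loss pulls same-emitter embeddings together and pushes different emitters
# apart, which is what helps the classifier cope with changing channels and days.
loss = triplet(anchor, positive, negative)
loss.backward()
```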
-
Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA
Convolutional neural network (CNN)-based object detection has achieved very high accuracy; e.g., single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, they require a large amount of computation and memory storage, which makes it difficult to perform efficient inference on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we designed and co-optimized an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained an SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we used dual-data-rate operations for DSPs to effectively double the throughput with limited DSP counts. For different SSD algorithm models, we analyze accuracy, or mean average precision (mAP), and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluated the FPGA hardware on the custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and a throughput of 158 GOPS, using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1 to 8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset and the same FPGA device, at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieved an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is ∼1.9× higher than prior FPGA works or other commercial hardware platforms.
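Low-precision quantization of weights, one of the optimizations mentioned above, can be sketched as follows; the bit width and per-tensor scaling are generic choices rather than the paper's exact scheme, and hardware details such as DSP dual-data-rate operation are not captured in software.

```python
import torch

def quantize_symmetric(w: torch.Tensor, n_bits: int = 8):
    """Uniform symmetric quantization of a weight tensor to n_bits integers.

    Returns the integer weights for the datapath plus one per-tensor scale;
    dequantize with q.float() * scale. The rounding error is what costs mAP.
    """
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax            # one scale per tensor (per-layer)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q.to(torch.int8), scale

# Illustrative use on a random 64x64x3x3 convolution weight tensor.
q, s = quantize_symmetric(torch.randn(64, 64, 3, 3), n_bits=8)
print(q.dtype, float(s))
```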
-
There have been many recent attempts to extend the successes of convolutional neural networks (CNNs) from 2-dimensional (2D) image classification to 3-dimensional (3D) video recognition by exploring 3D CNNs. Considering the emerging growth of the mobile and Internet of Things (IoT) market, it is essential to investigate the deployment of 3D CNNs on edge devices. Previous works have implemented standard 3D CNNs (C3D) on hardware platforms; however, they have not exploited model compression to accelerate inference. This work proposes a hardware-aware pruning approach that fully adapts to the loop-tiling technique of FPGA design and is applied to a novel 3D network called R(2+1)D. Leveraging the alternating direction method of multipliers (ADMM), the proposed pruning method simultaneously achieves high accuracy and significant acceleration of computation on the FPGA. With layer-wise pruning rates up to 10× and negligible accuracy loss, the pruned model is implemented on a Xilinx ZCU102 FPGA board, where it achieves a 2.6× speedup compared with the unpruned version, and a 2.3× speedup and 2.3× power-efficiency improvement compared with a state-of-the-art FPGA implementation of C3D.
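One ADMM pruning round, sketched below in PyTorch with illustrative block sizes and hyperparameters, shows how the weight update, the sparsity projection, and the dual update alternate; this is a generic sketch, not the paper's layer-wise configuration, and the block size standing in for the FPGA loop-tiling dimension is an assumption.

```python
import torch

def project_block_sparse(w, block, keep_ratio):
    """Hardware-aware projection: keep only the highest-energy blocks of w,
    where `block` is assumed to match the FPGA loop-tiling size."""
    out = torch.zeros_like(w)
    blocks = w.reshape(-1, block)
    energy = blocks.pow(2).sum(dim=1)
    k = max(1, int(keep_ratio * energy.numel()))
    keep = energy.topk(k).indices
    out_blocks = out.reshape(-1, block)
    out_blocks[keep] = blocks[keep]
    return out_blocks.reshape(w.shape)

# One ADMM round for a single weight tensor; a real setup applies this per layer
# inside normal SGD training, with tuned rho, learning rate, and schedule.
w = torch.randn(1024, requires_grad=True)
z = project_block_sparse(w.detach(), block=16, keep_ratio=0.1)
u = torch.zeros_like(w)
rho = 1e-3

for _ in range(100):
    loss = w.sin().pow(2).mean()                     # stand-in for the task loss
    admm_penalty = (rho / 2) * (w - z + u).pow(2).sum()
    (loss + admm_penalty).backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad                           # W-update (gradient step)
        w.grad.zero_()

z = project_block_sparse(w.detach(), block=16, keep_ratio=0.1)  # Z-update
u = u + w.detach() - z                                          # dual update
```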