As we enter the Internet of Things (IoT) era, mobile computing devices have shrunk considerably while their computing capability has improved dramatically. Meanwhile, machine learning technologies have matured and shown cutting-edge performance in various tasks, leading to their wide adoption. As a result, moving machine learning, especially deep learning capability, to the edge of the IoT is a current trend. However, directly porting machine learning algorithms that originally ran on PC platforms is not feasible for IoT devices because of their relatively limited computing power. In this paper, we first review several representative approaches for enabling deep learning on mobile/IoT devices. We then evaluate the performance and impact of these methods on an IoT platform equipped with an integrated GPU and an ARM processor. Our results show that deep learning capability can be enabled on the edge of the IoT if these approaches are applied efficiently.
SecDeep: Secure and Performant On-device Deep Learning Inference Framework for Mobile and IoT Devices
There is an increasing emphasis on securing deep learning (DL) inference pipelines for mobile and IoT applications with privacy-sensitive data. Prior works have shown that privacy-sensitive data can be secured throughout deep learning inferences on cloud-offloaded models through trusted execution environments (TEEs) such as Intel SGX. However, prior solutions do not address the fundamental challenges of securing resource-intensive inference tasks on low-power, low-memory devices (e.g., mobile and IoT devices) while achieving high performance. To tackle these challenges, we propose SecDeep, a low-power DL inference framework demonstrating that both security and performance of deep learning inference on edge devices are well within our reach. Leveraging TEEs with limited resources, SecDeep guarantees full confidentiality for input and intermediate data, as well as the integrity of the deep learning model and framework. By enabling and securing neural accelerators, SecDeep is the first of its kind to provide trusted and performant DL model inferencing on IoT and mobile devices. We implement and validate SecDeep by interfacing the ARM NN DL framework with ARM TrustZone. Our evaluation shows that, by leveraging edge-available accelerators, we can securely run inference tasks 16× to 172× faster than approaches without acceleration.
- Award ID(s):
- 1705135
- PAR ID:
- 10296312
- Date Published:
- Journal Name:
- IoTDI '21: Proceedings of the International Conference on Internet-of-Things Design and Implementation
- Page Range / eLocation ID:
- 67 to 79
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Edge computing has emerged as a popular paradigm for supporting mobile and IoT applications with low latency or high bandwidth needs. The attractiveness of edge computing has been further enhanced due to the recent availability of special-purpose hardware to accelerate specific compute tasks, such as deep learning inference, on edge nodes. In this paper, we experimentally compare the benefits and limitations of using specialized edge systems, built using edge accelerators, to more traditional forms of edge and cloud computing. Our experimental study using edge-based AI workloads shows that today's edge accelerators can provide comparable, and in many cases better, performance, when normalized for power or cost, than traditional edge and cloud servers. They also provide latency and bandwidth benefits for split processing, across and within tiers, when using model compression or model splitting, but require dynamic methods to determine the optimal split across tiers. We find that edge accelerators can support varying degrees of concurrency for multi-tenant inference applications, but lack isolation mechanisms necessary for edge cloud multi-tenant hosting.
-
Benefiting from the advance of Deep Learning technology, IoT devices and systems are becoming more intelligent and multi-functional. They are expected to run various Deep Learning inference tasks with high efficiency and performance. This requirement is challenged by the mismatch between the limited computing capability of edge devices and large-scale Deep Neural Networks. Edge-cloud collaborative systems are then introduced to mitigate this conflict, enabling resource-constrained IoT devices to host arbitrary Deep Learning applications. However, the introduction of third-party clouds can bring potential privacy issues to edge computing. In this paper, we conduct a systematic study about the opportunities of attacking and protecting the privacy of edge-cloud collaborative systems. Our contributions are twofold: (1) we first devise a set of new attacks for an untrusted cloud to recover arbitrary inputs fed into the system, even if the attacker has no access to the edge device’s data or computations, or permissions to query this system. (2) We empirically demonstrate that solutions that add noise fail to defeat our proposed attacks, and then propose two more effective defense methods. This provides insights and guidelines to develop more privacy-preserving collaborative systems and algorithms.
-
Edge machine learning can deliver low-latency and private artificial intelligence (AI) services for mobile devices by leveraging computation and storage resources at the network edge. This paper presents an energy-efficient edge processing framework to execute deep learning inference tasks at the edge computing nodes whose wireless connections to mobile devices are prone to channel uncertainties. Aimed at minimizing the sum of computation and transmission power consumption with probabilistic quality-of-service (QoS) constraints, we formulate a joint inference tasking and downlink beamforming problem that is characterized by a group sparse objective function. We provide a statistical learning based robust optimization approach to approximate the highly intractable probabilistic-QoS constraints by nonconvex quadratic constraints, which are further reformulated as matrix inequalities with a rank-one constraint via matrix lifting. We design a reweighted power minimization approach by iteratively reweighted ℓ1 minimization with difference-of-convex-functions (DC) regularization and updating weights, where the reweighted approach is adopted for enhancing group sparsity whereas the DC regularization is designed for inducing rank-one solutions. Numerical results demonstrate that the proposed approach outperforms other state-of-the-art approaches.
-
Deep learning (DL) continues to play a pivotal role in a wide range of intelligent systems, including autonomous machines, smart surveillance, industrial automation, and portable healthcare technologies. These applications often demand low-latency inference and efficient resource utilization, especially when deployed on embedded or edge devices with limited computational capacity. As DL models become increasingly complex, selecting the right inference framework is essential to meeting performance and deployment goals. In this work, we conduct a comprehensive comparison of five widely adopted inference frameworks: PyTorch, ONNX Runtime, TensorRT, Apache TVM, and JAX. All experiments are performed on the NVIDIA Jetson AGX Orin platform, a high-performance computing solution tailored for edge artificial intelligence workloads. The evaluation considers several key performance metrics, including inference accuracy, inference time, throughput, memory usage, and power consumption. Each framework is tested using a wide range of convolutional and transformer models and analyzed in terms of deployment complexity, runtime efficiency, and hardware utilization. Our results show that certain frameworks offer superior inference speed and throughput, while others provide advantages in flexibility, portability, or ease of integration. We also observe meaningful differences in how each framework manages system memory and power under various load conditions. This study offers practical insights into the trade-offs associated with deploying DL inference on resource-constrained hardware.