skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on mobile devices can quickly deplete their battery. Although task offloading to cloud/edge servers may decrease the mobile device’s computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple “exits” earlier in the architecture, each providing increasingly higher target accuracy. Therefore, the tradeoff between accuracy and delay can be tuned according to the current conditions or application demands. In this article, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the article by providing a set of compelling research challenges.  more » « less
Award ID(s):
2134973
PAR ID:
10472616
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Computing Surveys
Volume:
55
Issue:
5
ISSN:
0360-0300
Page Range / eLocation ID:
1 to 30
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Spatial crowdsourcing (SC) enables task owners (TOs) to outsource spatial-related tasks to a SC-server who engages mobile users in collecting sensing data at some specified locations with their mobile devices. Data aggregation, as a specific SC task, has drawn much attention in mining the potential value of the massive spatial crowdsensing data. However, the release of SC tasks and the execution of data aggregation may pose considerable threats to the privacy of TOs and mobile users, respectively. Besides, it is nontrivial for the SC-server to allocate numerous tasks efficiently and accurately to qualified mobile users, as the SC-server has no knowledge about the entire geographical user distribution. To tackle these issues, in this paper, we introduce a fog-assisted SC architecture, in which many fog nodes deployed in different regions can assist the SC-server to distribute tasks and aggregate data in a privacy-aware manner. Specifically, a privacy-aware task allocation and data aggregation scheme (PTAA) is proposed leveraging bilinear pairing and homomorphic encryption. PTAA supports representative aggregate statistics (e.g.,sum, mean, variance, and minimum) with efficient data update while providing strong privacy protection. Security analysis shows that PTAA can achieve the desirable security goals. Extensive experiments also demonstrate its feasibility and efficiency. 
    more » « less
  2. Immersive applications such as Augmented Reality (AR) and Mixed Reality (MR) often need to perform multiple latency-critical tasks on every frame captured by the camera, which all require results to be available within the current frame interval. While such tasks are increasingly supported by Deep Neural Networks (DNNs) offloaded to edge servers due to their high accuracy but heavy computation, prior work has largely focused on offloading one task at a time. Compared to offloading a single task, where more frequent offloading directly translates into higher task accuracy, offloading of multiple tasks competes for shared edge server resources, and hence faces the additional challenge of balancing the offloading frequencies of different tasks to maximize the overall accuracy and hence app QoE. In this paper, we formulate this accuracy-centric multitask offloading problem, and present a framework that dynamically schedules the offloading of multiple DNN tasks from a mobile device to an edge server while optimizing the overall accuracy across tasks. Our design employs two novel ideas: (1) task-specific lightweight models that predict offloading accuracy drop as a function of offloading frequency and frame content, and (2) a general two-level control feedback loop that concurrently balances offloading among tasks and adapts between offloading and using local algorithms for each task. Evaluation results show that our framework improves the overall accuracy significantly in jointly offloading two core tasks in AR — depth estimation and odometry — by on average 7.6%–14.3% over the best baselines under different accuracy weight ratios. 
    more » « less
  3. Localization in urban environments is becoming increasingly important and used in tools such as ARCore [ 18 ], ARKit [ 34 ] and others. One popular mechanism to achieve accurate indoor localization and a map of the space is using Visual Simultaneous Localization and Mapping (Visual-SLAM). However, Visual-SLAM is known to be resource-intensive in memory and processing time. Furthermore, some of the operations grow in complexity over time, making it challenging to run on mobile devices continuously. Edge computing provides additional compute and memory resources to mobile devices to allow offloading tasks without the large latencies seen when offloading to the cloud. In this article, we present Edge-SLAM, a system that uses edge computing resources to offload parts of Visual-SLAM. We use ORB-SLAM2 [ 50 ] as a prototypical Visual-SLAM system and modify it to a split architecture between the edge and the mobile device. We keep the tracking computation on the mobile device and move the rest of the computation, i.e., local mapping and loop closing, to the edge. We describe the design choices in this effort and implement them in our prototype. Our results show that our split architecture can allow the functioning of the Visual-SLAM system long-term with limited resources without affecting the accuracy of operation. It also keeps the computation and memory cost on the mobile device constant, which would allow for the deployment of other end applications that use Visual-SLAM. We perform a detailed performance and resources use (CPU, memory, network, and power) analysis to fully understand the effect of our proposed split architecture. 
    more » « less
  4. The ever increasing size of deep neural network (DNN) models once implied that they were only limited to cloud data centers for runtime inference. Nonetheless, the recent plethora of DNN model compression techniques have successfully overcome this limit, turning into a reality that DNN-based inference can be run on numerous resource-constrained edge devices including mobile phones, drones, robots, medical devices, wearables, Internet of Things devices, among many others. Naturally, edge devices are highly heterogeneous in terms of hardware specification and usage scenarios. On the other hand, compressed DNN models are so diverse that they exhibit different tradeoffs in a multi-dimension space, and not a single model can achieve optimality in terms of all important metrics such as accuracy, latency and energy consumption. Consequently, how to automatically select a compressed DNN model for an edge device to run inference with optimal quality of experience (QoE) arises as a new challenge. The state-of-the-art approaches either choose a common model for all/most devices, which is optimal for a small fraction of edge devices at best, or apply device-specific DNN model compression, which is not scalable. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop, we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate the DNN model selection problem into a contextual multi-armed bandit framework, where features of edge devices and DNN models are contexts and pre-trained DNN models are arms selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference. 
    more » « less
  5. Edge-assisted AR supports high-quality AR on resource-constrained mobile devices by offloading high-rate camera-captured frames to powerful GPU edge servers to perform heavy vision tasks. Since the result of an offloaded frame may not come back in the same frame interval, edge-assisted AR designs resort to local tracking on the last server returned result to generate more accurate result for the current frame. In such an offloading+local tracking paradigm, reducing the staleness of the last server returned result is critical to improving AR task accuracy. In this paper, we present MPCP, an online offloading scheduling framework that minimizes the staleness of server-returned result in edge-assisted AR by optimally pipelining network transfer of frames to the edge server and the Deep Neural Network inference on the edge server. MPCP is based on model predictive control (MPC). Our evaluation results show that MPCP reduces the depth estimation error by up to 10.0% compared to several baseline schemes. 
    more » « less