Immersive applications such as Augmented Reality (AR) and Mixed Reality (MR) often need to perform multiple latency-critical tasks on every frame captured by the camera, all of which require results to be available within the current frame interval. While such tasks are increasingly supported by Deep Neural Networks (DNNs), which are offloaded to edge servers because they are highly accurate but computationally heavy, prior work has largely focused on offloading one task at a time. When a single task is offloaded, more frequent offloading directly translates into higher task accuracy. When multiple tasks are offloaded, they compete for shared edge server resources, which adds the challenge of balancing the offloading frequencies of different tasks to maximize the overall accuracy and hence app QoE. In this paper, we formulate this accuracy-centric multitask offloading problem, and present a framework that dynamically schedules the offloading of multiple DNN tasks from a mobile device to an edge server while optimizing the overall accuracy across tasks. Our design employs two novel ideas: (1) task-specific lightweight models that predict offloading accuracy drop as a function of offloading frequency and frame content, and (2) a general two-level control feedback loop that concurrently balances offloading among tasks and adapts between offloading and using local algorithms for each task. Evaluation results show that our framework significantly improves the overall accuracy when jointly offloading two core AR tasks, depth estimation and odometry, by 7.6%–14.3% on average over the best baselines under different accuracy weight ratios.
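The trade-off described above, in which tasks compete for a shared offloading budget and the goal is the best weighted overall accuracy, can be illustrated with a small sketch. It is not the paper's two-level control loop: it is a plain greedy allocator, and the names `allocate_offloads`, `accuracy_models`, `weights`, and `budget`, as well as the toy accuracy curves in the usage note, are assumptions made for illustration only.

```python
def allocate_offloads(accuracy_models, weights, budget):
    """Greedily split `budget` offload slots across tasks.

    accuracy_models: task name -> callable(frequency) -> predicted accuracy
                     (hypothetical stand-ins for per-task accuracy predictors).
    weights:         task name -> accuracy weight of that task.
    budget:          total number of offloads the edge server can serve.
    """
    freqs = {task: 0 for task in accuracy_models}

    def gain(task):
        # Weighted accuracy gained by giving this task one more offload.
        model = accuracy_models[task]
        return weights[task] * (model(freqs[task] + 1) - model(freqs[task]))

    for _ in range(budget):
        best = max(freqs, key=gain)
        freqs[best] += 1
    return freqs


def weighted_accuracy(accuracy_models, weights, freqs):
    """Overall objective: weighted sum of per-task predicted accuracy."""
    return sum(weights[t] * accuracy_models[t](freqs[t]) for t in freqs)
```

For example, with toy curves `{"depth": lambda f: 1 - 0.5 ** f, "odometry": lambda f: 1 - 0.8 ** f}`, equal weights, and a budget of 4, the allocator splits the slots evenly; raising the odometry weight shifts slots toward odometry. Greedy allocation is only a reasonable choice when each curve has diminishing returns, which is assumed here.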
FoggyCache: Cross-Device Approximate Computation Reuse
Mobile and IoT scenarios increasingly involve interactive and computation-intensive contextual recognition. Existing optimizations typically resort to computation offloading or simplified on-device processing. Instead, we observe that the same application is often invoked on multiple devices in close proximity. Moreover, the application instances often process similar contextual data that map to the same outcome. In this paper, we propose cross-device approximate computation reuse, which minimizes redundant computation by harnessing the “equivalence” between different input values and reusing previously computed outputs with high confidence. We devise adaptive locality-sensitive hashing (A-LSH) and homogenized k nearest neighbors (H-kNN). The former achieves scalable, constant-time lookup, while the latter provides high-quality reuse and a tunable accuracy guarantee. We further incorporate approximate reuse as a service, called FoggyCache, in the computation offloading runtime. Extensive evaluation shows that, when given a 95% accuracy target, FoggyCache consistently harnesses over 90% of reuse opportunities, which translates to a reduction in computation latency and energy consumption by a factor of 3 to 10.
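The idea of approximate computation reuse can be sketched in a few lines. The sketch below is not FoggyCache's implementation: it uses plain random-hyperplane LSH with a fixed number of bits rather than A-LSH, and a simple majority-agreement check among the k nearest cached entries as a stand-in for H-kNN. The names `ReuseCache`, `recognize`, `n_bits`, and `min_agreement` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class ReuseCache:
    """Toy approximate-reuse cache: random-hyperplane LSH plus a k-NN vote."""

    def __init__(self, dim, n_bits=8, k=3, min_agreement=0.8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the bucket key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)  # bucket key -> [(vector, output)]
        self.k = k
        self.min_agreement = min_agreement  # assumed confidence threshold

    def _key(self, vec):
        # Sign of the projection onto each hyperplane; similar inputs tend to collide.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def put(self, vec, output):
        self.buckets[self._key(vec)].append((list(vec), output))

    def get(self, vec):
        """Return a cached output, or None when reuse is not confident enough."""
        candidates = self.buckets.get(self._key(vec), [])
        if len(candidates) < self.k:
            return None
        nearest = sorted(candidates, key=lambda c: math.dist(c[0], vec))[: self.k]
        label, votes = Counter(out for _, out in nearest).most_common(1)[0]
        # Reuse only if the nearest neighbors largely agree on the output.
        return label if votes / self.k >= self.min_agreement else None


def recognize(cache, vec, compute):
    """Reuse a cached result when possible; otherwise compute and store it."""
    cached = cache.get(vec)
    if cached is not None:
        return cached, True
    result = compute(vec)
    cache.put(vec, result)
    return result, False
```

Raising `min_agreement` trades reuse opportunities for accuracy, which loosely mirrors the tunable accuracy target mentioned in the abstract. A lookup touches a single bucket, so its cost depends on the bucket size rather than on the total number of cached entries.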
- Award ID(s): 1815115
- PAR ID: 10122201
- Date Published:
- Journal Name: Mobicom'18
- Volume: 2018
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Due to the proliferation of Internet of Things (IoT) and application/user demands that challenge communication and computation, edge computing has emerged as the paradigm to bring computing resources closer to users. In this paper, we present Whispering, an analytical model for the migration of services (service offloading) from the cloud to the edge, in order to minimize the completion time of computational tasks offloaded by user devices and improve the utilization of resources. We also empirically investigate the impact of reusing the results of previously executed tasks for the execution of newly received tasks (computation reuse) and propose an adaptive task offloading scheme between edge and cloud. Our evaluation results show that Whispering achieves up to 35% and 97% (when coupled with computation reuse) lower task completion times than cases where tasks are executed exclusively at the edge or the cloud.
-
The wide adoption of the emerging SmartNIC technology creates new opportunities to offload application-level computation into the networking layer, which relieves the host CPUs and improves performance. Shuffle, the all-to-all data exchange process, is a critical building block for network communication in distributed data-intensive applications and can potentially benefit from SmartNICs. In this paper, we develop SmartShuffle, which accelerates the shuffle process of data-intensive applications by offloading various computation tasks into the SmartNIC devices. SmartShuffle supports offloading both low-level network functions, including data partitioning and network transport, and high-level computation tasks, including filtering, aggregation, and sorting. SmartShuffle adopts a coordinated offload architecture so that sender-side and receiver-side SmartNICs jointly contribute to the benefits of shuffle computation offload. SmartShuffle carefully manages the tight and time-varying computation and memory constraints on the device. We propose a liquid offloading approach, which dynamically migrates operators between the host CPU and the SmartNIC at runtime such that resources in both devices are fully utilized. We prototype SmartShuffle on the Stingray SoC SmartNICs and plug it into Spark. Our evaluation shows that SmartShuffle improves host CPU efficiency and I/O efficiency with lower job completion time. SmartShuffle outperforms Spark and Spark RDMA by up to 40% on TPC-H.
-
Abstract: Task offloading, which refers to processing (computation-intensive) data at facilitating servers, is an exemplary service that greatly benefits from the fog computing paradigm, which brings computation resources to the edge network for reduced application latency. However, the resource-consuming nature of task execution, as well as the sheer scale of IoT systems, raises an open and challenging question: is fog a remedy or a resource drain, given frequent and massive offloading operations? This question is nontrivial, because participants of offloading processes, i.e., fog nodes, may have diversified technical specifications, while task generators, i.e., task nodes, may employ a variety of criteria to select offloading targets, resulting in an unmanageable space for performance evaluation. To overcome these challenges of heterogeneity, we propose a gravity model that characterizes offloading criteria with various gravity functions, in which individual/system resource consumption can be examined by the device/network effort metrics, respectively. Simulation results show that the proposed gravity model can flexibly describe different offloading schemes in terms of application and node-level behavior. We find that the expected lifetime and device effort of individual tasks decrease as O(1/N) with the network size N, while the network effort decreases much more slowly, and even remains O(1) when load-balancing measures are employed, indicating a possible resource drain in the edge network.
-
In edge computing use cases (e.g., smart cities), where several users and devices may be in close proximity to each other, computational tasks with similar input data for the same services (e.g., image or video annotation) may be offloaded to the edge. The execution of such tasks often yields the same results (output) and thus duplicate (redundant) computation. Based on this observation, prior work has advocated for "computation reuse", a paradigm where the results of previously executed tasks are stored at the edge and are reused to satisfy incoming tasks with similar input data, instead of executing these incoming tasks from scratch. However, realizing computation reuse in practical edge computing deployments, where services may be offered by multiple (distributed) edge nodes (servers) for scalability and fault tolerance, is still largely unexplored. To tackle this challenge, in this paper, we present Reservoir, a framework to enable pervasive computation reuse at the edge, while imposing marginal overheads on user devices and the operation of the edge network infrastructure. Reservoir takes advantage of Locality Sensitive Hashing (LSH) and runs on top of Named-Data Networking (NDN), extending the NDN architecture to realize computation reuse semantics in the network. Our evaluation demonstrated that Reservoir can reuse computation with almost perfect accuracy, achieving 4.25-21.34x lower task completion times compared to cases without computation reuse.

