skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, December 13 until 2:00 AM ET on Saturday, December 14 due to maintenance. We apologize for the inconvenience.


Title: TrustServing: A Quality Inspection Sampling Approach for Remote DNN Services
Deep neural networks (DNNs) are being applied to various areas such as computer vision, autonomous vehicles, and healthcare, etc. However, DNNs are notorious for their high computational complexity and cannot be executed efficiently on resource constrained Internet of Things (IoT) devices. Various solutions have been proposed to handle the high computational complexity of DNNs. Offloading computing tasks of DNNs from IoT devices to cloud/edge servers is one of the most popular and promising solutions. While such remote DNN services provided by servers largely reduce computing tasks on IoT devices, it is challenging for IoT devices to inspect whether the quality of the service meets their service level objectives (SLO) or not. In this paper, we address this problem and propose a novel approach named QIS (quality inspection sampling) that can efficiently inspect the quality of the remote DNN services for IoT devices. To realize QIS, we design a new ID-generation method to generate data (IDs) that can identify the serving DNN models on edge servers. QIS inserts the IDs into the input data stream and implements sampling inspection on SLO violations. The experiment results show that the QIS approach can reliably inspect, with a nearly 100% success rate, the service qualtiy of remote DNN services when the SLA level is 99.9% or lower at the cost of only up to 0.5% overhead.  more » « less
Award ID(s):
1910844
PAR ID:
10192853
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2020 17th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The success of ChatGPT is reshaping the landscape of the entire IT industry. The large language model (LLM) powering ChatGPT is experiencing rapid development, marked by enhanced features, improved accuracy, and reduced latency. Due to the execution overhead of LLMs, prevailing commercial LLM products typically manage user queries on remote servers. However, the escalating volume of user queries and the growing complexity of LLMs have led to servers becoming bottlenecks, compromising the quality of service (QoS). To address this challenge, a potential solution is to shift LLM inference services to edge devices, a strategy currently being explored by industry leaders such as Apple, Google, Qualcomm, Samsung, and others. Beyond alleviating the computational strain on servers and enhancing system scalability, deploying LLMs at the edge offers additional advantages. These include real-time responses even in the absence of network connectivity and improved privacy protection for customized or personal LLMs. This article delves into the challenges and potential bottlenecks currently hindering the effective deployment of LLMs on edge devices. Through deploying the LLaMa-2 7B model with INT4 quantization on diverse edge devices and systematically analyzing experimental results, we identify insufficient memory and/or computing resources on traditional edge devices as the primary obstacles. Based on our observation and empirical analysis, we further provide insights and design guidance for the next generation of edge devices and systems from both hardware and software directions 
    more » « less
  2. The Internet of Things (IoT) requires distributed, large scale data collection via geographically distributed devices. While IoT devices typically send data to the cloud for processing, this is problematic for bandwidth constrained applications. Fog and edge computing (processing data near where it is gathered, and sending only results to the cloud) has become more popular, as it lowers network overhead and latency. Edge computing often uses devices with low computational capacity, therefore service frameworks and middleware are needed to efficiently compose services. While many frameworks use a top-down perspective, quality of service is an emergent property of the entire system and often requires a bottom up approach. We define services as multi-modal, allowing resource and performance tradeoffs. Different modes can be composed to meet an application's high level goal, which is modeled as a function. We examine a case study for counting vehicle traffic through intersections in Nashville. We apply object detection and tracking to video of the intersection, which must be performed at the edge due to privacy and bandwidth constraints. We explore the hardware and software architectures, and identify the various modes. This paper lays the foundation to formulate the online optimization problem presented by the system which makes tradeoffs between the quantity of services and their quality constrained by available resources. 
    more » « less
  3. Pervasive Edge Computing (PEC), a recent addition to the edge computing paradigm, leverages the computing resources of end-user devices to execute computation tasks in close proximity to users. One of the primary challenges in the PEC environment is determining the appropriate servers for offloading computation tasks based on factors, such as computation latency, response quality, device reliability, and cost of service. Computation outsourcing in the PEC ecosystem requires additional security and privacy considerations. Finally, mechanisms need to be in place to guarantee fair payment for the executed service(s). We present 𝑃𝐸𝑃𝑃𝐸𝑅, a novel, privacy-preserving, and decentralized framework that addresses aforementioned challenges by utilizing blockchain technology and trusted execution environments (TEE). 𝑃𝐸𝑃𝑃𝐸𝑅 improves the performance of PEC by allocating resources among end-users efficiently and securely. It also provides the underpinnings for building a financial ecosystem at the pervasive edge. To evaluate the effectiveness of 𝑃𝐸𝑃𝑃𝐸𝑅, we developed and deployed a proof of concept implementation on the Ethereum blockchain, utilizing Intel SGX as the TEE technology. We propose a simple but highly effective remote attestation method that is particularly beneficial to PEC compared to the standard remote attestation method used today. Our extensive comparison experiment shows that 𝑃𝐸𝑃𝑃𝐸𝑅 is 1.23Γ— to 2.15Γ— faster than the current standard remote attestation procedure. In addition, we formally prove the security of our system using the universal composability (UC) framework. 
    more » « less
  4. Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on mobile devices can quickly deplete their battery. Although task offloading to cloud/edge servers may decrease the mobile device’s computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple β€œexits” earlier in the architecture, each providing increasingly higher target accuracy. Therefore, the tradeoff between accuracy and delay can be tuned according to the current conditions or application demands. In this article, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the article by providing a set of compelling research challenges.

     
    more » « less
  5. null (Ed.)
    Due to the proliferation of Internet of Things (IoT) and application/user demands that challenge communication and computation, edge computing has emerged as the paradigm to bring computing resources closer to users. In this paper, we present Whispering, an analytical model for the migration of services (service offloading) from the cloud to the edge, in order to minimize the completion time of computational tasks offloaded by user devices and improve the utilization of resources. We also empirically investigate the impact of reusing the results of previously executed tasks for the execution of newly received tasks (computation reuse) and propose an adaptive task offloading scheme between edge and cloud. Our evaluation results show that Whispering achieves up to 35% and 97% (when coupled with computation reuse) lower task completion times than cases where tasks are executed exclusively at the edge or the cloud. 
    more » « less