- Award ID(s): 1910844
- PAR ID: 10192853
- Date Published:
- Journal Name: 2020 17th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)
- Page Range / eLocation ID: 1 to 9
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The success of ChatGPT is reshaping the landscape of the entire IT industry. The large language model (LLM) powering ChatGPT is experiencing rapid development, marked by enhanced features, improved accuracy, and reduced latency. Due to the execution overhead of LLMs, prevailing commercial LLM products typically manage user queries on remote servers. However, the escalating volume of user queries and the growing complexity of LLMs have led to servers becoming bottlenecks, compromising the quality of service (QoS). To address this challenge, a potential solution is to shift LLM inference services to edge devices, a strategy currently being explored by industry leaders such as Apple, Google, Qualcomm, Samsung, and others. Beyond alleviating the computational strain on servers and enhancing system scalability, deploying LLMs at the edge offers additional advantages. These include real-time responses even in the absence of network connectivity and improved privacy protection for customized or personal LLMs. This article delves into the challenges and potential bottlenecks currently hindering the effective deployment of LLMs on edge devices. Through deploying the LLaMa-2 7B model with INT4 quantization on diverse edge devices and systematically analyzing experimental results, we identify insufficient memory and/or computing resources on traditional edge devices as the primary obstacles. Based on our observation and empirical analysis, we further provide insights and design guidance for the next generation of edge devices and systems from both hardware and software directions.
-
The Internet of Things (IoT) requires distributed, large-scale data collection via geographically distributed devices. While IoT devices typically send data to the cloud for processing, this is problematic for bandwidth-constrained applications. Fog and edge computing (processing data near where it is gathered, and sending only results to the cloud) has become more popular, as it lowers network overhead and latency. Edge computing often uses devices with low computational capacity, so service frameworks and middleware are needed to efficiently compose services. While many frameworks use a top-down perspective, quality of service is an emergent property of the entire system and often requires a bottom-up approach. We define services as multi-modal, allowing resource and performance tradeoffs. Different modes can be composed to meet an application's high-level goal, which is modeled as a function. We examine a case study for counting vehicle traffic through intersections in Nashville. We apply object detection and tracking to video of the intersection, which must be performed at the edge due to privacy and bandwidth constraints. We explore the hardware and software architectures, and identify the various modes. This paper lays the foundation to formulate the online optimization problem presented by the system, which makes tradeoffs between the quantity of services and their quality, constrained by available resources.
-
Pervasive Edge Computing (PEC), a recent addition to the edge computing paradigm, leverages the computing resources of end-user devices to execute computation tasks in close proximity to users. One of the primary challenges in the PEC environment is determining the appropriate servers for offloading computation tasks based on factors such as computation latency, response quality, device reliability, and cost of service. Computation outsourcing in the PEC ecosystem requires additional security and privacy considerations. Finally, mechanisms need to be in place to guarantee fair payment for the executed service(s). We present PEPPER, a novel, privacy-preserving, and decentralized framework that addresses the aforementioned challenges by utilizing blockchain technology and trusted execution environments (TEE). PEPPER improves the performance of PEC by allocating resources among end-users efficiently and securely. It also provides the underpinnings for building a financial ecosystem at the pervasive edge. To evaluate the effectiveness of PEPPER, we developed and deployed a proof-of-concept implementation on the Ethereum blockchain, utilizing Intel SGX as the TEE technology. We propose a simple but highly effective remote attestation method that is particularly beneficial to PEC compared to the standard remote attestation method used today. Our extensive comparison experiment shows that PEPPER is 1.23× to 2.15× faster than the current standard remote attestation procedure. In addition, we formally prove the security of our system using the universal composability (UC) framework.
-
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on mobile devices can quickly deplete their battery. Although task offloading to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple "exits" earlier in the architecture, each providing increasingly higher target accuracy. Therefore, the tradeoff between accuracy and delay can be tuned according to the current conditions or application demands. In this article, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the article by providing a set of compelling research challenges.
-
Due to the proliferation of Internet of Things (IoT) and application/user demands that challenge communication and computation, edge computing has emerged as the paradigm to bring computing resources closer to users. In this paper, we present Whispering, an analytical model for the migration of services (service offloading) from the cloud to the edge, in order to minimize the completion time of computational tasks offloaded by user devices and improve the utilization of resources. We also empirically investigate the impact of reusing the results of previously executed tasks for the execution of newly received tasks (computation reuse) and propose an adaptive task offloading scheme between edge and cloud. Our evaluation results show that Whispering achieves up to 35% and 97% (when coupled with computation reuse) lower task completion times than cases where tasks are executed exclusively at the edge or the cloud.