An Empirical Analysis and Resource Footprint Study of Deploying Large Language Models on Edge Devices

Dhar, Nobel; Deng, Bobin; Lo, Dan; Wu, Xiaofeng; Zhao, Liang; Suo, Kun

doi:10.1145/3603287.3651205

The success of ChatGPT is reshaping the landscape of the entire IT industry. The large language model (LLM) powering ChatGPT is experiencing rapid development, marked by enhanced features, improved accuracy, and reduced latency. Due to the execution overhead of LLMs, prevailing commercial LLM products typically manage user queries on remote servers. However, the escalating volume of user queries and the growing complexity of LLMs have led to servers becoming bottlenecks, compromising the quality of service (QoS). To address this challenge, a potential solution is to shift LLM inference services to edge devices, a strategy currently being explored by industry leaders such as Apple, Google, Qualcomm, Samsung, and others. Beyond alleviating the computational strain on servers and enhancing system scalability, deploying LLMs at the edge offers additional advantages. These include real-time responses even in the absence of network connectivity and improved privacy protection for customized or personal LLMs. This article delves into the challenges and potential bottlenecks currently hindering the effective deployment of LLMs on edge devices. Through deploying the LLaMA-2 7B model with INT4 quantization on diverse edge devices and systematically analyzing experimental results, we identify insufficient memory and/or computing resources on traditional edge devices as the primary obstacles. Based on our observations and empirical analysis, we further provide insights and design guidance for the next generation of edge devices and systems from both hardware and software directions.

Award ID(s): 2103459

PAR ID: 10529097

Author(s) / Creator(s): Dhar, Nobel; Deng, Bobin; Lo, Dan; Wu, Xiaofeng; Zhao, Liang; Suo, Kun

Publisher / Repository: ACM

Date Published: 2024-04-18

ISBN: 9798400702372

Page Range / eLocation ID: 69 to 76

Subject(s) / Keyword(s): Large Language Models (LLMs), LLaMA-2, Edge Devices, Edge Computing

Format(s): Medium: X

Location: Marietta GA USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3603287.3651205
