Smart IoT-based systems often desire continuous execution of multiple latency-sensitive Deep Learning (DL) applications. Edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications function based on bulky "neural network (NN) models" that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on Bayesian theory to predict the inference requests of multi-tenant applications and uses the predictions to choose the appropriate NN models for loading, hence increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can stimulate the degree of multi-tenancy on the edge by at least 2× and increase the number of warm-starts by ≈ 60% without any major loss in the inference accuracy of the applications.
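The abstract above does not give pseudocode for iWS-BFE, so the following is only a minimal Python sketch of the idea it describes: maintain a Bayesian estimate of how likely each application is to issue a request, then pack model variants (full-precision or quantized) into the limited edge memory so that likely requests are served warm. All names here (ModelVariant, RequestPredictor, choose_models) and the greedy packing rule are illustrative assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass

# Hypothetical sketch of the iWS-BFE idea (names are illustrative, not from the paper):
# use a Bayesian estimate of each application's request likelihood to pick which
# NN model variants (full-precision vs. quantized) to keep in edge memory.

@dataclass
class ModelVariant:
    app: str
    size_mb: float        # memory footprint of this variant
    accuracy: float       # expected inference accuracy of this variant

class RequestPredictor:
    """Beta-Bernoulli posterior over 'app issues a request in the next window'."""
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha, self.beta = alpha, beta

    def observe(self, requested: bool) -> None:
        if requested:
            self.alpha += 1
        else:
            self.beta += 1

    def prob_request(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def choose_models(variants: dict[str, list[ModelVariant]],
                  predictors: dict[str, RequestPredictor],
                  memory_mb: float) -> list[ModelVariant]:
    """Greedily load one variant per app, most-likely apps first, so requests that
    are expected soon can be warm-started within the memory budget."""
    loaded, free = [], memory_mb
    apps = sorted(variants, key=lambda a: predictors[a].prob_request(), reverse=True)
    for app in apps:
        # Prefer the most accurate variant that still fits; fall back to smaller
        # (e.g., quantized) variants to raise the degree of multi-tenancy.
        for v in sorted(variants[app], key=lambda v: v.accuracy, reverse=True):
            if v.size_mb <= free:
                loaded.append(v)
                free -= v.size_mb
                break
    return loaded
```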
Dělen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI
Model-serving systems expose machine learning (ML) models to applications programmatically via a high-level API. Cloud platforms use these systems to mask the complexities of optimally managing resources and servicing inference requests across multiple applications. Model serving at the edge is now also becoming increasingly important to support inference workloads with tight latency requirements. However, edge model serving differs substantially from cloud model serving in its latency, energy, and accuracy constraints: these systems must support multiple applications with widely different latency and accuracy requirements on embedded edge accelerators with limited computational and energy resources. To address the problem, this paper presents Dělen, a flexible and adaptive model-serving system for multi-tenant edge AI. Dělen exposes a high-level API that enables individual edge applications to specify a bound at runtime on the latency, accuracy, or energy of their inference requests. We efficiently implement Dělen using conditional execution in multi-exit deep neural networks (DNNs), which enables granular control over inference requests, and evaluate it on a resource-constrained Jetson Nano edge accelerator. We evaluate Dělen's flexibility by implementing state-of-the-art adaptation policies using Dělen's API, and evaluate its adaptability under different workload dynamics and goals when running single and multiple applications.
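Dělen's concrete API is not reproduced in the abstract; the sketch below is one hypothetical way a per-request bound on latency or accuracy could be expressed and honored through conditional (early-exit) execution in a multi-exit DNN. Every name (InferenceBound, MultiExitModel) and the exit policy are assumptions for illustration; an energy bound would be handled analogously.

```python
import time

# Illustrative sketch only: Dělen's real API and exit policy are not given in the
# abstract, so every name and rule here is assumed.

class InferenceBound:
    def __init__(self, latency_ms=None, min_accuracy=None):
        self.latency_ms = latency_ms
        self.min_accuracy = min_accuracy

class MultiExitModel:
    """A DNN with several exit heads; each exit has a profiled accuracy and cost."""
    def __init__(self, exits):
        # exits: list of (run_stage, exit_accuracy, stage_latency_ms), where
        # run_stage(x) returns (next_activation, prediction_at_this_exit).
        self.exits = exits

    def infer(self, x, bound: InferenceBound):
        start = time.monotonic()
        result = None
        for run_stage, exit_acc, stage_ms in self.exits:
            elapsed_ms = (time.monotonic() - start) * 1000
            # Conditional execution: skip stages that would blow the latency bound.
            if bound.latency_ms is not None and elapsed_ms + stage_ms > bound.latency_ms:
                break
            x, result = run_stage(x)
            # Early exit as soon as the accuracy bound is satisfied.
            if bound.min_accuracy is not None and exit_acc >= bound.min_accuracy:
                break
        return result  # deepest exit reached within the bounds (best effort)
```

Under these assumed names, an application could request, for example, `model.infer(frame, InferenceBound(latency_ms=30))` to trade accuracy for a 30 ms response.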
- PAR ID: 10433378
- Date Published:
- Journal Name: ACM/IEEE Conference on Internet of Things Design and Implementation
- Page Range / eLocation ID: 209 to 221
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- With the growing prevalence of edge AI, systems are increasingly required to meet stringent and diverse service level objectives (SLOs), such as maintaining specific accuracy levels, ensuring sufficient inference throughput, and meeting deadlines, often simultaneously. However, concurrently achieving these varied and complex SLOs is particularly challenging due to the resource constraints of edge devices and the heterogeneity of AI accelerators. To address this gap, we present a novel AI scheduling framework, Convergo, which uniquely integrates heterogeneous accelerator management, multi-tenancy, and multi-SLO prioritization into one scheduling solution. Convergo not only leverages heterogeneous AI accelerators and supports AI multi-tenancy, but also integrates scheduling heuristics to meet multiple SLOs concurrently. Convergo enables the simultaneous satisfaction of multiple, complex SLO requirements (e.g., accuracy, throughput, and deadline constraints). The scheduling algorithm prioritizes inference requests, imposes critical constraints, and selects the best model combinations for current inferencing. We evaluated Convergo on the Jetson Xavier platform with portable TPU accelerators across various AI workloads, demonstrating its effectiveness. The evaluation results show that Convergo outperforms state-of-the-art baselines, achieving over 90% satisfaction of all three distinct SLO requirements simultaneously while maintaining approximately 95% satisfaction for individual SLOs. Furthermore, Convergo achieves these results with negligible overhead, making it a promising solution for edge AI systems. (A hedged sketch of this kind of multi-SLO model selection appears after this list.)
- Edge cloud data centers (Edge) are deployed to provide responsive services to end users. Edge can host more powerful CPUs and DNN accelerators such as GPUs and may be used for offloading tasks from end-user devices that require more significant compute capabilities. But Edge resources may also be limited and must be shared across multiple applications that process requests concurrently from several clients. However, multiplexing GPUs across applications is challenging. With edge cloud servers needing to process a lot of streaming data and the advent of multi-GPU systems, getting that data from the network to the GPU can be a bottleneck, limiting the amount of work the GPU cluster can do. The lack of prompt notification of job completion from the GPU can also result in poor GPU utilization. We build on our recent work on controlled spatial sharing of a single GPU to expand to support multi-GPU systems and propose a framework that addresses these challenges. Unlike the state-of-the-art uncontrolled spatial sharing currently available with systems such as CUDA-MPS, our controlled spatial sharing approach uses each GPU in the cluster efficiently by removing interference between applications, resulting in much better, predictable inference latency. We also use each cluster GPU's DMA engines to offload data transfers to the GPU complex, thereby preventing the CPU from being the bottleneck. Finally, our framework uses the CUDA event library to give timely, low-overhead GPU notifications. Our evaluations show we can achieve low DNN inference latency and improve DNN inference throughput by at least a factor of 2. (A hedged event-notification sketch related to this item appears after this list.)
- Serverless computing platforms have gained popularity because they allow easy deployment of services in a highly scalable and cost-effective manner. By enabling just-in-time startup of container-based services, these platforms can achieve good multiplexing and automatically respond to traffic growth, making them particularly desirable for edge cloud data centers where resources are scarce. Edge cloud data centers are also gaining attention because of their promise to provide responsive, low-latency shared computing and storage resources. Bringing serverless capabilities to edge cloud data centers must continue to achieve the goals of low latency and reliability. The reliability guarantees provided by serverless computing, however, are weak, with node failures causing requests to be dropped or executed multiple times. Thus, serverless computing provides only a best-effort infrastructure, leaving application developers responsible for implementing stronger reliability guarantees at a higher level. Current approaches for providing stronger semantics such as "exactly once" guarantees could be integrated into serverless platforms, but they come at high cost in terms of both latency and resource consumption. As edge cloud services move towards applications such as autonomous vehicle control that require strong guarantees for both reliability and performance, these approaches may no longer be sufficient. In this paper we evaluate the latency, throughput, and resource costs of providing different reliability guarantees, with a focus on these emerging edge cloud platforms and applications. (A hedged deduplication sketch related to this item appears after this list.)
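For the Convergo item above, the abstract only states that the scheduler prioritizes requests and selects model combinations under accuracy, throughput, and deadline SLOs; this hedged Python sketch shows one plausible shape of such a selection step. The data types and the slack-based tie-breaking rule are assumptions, not Convergo's actual heuristic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration of multi-SLO model selection in the spirit of the
# Convergo abstract; the real scheduler's heuristics are not described in detail.

@dataclass
class Candidate:
    model: str
    accelerator: str       # e.g., "gpu", "tpu0", "tpu1"
    accuracy: float
    latency_ms: float
    throughput_rps: float

@dataclass
class SLO:
    min_accuracy: float
    deadline_ms: float
    min_throughput_rps: float

def pick_model(candidates: list[Candidate], slo: SLO) -> Optional[Candidate]:
    """Filter out candidates that violate any SLO, then pick the one that leaves
    the most deadline slack (a simple stand-in for a real prioritization rule)."""
    feasible = [c for c in candidates
                if c.accuracy >= slo.min_accuracy
                and c.latency_ms <= slo.deadline_ms
                and c.throughput_rps >= slo.min_throughput_rps]
    if not feasible:
        return None  # SLOs cannot all be met; a real scheduler would degrade gracefully
    return max(feasible, key=lambda c: slo.deadline_ms - c.latency_ms)
```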
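For the multi-GPU sharing item above, which credits CUDA events with timely, low-overhead completion notifications, the snippet below illustrates that general technique using PyTorch's CUDA event wrapper purely for brevity. It is not the framework's implementation; the paper's system uses the CUDA event library directly.

```python
import torch

# Small illustration of event-based completion notification on a GPU stream.
# The host can poll the completion event instead of blocking, which is what
# enables prompt, low-overhead notification of job completion.

def timed_inference(model, batch, device="cuda:0"):
    start = torch.cuda.Event(enable_timing=True)
    done = torch.cuda.Event(enable_timing=True)
    stream = torch.cuda.Stream(device=device)
    with torch.cuda.stream(stream):
        start.record(stream)
        out = model(batch.to(device, non_blocking=True))  # async H2D copy + kernels
        done.record(stream)
    # done.query() could be polled here without blocking; for simplicity we wait.
    done.synchronize()
    return out, start.elapsed_time(done)  # elapsed GPU time in milliseconds
```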
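For the serverless reliability item above, which notes that stronger "exactly once" semantics come at a latency and resource cost, this sketch shows one generic building block behind such guarantees: deduplicating retried requests by an idempotency key. The class and its single-node locking are illustrative assumptions, not taken from the paper.

```python
import threading

# Generic illustration (not from the paper): deduplication by idempotency key
# adds state and coordination, which is part of why stronger guarantees cost
# latency and resources.

class IdempotentExecutor:
    def __init__(self):
        self._results = {}          # idempotency_key -> cached result
        self._lock = threading.Lock()

    def run(self, idempotency_key, handler, *args):
        # Holding the lock across execution keeps the sketch simple and safe on a
        # single node; a real system needs durable, distributed deduplication.
        with self._lock:
            if idempotency_key not in self._results:
                self._results[idempotency_key] = handler(*args)
            # A retried or duplicated request reuses the stored result instead
            # of executing the handler a second time.
            return self._results[idempotency_key]
```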