GMorph: Accelerating Multi-DNN Inference via Model Fusion

Yang, Qizheng; Yang, Tianyi; Xiang, Mingcan; Zhang, Lijun; Wang, Haoliang; Serafini, Marco; Guan, Hui

AI-powered applications often involve multiple deep neural network (DNN)-based prediction tasks to support application-level functionalities. However, executing multi-DNNs can be challenging due to the high resource demands and computation costs that increase linearly with the number of DNNs. Multi-task learning (MTL) addresses this problem by designing a multi-task model that shares parameters across tasks based on a single backbone DNN. This paper explores an alternative approach called model fusion: rather than training a single multi-task model from scratch as MTL does, model fusion fuses multiple task-specific DNNs that are pre-trained separately and can have heterogeneous architectures into a single multi-task model. We materialize model fusion in a software framework called GMorph to accelerate multi-DNN inference while maintaining task accuracy. GMorph features three main technical contributions: graph mutations to fuse multi-DNNs into resource-efficient multi-task models, search-space sampling algorithms, and predictive filtering to reduce the high search costs. Our experiments show that GMorph can outperform MTL baselines and reduce the inference latency of multi-DNNs by 1.1-3× while meeting the target task accuracy.
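
The abstract does not spell out how GMorph's graph mutations or its predictive filter work, so the Python sketch below is only a toy intuition for the cost argument and the search loop it describes, not GMorph's method. It assumes (hypothetically) that fusion means computing a shared prefix of blocks once for all tasks, and that candidates are screened by a cheap accuracy-drop predictor before an expensive fine-tune-and-evaluate step. Every name here (`Block`, `separate_cost`, `fused_cost`, `search`, `predict_drop`, `measure_drop`) and every number is illustrative; GMorph itself mutates heterogeneous computation graphs, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    name: str
    cost_ms: float  # hypothetical per-block latency in milliseconds


Model = List[Block]


def separate_cost(models: List[Model]) -> float:
    # Each task-specific DNN runs on its own, so cost adds up linearly.
    return sum(b.cost_ms for m in models for b in m)


def fused_cost(models: List[Model], depth: int) -> float:
    # Toy fusion: the first `depth` blocks of models[0] form a shared trunk
    # computed once; every task keeps its own blocks after position `depth`.
    # Hypothetical assumption: the trunk output is shape-compatible with
    # every task's remaining blocks.
    trunk = models[0][:depth]
    tails = [m[depth:] for m in models]
    return sum(b.cost_ms for b in trunk) + sum(b.cost_ms for t in tails for b in t)


def search(
    models: List[Model],
    predict_drop: Callable[[int], float],  # cheap accuracy-drop predictor (hypothetical)
    measure_drop: Callable[[int], float],  # expensive fine-tune + evaluate (hypothetical)
    max_drop: float,
) -> Tuple[Optional[int], int]:
    """Return (cheapest acceptable sharing depth or None, expensive evaluations run)."""
    # Each task keeps at least its last block as a task-specific head.
    max_depth = min(len(m) for m in models) - 1
    best_depth, best_cost, evaluations = None, separate_cost(models), 0
    # Visit candidates from cheapest to most expensive fused latency.
    for depth in sorted(range(1, max_depth + 1), key=lambda d: fused_cost(models, d)):
        cost = fused_cost(models, depth)
        if cost >= best_cost:
            continue  # no latency gain over the current best
        if predict_drop(depth) > max_drop:
            continue  # predictive filtering: skip without the expensive step
        evaluations += 1
        if measure_drop(depth) <= max_drop:
            best_depth, best_cost = depth, cost
    return best_depth, evaluations


if __name__ == "__main__":
    a = [Block("a1", 4.0), Block("a2", 3.0), Block("a3", 1.0)]
    b = [Block("b1", 5.0), Block("b2", 2.0), Block("b3", 1.0)]
    print(separate_cost([a, b]), fused_cost([a, b], 2))
    print(search([a, b], lambda d: 0.5 * d, lambda d: 0.4 * d, max_drop=0.6))
```

With these made-up costs, running both models separately costs 16 ms and sharing two blocks costs 9 ms. With a tight accuracy budget, the predictor rejects the deepest sharing before any expensive evaluation, so the search settles on sharing one block after a single fine-tune-and-evaluate call. This mirrors, in miniature, why filtering candidates cheaply reduces search cost.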

Award ID(s): 2338512, 2312396, 2220211, 2224054

PAR ID: 10538822

Author(s) / Creator(s): Yang, Qizheng; Yang, Tianyi; Xiang, Mingcan; Zhang, Lijun; Wang, Haoliang; Serafini, Marco; Guan, Hui

Publisher / Repository: ACM EuroSys '24

Date Published: 2024-04-24

ISBN: 979-8-4007-0437-6

Sponsoring Org: National Science Foundation

Full Text: Accepted Manuscript 1.0 (free and publicly accessible)

Conference Paper: DOI not currently available

More Like this