Enabling Real-time DNN Switching via Weight-Sharing

Tong, Jianming; Chen, Yangyu; Pan, Yue; Bambhaniya, Abhimanyu; Khare, Alind; Heo, Taekyung; Tumanov, Alexey; Krishna, Tushar

Citation Details

There is a growing rise of applications that need to support a library of models with diverse latency-accuracy trade-offs on a Pareto frontier, especially in the health-care domain. This work presents an end-to-end system for training and serving weight-sharing models. On the training end, we leverage recent research in creating a family of models on the latency- accuracy Pareto frontier that share weights, reducing the total number of unique parameters. On the serving (inference end), we propose a novel accelerator FastSwitch that extracts weight reuse across different models, thereby providing fast real-time switching between different models. more »

Award ID(s):: 2029004

PAR ID:: 10430210

Author(s) / Creator(s):: Tong, Jianming; Chen, Yangyu; Pan, Yue; Bambhaniya, Abhimanyu; Khare, Alind; Heo, Taekyung; Tumanov, Alexey; Krishna, Tushar

Date Published:: 2022-07-01

Journal Name:: Conference proceedings International Symposium on Computer Architecture

ISSN:: 0884-7495

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this