Toward Weight Sharing Paradigm for Efficient AI: Training and Inference Serving

Behnam, Payman; Khare, Alind; Garg, Dhruv; Tumanov, Alexey

doi:10.1145/3759441.3759447

Deep neural networks are increasingly required to operate across diverse hardware platforms, latency constraints, and power budgets, which motivates the need for specialized models for each scenario. However, designing and training a separate model per scenario or serving a large ensemble of models is often impractical. Weight sharing has emerged as a promising paradigm to address this challenge by training a single ''SuperNet'' that subsumes many sub-models (SubNets), and by reusing weights across those SubNets both at training and inference time. This paper provides an abridged survey of our recent advances that leverage weight sharing for efficient AI, covering both training and inference serving. In centralized once-for-all training, Delayed ε-Shrinking (DεS) improves training efficiency by strategically scheduling the introduction of smaller SubNets during training. In a federated fashion, SuperFedNas co-trains a SuperNet across distributed clients and disjoins training and searching, which enables oneshot specialization to many deployment targets at minimal cost. ∇QDARTS integrates quantization into differentiable architecture search, jointly finding neural architectures, weights, and low-precision settings to yield highly efficient models in a single search. For inference serving, SuperServe introduces a weight-shared model with dynamic SubNet routing (SubNetAct) to instantaneously switch among a spectrum of accuracy-latency operating points, coupled with a scheduler (SlackFit) for unpredictable workloads. Finally, SUSHI co-designs model, system, and accelerator to exploit weightshared SuperNets on tinyML devices, caching SubGraphs on FPGA to reduce latency and energy. Together, these works demonstrate that the weight sharing paradigm can dramatically improve the efficiency of both training and inference serving of deep models across a range of scenarios.

More Like this