OPA: One-Predict-All For Efficient Deployment

Junpeng Guo; Shengqing Xia; Chunyi Peng

Deep neural network (DNN) is the de facto standard for running a variety of computer vision applications over mobile and embedded systems. Prior to deployment, a DNN is specialized by training to fit the target use scenario (depending on computing power and visual data input). To handle its costly training and meet diverse deployment needs, a “Train Once, Deploy Everywhere” paradigm has been recently proposed by training one super-network and selecting one out of many sub-networks (part of the super-network) for the target scenario; This empowers efficient DNN deployment at low training cost (training once). However, the existing studies tackle some deployment factors like computing power and source data but largely overlook the impact of their runtime dynamics (say, time-varying visual contents and GPU/CPU workloads). In this work, we propose OPA to cover all these deployment factors, particularly those along with runtime dynamics in visual data contents and computing resources. To quickly and accurately learn which sub-network runs “best” in the dynamic deployment scenario, we devise a “One-Predict-All” approach with no need to run all the candidate sub-networks. Instead, we first develop a shallow sub-network to test the water and then use its test results to predict the performance of all other deeper sub-networks. We have implemented and evaluated OPA. Compared to the state-of-the-art, OPA has achieved up to 26% higher Top-1 accuracy for a given latency requirement.

More Like this