Neural Network Partitioning for Fast Distributed Inference

Viramontes, Robert; Davoodi, Azadeh

doi:10.1109/ISQED57927.2023.10129343

Citation Details

Neural Network Partitioning for Fast Distributed Inference

The rising availability of heterogeneous networked devices highlights new opportunities for distributed artificial intelligence. This work proposes an Integer Linear Programming (ILP) optimization scheme to assign layers of a neural network in a distributed setting with heterogeneous devices representing edge, hub, and cloud in order to minimize the overall inference latency. The ILP formulation captures the tradeoff between avoiding communication cost when executing consecutive layers on the same device versus the latency benefit due to weight preloading when an idle device is waiting to receive the results of an earlier layer across the network. In our experiments we show the layer assignment and inference latency of a neural network can significantly vary depending on the types of devices in the network and their communications bandwidths. more »

Award ID(s):: 2006394

PAR ID:: 10520403

Author(s) / Creator(s):: Viramontes, Robert; Davoodi, Azadeh

Publisher / Repository:: IEEE

Date Published:: 2023-04-05

ISBN:: 979-8-3503-3475-3

Page Range / eLocation ID:: 1 to 7

Format(s):: Medium: X

Location:: San Francisco, CA, USA

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1109/ISQED57927.2023.10129343

More Like this