Low latency RNN inference with cellular batching

Gao, Pin; Yu, Lingfan; Wu, Yongwei; Li, Jinyang

doi:10.1145/3190508.3190541

Citation Details

Low latency RNN inference with cellular batching

Performing inference on pre-trained neural network models must meet the requirement of low-latency, which is often at odds with achieving high throughput. Existing deep learning systems use batching to improve throughput, which do not perform well when serving Recurrent Neural Networks with dynamic dataflow graphs. We propose the technique of cellular batching, which improves both the latency and throughput of RNN inference. Unlike existing systems that batch a fixed set of dataflow graphs, cellular batching makes batching decisions at the granularity of an RNN "cell" (a subgraph with shared weights) and dynamically assembles a batched cell for execution as requests join and leave the system. We implemented our approach in a system called BatchMaker. Experiments show that BatchMaker achieves much lower latency and also higher throughput than existing systems. more »

Award ID(s):: 1816717

PAR ID:: 10311675

Author(s) / Creator(s):: Gao, Pin; Yu, Lingfan; Wu, Yongwei; Li, Jinyang

Date Published:: 2018-04-23

Journal Name:: EuroSys '18: Proceedings of the Thirteenth EuroSys Conference

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3190508.3190541

More Like this