OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model

Wang, Zheng; Wang, Yuke; Feng, Boyuan; Huang, Guyue; Mudigere, Dheevatsa; Muthiah, Bharath; Li, Ang; Ding, Yufei


The deployment of Deep Learning Recommendation Models (DLRMs) involves the parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and inter-GPU communication. To this end, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to explore the connection between DLRM inputs and the efficiency of distributed EMTs, aiming to provide a near-optimal parallelization strategy for EMTs. Specifically, we conduct an in-depth analysis of various types of EMT parallelism and propose a heuristic search algorithm to efficiently approximate an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared memory-based system, which supports the lightweight but complex computation and communication pattern of fine-grained EMT parallelization, effectively converting theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× speedup on average in training and inference, respectively, over state-of-the-art DLRM frameworks.
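To make the contrast between coarse-grained and input-aware fine-grained placement concrete, the following is a minimal sketch, not the OPER algorithm itself. It assumes a hypothetical batch of (table, row) lookups, models GPU workload purely as lookup counts (ignoring memory capacity and communication cost), and uses a plain longest-processing-time greedy heuristic as a stand-in for the paper's heuristic search. All function and variable names are illustrative.

```python
import heapq
from collections import Counter


def greedy_assign(costs, num_gpus):
    """LPT greedy: place the heaviest remaining item on the least-loaded GPU."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {}
    # Stable sort by descending cost keeps tie order deterministic.
    for item, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        placement[item] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return placement


def gpu_loads(costs, placement, num_gpus):
    """Sum the cost of every item assigned to each GPU."""
    loads = [0] * num_gpus
    for item, cost in costs.items():
        loads[placement[item]] += cost
    return loads


def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical input batch: each entry is one (table_id, row_id) lookup.
lookups = [(0, 0)] * 8 + [(0, 1)] * 4 + [(1, 0)] * 2 + [(1, 1)] * 2
num_gpus = 2

table_costs = Counter(table for table, _ in lookups)  # coarse: whole tables
row_costs = Counter(lookups)                          # fine: individual rows

coarse = gpu_loads(table_costs, greedy_assign(table_costs, num_gpus), num_gpus)
fine = gpu_loads(row_costs, greedy_assign(row_costs, num_gpus), num_gpus)

print("table-wise loads:", coarse, "imbalance:", imbalance(coarse))
print("row-wise loads:  ", fine, "imbalance:", imbalance(fine))
```

In this toy batch, table 0 receives three times as many lookups as table 1, so placing whole tables leaves one GPU with most of the work, while splitting by row according to observed access counts evens the load. A realistic cost model would also need to account for the inter-GPU communication that the abstract identifies as the other source of inefficiency.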

Award ID(s): 2124039

PAR ID: 10538948

Author(s) / Creator(s): Wang, Zheng; Wang, Yuke; Feng, Boyuan; Huang, Guyue; Mudigere, Dheevatsa; Muthiah, Bharath; Li, Ang; Ding, Yufei

Publisher / Repository: USENIX Association

Date Published: 2024-07-10

ISBN: 978-1-939133-41-0

Format(s): Medium: X

Location: Santa Clara, CA, USA

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
