Efficient Large Scale DLRM Implementation On Heterogeneous Memory Systems

Hildebrand, Mark; Lowe-Power, Jason; Akella, Venkatesh

doi:10.1007/978-3-031-32041-5_3

We propose a new data structure called CachedEmbeddings for efficiently training large-scale deep learning recommendation models (DLRM) on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache and data movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations that perform multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X better than the best known Intel CPU-based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and a 1.32X to 1.45X improvement over Intel’s 2LM implementation, which treats DRAM as a hardware-managed cache.
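The abstract describes CachedEmbeddings only at a high level. To make the general idea of a software-managed cache for embedding rows concrete, here is a minimal Python sketch under stated assumptions: the full table lives in a large, slow tier (the `backing` list, standing in for non-volatile memory), recently used rows are copied into a small, fast tier (the `cache`, standing in for DRAM), eviction is least-recently-used, and rows updated during training are written back on eviction. The class `SoftwareCachedEmbedding`, its methods, the LRU policy, and the write-back scheme are hypothetical illustrations, not the paper's Julia implementation.

```python
from collections import OrderedDict
from typing import List, Sequence, Set


class SoftwareCachedEmbedding:
    """Hypothetical sketch of a software-managed embedding cache.

    backing: full table in the "slow" tier (stand-in for non-volatile memory).
    cache:   at most `capacity` rows in the "fast" tier (stand-in for DRAM).
    The LRU eviction and write-back policy are assumptions for illustration.
    """

    def __init__(self, backing: List[List[float]], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.backing = backing
        self.capacity = capacity
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self.dirty: Set[int] = set()  # cached rows modified since being loaded
        self.hits = 0
        self.misses = 0

    def _evict_if_needed(self) -> None:
        while len(self.cache) > self.capacity:
            idx, row = self.cache.popitem(last=False)  # least recently used
            if idx in self.dirty:
                self.backing[idx] = row  # write the updated row back
                self.dirty.discard(idx)

    def _row(self, idx: int) -> List[float]:
        if idx in self.cache:
            self.hits += 1
            self.cache.move_to_end(idx)  # mark as most recently used
            return self.cache[idx]
        self.misses += 1
        row = list(self.backing[idx])  # explicit copy: slow tier -> fast tier
        self.cache[idx] = row
        self._evict_if_needed()  # never evicts the row just inserted
        return row

    def lookup_sum(self, indices: Sequence[int]) -> List[float]:
        """Sum-pooled lookup of one sample's sparse indices."""
        out = [0.0] * len(self.backing[0])
        for idx in indices:
            row = self._row(idx)
            for j, value in enumerate(row):
                out[j] += value
        return out

    def sgd_update(self, indices: Sequence[int], grad_out: Sequence[float], lr: float) -> None:
        """Apply the pooled-output gradient to every row in the bag (sum pooling)."""
        for idx in indices:
            row = self._row(idx)
            for j in range(len(row)):
                row[j] -= lr * grad_out[j]
            self.dirty.add(idx)

    def flush(self) -> None:
        """Write every modified cached row back to the slow tier."""
        for idx in self.dirty:
            self.backing[idx] = self.cache[idx]
        self.dirty.clear()
```

In this sketch the program, not the hardware, decides which rows occupy the fast tier and when modified rows return to the slow tier; in Intel's 2LM mode, as the abstract notes, DRAM instead acts as a hardware-managed cache.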

Award ID(s): 2144883

PAR ID: 10421143

Author(s) / Creator(s): Hildebrand, Mark; Lowe-Power, Jason; Akella, Venkatesh

Date Published: 2023-05-21

Journal Name: High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings

Page Range / eLocation ID: 42-61

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1007/978-3-031-32041-5_3