Exploring Memory Access Similarity to Improve Irregular Application Performance for Distributed Hybrid Memory Systems

Liu, Wenjie; He, Xubin; Liu, Qing

doi:10.1109/TPDS.2022.3227544

With increasing problem complexity, more irregular applications are deployed on high-performance clusters to exploit the parallel working paradigm, and they yield irregular memory access behaviors across nodes. However, the irregularity of these memory access behaviors has not been comprehensively studied, which results in low utilization of the integrated hybrid memory system composed of stacked DRAM and off-chip DRAM. To address this problem, we devise a novel method called Similarity-Managed Hybrid Memory System (SM-HMS), which improves hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify the memory access similarity, the memory access behaviors of each node are vectorized, and the distance between two vectors is used as the memory access similarity. The calculated memory access similarity is used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, the sliding window section and the outlier section. The shared memory access behaviors guide replacement in the sliding window section, while the outlier section is managed in an LRU manner. Our evaluation results with a set of irregular applications on various clusters consisting of up to 256 nodes show that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon, and Hybrid2 on job finish time reduction by up to 58.6%, 56.7%, and 31.3%, with 46.1%, 41.6%, and 19.3% on average, respectively. SM-HMS can also achieve up to 98.6% (91.9% on average) of the ideal hybrid memory system performance.
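
The similarity measurement described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how access behaviors are vectorized or which distance is used, so this sketch assumes a normalized per-page access-frequency vector and Euclidean distance. The function names (`vectorize`, `distance`, `most_similar`) and the toy traces are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vectorize(trace, num_pages):
    """Map a list of accessed page ids to a normalized access-frequency vector.

    Assumption: one dimension per page; the paper may use different features.
    """
    total = len(trace)
    if total == 0:
        return [0.0] * num_pages
    counts = Counter(trace)
    return [counts.get(page, 0) / total for page in range(num_pages)]


def distance(u, v):
    """Euclidean distance between two vectors (assumed metric); smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(vectors, node):
    """Return the other node whose access vector is closest to that of `node`."""
    others = (name for name in vectors if name != node)
    return min(others, key=lambda name: distance(vectors[node], vectors[name]))


# Toy page-access traces for three hypothetical nodes.
traces = {
    "node0": [0, 1, 1, 2],
    "node1": [0, 1, 1, 3],
    "node2": [3, 3, 3, 3],
}
vectors = {name: vectorize(trace, num_pages=4) for name, trace in traces.items()}
```

In this toy setup, `most_similar(vectors, "node0")` selects `node1`, whose trace overlaps most with that of `node0`. A system following the abstract's idea could then share access behaviors between such closely matched nodes.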