NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
ACES: Accelerating Sparse Matrix Multiplication with Adaptive Execution Flow and Concurrency-Aware Cache Optimizations

https://doi.org/10.1145/3620666.3651381

Lu, Xiaoyang; Long, Boyu; Chen, Xiaoming; Han, Yinhe; Sun, Xian-He (April 2024, ACM)

Sparse matrix-matrix multiplication (SpMM) is a critical computational kernel in numerous scientific and machine learning applications. SpMM involves massive irregular memory accesses and poses great challenges to conventional cache-based computer architectures. Recently, dedicated SpMM accelerators have been proposed to enhance SpMM performance. However, current SpMM accelerators still face challenges in adapting to varied sparse patterns, fully exploiting inherent parallelism, and optimizing cache performance. To address these issues, this study introduces ACES, a novel SpMM accelerator. First, ACES features an adaptive execution flow that dynamically adjusts to diverse sparse patterns. The adaptive execution flow balances parallel computing efficiency and data reuse. Second, ACES incorporates locality-concurrency co-optimizations within the global cache. ACES utilizes a concurrency-aware cache management policy, which considers data locality and concurrency for optimal replacement decisions. Additionally, the integration of a non-blocking buffer with the global cache enhances concurrency and reduces computational stalls. Third, the hardware architecture of ACES is designed to integrate all innovations. The architecture ensures efficient support across the adaptive execution flow, advanced cache optimizations, and fine-grained parallel processing. Our performance evaluation demonstrates that ACES significantly outperforms existing solutions, providing a 2.1× speedup and marking a substantial advancement in SpMM acceleration.
Full Text Available
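
The "irregular memory accesses" mentioned in the ACES abstract can be seen in a small software sketch. The following is a generic row-wise (Gustavson-style) product of two matrices in CSR form, written only for illustration. It is not the ACES execution flow, and the function name `spgemm_csr`, the CSR argument layout, and the assumption that both operands are sparse are all choices made here rather than details taken from the paper.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Illustrative row-wise product C = A @ B of two CSR matrices.

    Each matrix is given as (row pointers, column indices, values).
    Hypothetical helper for illustration; not the ACES design.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for k in range(a_ptr[i], a_ptr[i + 1]):
            col_a, v_a = a_idx[k], a_val[k]
            # Irregular access: the row of B that gets read is selected
            # by a column index stored in A, so it is data-dependent.
            for j in range(b_ptr[col_a], b_ptr[col_a + 1]):
                acc[b_idx[j]] = acc.get(b_idx[j], 0.0) + v_a * b_val[j]
        for col in sorted(acc):
            c_idx.append(col)
            c_val.append(acc[col])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

Because the rows of B that are touched depend on the nonzero pattern of A, reuse of B rows is hard to predict, which is the kind of behavior a cache replacement policy has to cope with.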
CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning

https://doi.org/10.1109/HPCA57654.2024.00090

Lu, Xiaoyang; Najafi, Hamed; Liu, Jason; Sun, Xian-He (March 2024, 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Cache management is a critical aspect of computer architecture, encompassing techniques such as cache replacement, bypassing, and prefetching. Existing research has often focused on individual techniques, overlooking the potential benefits of joint optimization. Moreover, many of these approaches rely on static and intuition-driven policies, limiting their performance under complex and dynamic workloads. To address these challenges, this paper introduces CHROME, a novel concurrency-aware cache management framework. CHROME takes a holistic approach by seamlessly integrating intelligent cache replacement and bypassing with pattern-based prefetching. By leveraging online reinforcement learning, CHROME dynamically adapts cache decisions based on multiple program features and applies a reward for each decision that considers the accuracy of the action and the system-level feedback information. Our performance evaluation demonstrates that CHROME outperforms current state-of-the-art schemes, exhibiting significant improvements in cache management. Notably, CHROME achieves a remarkable performance boost of up to 13.7% over the traditional LRU method in multi-core systems with only modest overhead.
Full Text Available
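
For readers unfamiliar with the LRU baseline named in the CHROME abstract, the sketch below simulates a fully associative cache with least-recently-used replacement and reports its hit rate on an address trace. It only illustrates the static baseline policy. It does not model CHROME, reinforcement learning, bypassing, or prefetching, and the function name `lru_hit_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Hit rate of a fully associative LRU cache (illustrative only)."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[addr] = True
    return hits / len(accesses) if accesses else 0.0
```

A cyclic trace slightly larger than the cache, such as `[1, 2, 3] * 3` with `capacity=2`, never hits under LRU, which is one simple example of a static policy performing poorly on a particular access pattern.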
CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework

https://doi.org/10.1109/HPCA56546.2023.10071125

Lu, Xiaoyang; Wang, Rujia; Sun, Xian-He (February 2023, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation

https://doi.org/10.1145/3559009.3569658

Jiang, Peng; Wei, Yihua; Su, Jiya; Wang, Rujia; Wu, Bo (October 2022, PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques)

Full Text Available
DNNCloak: Secure DNN Models Against Memory Side-channel Based Reverse Engineering Attacks

https://doi.org/10.1109/ICCD56317.2022.00023

Che, Yuezhi; Wang, Rujia (October 2022, 2022 IEEE 40th International Conference on Computer Design (ICCD))

Full Text Available