NSF PAR Search | NSF Public Access Repository

DeepSim: A Transformer Based Model For Fast Simulation And Exploring Computer System Design Space

https://doi.org/10.1145/3573900.3593634

Najafi, Hamed; Lu, Xiaoyang (June 2023, SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation)

Full Text Available
The Memory-Bounded Speedup Model and Its Impacts in Computing

https://doi.org/10.1007/s11390-022-2911-1

Sun, Xian-He; Lu, Xiaoyang (February 2023, Journal of Computer Science and Technology)

Full Text Available
CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework

https://doi.org/10.1109/HPCA56546.2023.10071125

Lu, Xiaoyang; Wang, Rujia; Sun, Xian-He (February 2023, 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
A Generalized Model for Modern Hierarchical Memory System

https://doi.org/10.1109/WSC57314.2022.10015298

Najafi, Hamed; Liu, Jason; Lu, Xiaoyang; Sun, Xian-He (December 2022, 2022 Winter Simulation Conference (WSC))

The memory system is critical to architecture design and can significantly impact application performance. Concurrent Average Memory Access Time (C-AMAT) is a model for analyzing and optimizing memory system performance using a recursive definition of the memory access latency along the memory hierarchy. The original C-AMAT model, however, does not provide the necessary granularity and flexibility for handling modern memory architectures with heterogeneous memory technologies and diverse system topologies. We propose to augment C-AMAT to take into consideration the idiosyncrasies of individual cache/memory components as well as their topological arrangement in the memory architecture design. Through trace-based simulation, we validate the augmented model and examine memory system performance with insight unavailable from the original C-AMAT model.
Full Text Available
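
The abstract above describes a recursive definition of memory access latency along the memory hierarchy. As a rough illustration of that recursive idea only, the sketch below uses the classic, non-concurrent AMAT recurrence (hit time plus miss rate times the latency of the next level). It is not the C-AMAT model or the augmented model from the paper, which additionally account for access concurrency and topology. The names `Level` and `amat` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Level:
    """One cache level (hypothetical structure, for illustration only)."""
    hit_time: float   # cycles to service a hit at this level
    miss_rate: float  # fraction of accesses reaching this level that miss


def amat(levels: Sequence[Level], memory_latency: float) -> float:
    """Classic AMAT recurrence, NOT C-AMAT:
    AMAT(i) = hit_time(i) + miss_rate(i) * AMAT(i + 1),
    with main memory latency as the base case.
    """
    if not levels:
        return memory_latency
    head, rest = levels[0], levels[1:]
    return head.hit_time + head.miss_rate * amat(rest, memory_latency)


# Assumed example values: a two-level cache in front of main memory.
hierarchy = [Level(hit_time=1.0, miss_rate=0.1),
             Level(hit_time=10.0, miss_rate=0.5)]
average_latency = amat(hierarchy, memory_latency=100.0)
print(f"AMAT = {average_latency:.2f} cycles")
```

With these assumed values, the second level contributes 10 + 0.5 × 100 = 60 cycles per first-level miss, so the overall figure is about 1 + 0.1 × 60 = 7 cycles.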
Accelerating Graph Processing With Lightweight Learning-Based Data Reordering

https://doi.org/10.1109/LCA.2022.3151087

Zou, Mo; Zhang, Mingzhe; Wang, Rujia; Sun, Xian-He; Ye, Xiaochun; Fan, Dongrui; Tang, Zhimin (January 2022, IEEE Computer Architecture Letters)

Full Text Available
Premier: A Concurrency-Aware Pseudo-Partitioning Framework for Shared Last-Level Cache

https://doi.org/10.1109/ICCD53106.2021.00068

Lu, Xiaoyang; Wang, Rujia; Sun, Xian-He (October 2021, IEEE 39th International Conference on Computer Design (ICCD))

Full Text Available
CoPIM: A Concurrency-aware PIM Workload Offloading Architecture for Graph Applications

https://doi.org/10.1109/ISLPED52811.2021.9502483

Yan, Liang; Zhang, Mingzhe; Wang, Rujia; Chen, Xiaoming; Zou, Xingqi; Lu, Xiaoyang; Han, Yinhe; Sun, Xian-He (July 2021, Proceedings of the 2021 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED202))

Full Text Available
A Study on Modeling and Optimization of Memory Systems

https://doi.org/10.1007/s11390-021-0771-8

Liu, Jason; Espina, Pedro; Sun, Xian-He (January 2021, Journal of Computer Science and Technology)
Full Text Available
APAC: An Accurate and Adaptive Prefetch Framework with Concurrent Memory Access Analysis

https://doi.org/10.1109/ICCD50377.2020.00048

Lu, Xiaoyang; Wang, Rujia; Sun, Xian-He (October 2020, 38th IEEE International Conference on Computer Design (ICCD 2020))
Prefetching techniques have been studied for decades. However, there are few studies on how concurrent memory accesses may affect prefetching effectiveness. When there are multiple concurrent memory requests, we can classify them into sub-classes by analyzing their overlapping relationship. In this work, we first propose pure prefetch coverage (PPC), a novel prefetching metric that can identify an accurate prefetch coverage under the concurrent memory access model. Then we propose APAC, an adaptive prefetch framework built on the PPC metric that can capture the dynamics of applications and adjust the prefetching aggressiveness. Our experimental results show that the PPC metric has a higher IPC correlation than the conventional prefetch coverage (PC) metric. For memory-intensive single-thread benchmarks, APAC improves average performance by 17.3% and 5.9% over the state-of-the-art adaptive prefetch frameworks FDP and NST, respectively. In a multi-core system, APAC outperforms FDP and NST by 8.5% and 5.0% IPC on average, respectively.
Full Text Available
Performance Modeling and Evaluation of a Production Disaggregated Memory System

https://doi.org/10.1145/3422575.3422795

Sun, Xian-He; Zhang, Ning; Toonen, Brian; Allcock, Bill (September 2020, The International Symposium on Memory Systems)
High performance computers rely on large memories to cache data and improve performance. However, managing the ever-increasing number of levels in the memory hierarchy is becoming increasingly difficult. The Disaggregated Memory System (DMS) architecture was introduced in recent years for better memory utilization. DMS is a global memory pool between the local memories and storage. To leverage DMS, we need a better understanding of its performance and how to exploit its full potential. In this study, we first present a DMS performance model for performance evaluation and analysis. We next conduct a thorough performance evaluation to identify application-DMS characteristics under different system configurations. Experimental tests are conducted on the RAM Area Network (RAN), a DMS implementation available at Argonne National Laboratory. Then, the results of the performance experiments are presented along with an analysis of the pros and cons of the RAN-DMS design and implementation. The counterintuitive performance results for the K-means application are analyzed at the code level to illustrate DMS performance. Finally, based on our findings, we discuss future DMS design and its potential for AI applications.
Full Text Available
