- Award ID(s):
- 2016701
- NSF-PAR ID:
- 10469908
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9781450398008
- Page Range / eLocation ID:
- 1 to 11
- Format(s):
- Medium: X
- Location:
- Washington DC USA
- Sponsoring Org:
- National Science Foundation
More Like this
Current state-of-the-art systems for hybrid memory management are enriched with machine intelligence. To make Machine Learning (ML) practical, system-level page schedulers restrict ML model training to a small subset of the applications' memory footprint, and use existing lightweight historical information to predict the access behavior of the majority of pages. To maximize application performance improvements, the pages selected for ML-based management are identified with elaborate page selection methods, which compute detailed performance estimates that depend on the configuration of the hybrid memory platform. This paper explores opportunities to reduce such operational overheads of ML-based hybrid memory page schedulers by using visualization techniques that depict memory access patterns and reveal spatial and temporal correlations among the selected pages that current methods fail to leverage. We propose an initial version of a visualization pipeline for prioritizing pages for machine learning that is independent of the hybrid memory configuration. Our approach selects pages whose ML-based management delivers, on average, performance within 5% of current solutions, while reducing page selection time by 75×. We discuss future directions and make the case that visualization and computer vision methods can unlock new insights and reduce the operational complexity of emerging systems solutions.
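The abstract refers to visualizing memory access patterns to expose spatial and temporal correlations among pages. As a hedged illustration only (not the paper's pipeline), a common starting point is a page-by-time access-count matrix that can be rendered directly as a grayscale image. The 4 KiB page size, the `(timestamp, address)` trace format, and the function name `access_matrix` are assumptions made for this sketch.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def access_matrix(trace, epoch_len):
    """Build a page x epoch access-count matrix from (timestamp, address) pairs.

    Rows are pages in order of first appearance; columns are fixed-length
    epochs. Each cell counts accesses to that page during that epoch, so the
    matrix can be plotted as an image in which streaks shared by several rows
    hint at pages with correlated behavior.
    """
    counts = defaultdict(lambda: defaultdict(int))
    order = []
    max_epoch = 0
    for ts, addr in trace:
        page = addr // PAGE_SIZE
        epoch = ts // epoch_len
        if page not in counts:  # membership test does not create an entry
            order.append(page)
        counts[page][epoch] += 1
        max_epoch = max(max_epoch, epoch)
    matrix = [[counts[p][e] for e in range(max_epoch + 1)] for p in order]
    return order, matrix
```

For example, `access_matrix([(0, 0), (1, 4096), (5, 100), (6, 8202), (7, 8192)], 5)` yields three page rows over two epochs; the resulting list of lists can be handed to any plotting library as an image.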
Emerging workloads benefit from the massive memory capacities provided by hybrid memory platforms. Recent system-level hybrid memory management solutions integrate machine learning methods to better predict complex data access behaviors. Given the substantial associated learning overheads, such solutions train parallel recurrent neural networks to learn access patterns at page granularity for a carefully selected subset of pages. We observe that the size of this subset varies immensely across workload classes, sizes, and patterns. Coarsening the granularity from individual pages to page groups would reduce the aggregate learning overheads, yet unsupervised machine learning clustering methods are not practical in this context. Instead, this paper builds Coeus, a page grouping mechanism for machine learning-based hybrid memory management. Coeus is simple, robust, and efficient. It leverages data reuse insights to fine-tune the granularity at which the system interprets access patterns. As a result, Coeus creates, in a practical way, large clusters of pages that share the same access behavior. Coeus reduces the associated learning overheads by almost 3x. In addition, compared to configurations of existing hybrid memory managers, Coeus achieves 3x higher application performance through the combined effects of applying machine learning to more pages and performing management operations at a better granularity.
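To make the idea of page groups that "share the same access behavior" concrete, the sketch below groups pages whose coarsely quantized per-epoch access counts are identical, so one model could serve a whole group instead of one model per page. This is a simple signature-based grouping chosen for illustration, not Coeus's reuse-based mechanism; the `page_epochs` input format and the `num_levels` and `max_count` parameters are hypothetical.

```python
from collections import defaultdict


def group_pages(page_epochs, num_levels=4, max_count=64):
    """Group pages whose quantized per-epoch access counts are identical.

    page_epochs: dict mapping page number -> list of access counts per epoch
    (hypothetical input format). Counts are bucketed into num_levels coarse
    levels, so pages with similar, not only identical, behavior share a group.
    Counts at or above max_count fall into the top level.
    """
    step = max_count / num_levels
    groups = defaultdict(list)
    for page, counts in page_epochs.items():
        signature = tuple(min(int(c // step), num_levels - 1) for c in counts)
        groups[signature].append(page)
    return list(groups.values())
```

With the defaults, `group_pages({0: [0, 40], 1: [3, 35], 2: [60, 0], 3: [63, 10]})` places pages 0 and 1 in one group and pages 2 and 3 in another. Coarser levels produce fewer, larger groups at the cost of merging pages whose behavior differs more.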
The increasing demand from big data analytics for more main memory capacity in datacenters and exascale computing environments is driving the integration of heterogeneous memory technologies. These new technologies exhibit far greater differences in access latency, bandwidth, and capacity than traditional NUMA systems. Leveraging this heterogeneity while also delivering application performance improvements requires intelligent data placement. We present Kleio, a page scheduler with machine intelligence for applications that execute across hybrid memory components. Kleio is a hybrid page scheduler that combines existing, lightweight, history-based data tiering methods for hybrid memory with novel intelligent placement decisions based on deep neural networks. We contribute new understanding of the scope of benefits that intelligent page scheduling can achieve compared to existing history-based approaches, and of which deep learning algorithms and parameters are effective for this problem space. Kleio incorporates a new method for prioritizing the pages that yield the highest performance boost while limiting the resulting system resource overheads. Our performance evaluation indicates that Kleio closes, on average, 80% of the performance gap between existing solutions and an oracle with knowledge of the future access pattern. Kleio provides hybrid memory systems with fast and effective neural network training and prediction accuracy levels that bring significant application performance improvements with limited resource overheads, laying the groundwork for its practical integration in future systems.
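The hybrid structure described above, where history-based tiering covers most pages and a learned model covers a prioritized few, can be sketched as a per-epoch placement decision. This is a minimal sketch under stated assumptions, not Kleio's scheduler: it assumes access counts from the previous epoch are available, and `predict` is a hypothetical stand-in for a trained deep neural network that returns a predicted next-epoch access count.

```python
def place_pages(prev_counts, fast_capacity, ml_pages=frozenset(), predict=None):
    """Choose which pages to keep in the fast memory tier for the next epoch.

    prev_counts: dict page -> access count observed in the previous epoch.
    ml_pages: pages whose next-epoch behavior comes from a learned model.
    predict: hypothetical callable page -> predicted next-epoch access count.
    All other pages use history: last epoch's count serves as the forecast.
    """
    scores = {}
    for page, count in prev_counts.items():
        if page in ml_pages and predict is not None:
            scores[page] = predict(page)  # model-driven forecast
        else:
            scores[page] = count  # history-based forecast
    # Highest forecast first; ties broken by page number for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return set(ranked[:fast_capacity])
```

With `prev = {1: 10, 2: 5, 3: 1, 4: 8}` and room for two pages, pure history picks pages 1 and 4; if page 3 is handed to a model that predicts a burst of 100 accesses, the placement becomes pages 1 and 3. The value of the learned part depends entirely on how often such bursts are missed by history alone.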
Graph analytics shows promise for solving challenging problems on relational data. However, memory constraints arise from the large size of graphs and the high complexity of algorithms. Data prefetching is a crucial technique for hiding memory access latency by predicting and fetching data into the cache beforehand. Traditional prefetchers rely on fixed rules and struggle to adapt to the complex memory access patterns of graph analytics. Machine learning (ML) algorithms, particularly long short-term memory (LSTM) models, excel at memory access prediction. However, they face challenges such as difficulty learning interleaved access patterns and high storage costs when predicting over a large memory address space. In addition, a gap remains between designing a high-performance ML-based memory access predictor and developing an effective ML-based prefetcher for an existing memory system. In this work, we propose a novel Attention-based prefetching framework to accelerate graph analytics applications. To achieve high-performance memory access prediction, we propose A2P, a novel Attention-based memory Access Predictor for graph analytics. We use the multi-head self-attention mechanism to extract features from memory traces. We design a novel bitmap labeling method to collect future deltas within a spatial range, making interleaved patterns easier to learn. We introduce a novel super page concept, allowing the model to surpass physical page constraints. To integrate A2P into a memory system, we design a three-module prefetching framework composed of an existing memory hierarchy, a prefetch controller, and the predictor A2P. In addition, we propose a hybrid design that combines A2P with existing hardware prefetchers for higher prefetching performance. We evaluate A2P and the prefetching framework using the widely used GAP benchmark. Prediction experiments show that, for the top three predictions, A2P outperforms the widely used state-of-the-art LSTM-based model by 23.1% w.r.t. Precision, 21.2% w.r.t. Recall, and 10.4% w.r.t. Coverage. Prefetching experiments show that A2P provides an 18.4% IPC improvement on average, outperforming the state-of-the-art prefetchers BO by 17.2%, ISB by 15.0%, and Delta-LSTM by 10.9%. The hybrid prefetcher combining A2P and ISB achieves a 21.7% IPC improvement, outperforming the hybrid of BO and ISB by 16.3%.
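The bitmap labeling idea can be illustrated with a multi-label target: instead of training on the single next delta, which is noisy when several access streams interleave, every future delta that falls within a spatial window is marked. The abstract does not give A2P's exact scheme, so this is a sketch under assumptions: addresses are already cache-block numbers, the look-ahead window and the symmetric `spatial_range` are hypothetical parameters, and the zero delta is left unset because it needs no prefetch.

```python
def bitmap_label(block_addrs, i, lookahead, spatial_range):
    """Multi-hot label of deltas seen in the next `lookahead` accesses after i.

    block_addrs: sequence of cache-block numbers (assumed input format).
    Bit k of the result corresponds to delta (k - spatial_range); only deltas
    in [-spatial_range, spatial_range] are recorded, and delta 0 is skipped.
    """
    width = 2 * spatial_range + 1
    bits = [0] * width
    base = block_addrs[i]
    for nxt in block_addrs[i + 1 : i + 1 + lookahead]:
        delta = nxt - base
        if delta != 0 and -spatial_range <= delta <= spatial_range:
            bits[delta + spatial_range] = 1
    return bits
```

For the trace `[100, 102, 99, 150, 101]` at position 0 with a look-ahead of 4 and a range of 3, the deltas +2, -1, and +1 are set while the far jump to block 150 is ignored, giving `[0, 0, 1, 0, 1, 1, 0]`. A model trained against such labels can predict several nearby blocks at once, which suits interleaved streams.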
We evaluated Intel® Optane™ DC Persistent Memory and found that Intel's persistent memory is highly sensitive to data locality, size, and access patterns, which becomes clearer when both virtual memory page size and data layout are optimized for locality. Using the Polybench high-performance computing benchmark suite and controlling for mapped page size, we evaluate persistent memory (PMEM) performance relative to DRAM. In particular, Linux PMEM support preferentially maps persistent memory in large pages while always mapping DRAM to small pages. We observed that using large pages for PMEM and small pages for DRAM can create a 5x difference in performance, dwarfing other effects discussed in the literature. When controlling for page size and optimizing for data locality, we found PMEM performance comparable to DRAM performance for the majority of tests.