We evaluated Intel ® Optane™ DC Persistent Memory and found that Intel's persistent memory is highly sensitive to data locality, size, and access patterns, which becomes clearer by optimizing both virtual memory page size and data layout for locality. Using the Polybench high-performance computing benchmark suite and controlling for mapped page size, we evaluate persistent memory (PMEM) performance relative to DRAM. In particular, the Linux PMEM support maps preferentially maps persistent memory in large pages while always mapping DRAM to small pages. We observed using large pages for PMEM and small pages for DRAM can create a 5x difference in performance, dwarfing other effects discussed in the literature. We found PMEM performance comparable to DRAM performance for the majority of tests when controlled for page size and optimized for data locality.
more »
« less
Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
The problem of efficiently feeding processing elements and finding ways to reduce data movement is pervasive in computing. Efficient modeling of both temporal and spatial locality of memory references is invaluable in identifying superfluous data movement in a given application. To this end, we present a new way to infer both spatial and temporal locality using reuse distance analysis. This is accomplished by performing reuse distance analysis at different data block granularities: specifically, 64B, 4KiB, and 2MiB sizes. This process of simultaneously observing reuse distance with multiple granularities is called multi-spectral reuse distance. This approach allows for a qualitative analysis of spatial locality, through observing the shifting of mass in an application's reuse signature at different granularities. Furthermore, the shift of mass is empirically measured by calculating the Earth Mover's Distance between reuse signatures of an application. From the characterization, it is possible to determine how spatially dense the memory references of an application are based on the degree to which the mass has shifted (or not shifted) and how close (or far) the Earth Mover's Distance is to zero as the data block granularity is increased. It is also possible to determine an appropriate page size from this information, and whether or not a given page is being fully utilized. From the applications profiled, it is observed that not all applications will benefit from having a larger page size. Additionally, larger data block granularities subsuming smaller ones suggest that larger pages will allow for more spatial locality exploitation, but examining the memory footprint will show whether those larger pages are fully utilized or not.
more »
« less
- Award ID(s):
- 1763503
- PAR ID:
- 10183994
- Date Published:
- Journal Name:
- Proceedings of IEEE High-Performance Extreme Computing Conference (HPEC)
- Page Range / eLocation ID:
- 1 to 8
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Emerging workloads benefit from massive memory capacities provided by hybrid memory platforms. Recent system-level hybrid memory management solutions integrate machine learning methods to better predict complex data access behaviors. Given the substantial associated learning overheads, such solutions train parallel recurrent neural networks to learn the access patterns at the granularity of a page for a carefully selected page subset. Our observation reveals that the size of this subset varies immensely across workload classes, sizes and patterns. Increasing the granularity at the level of a page group will help reduce the aggregate learning overheads. Yet, unsupervised machine learning clustering methods are not practical to use in this context. Instead, this paper builds Coeus - a page grouping mechanism for machine learning-based hybrid memory management. Coeus is simple, robust and efficient. Coeus leverages data reuse insights to fine-tune the granularity at which patterns are interpreted by the system. As a result, Coeus creates large clusters of pages that share the same access behavior, in a practical way. Coeus reduces by almost 3x the associated learning overheads. In addition, Coeus achieves 3x higher application performance, by the combined effects of applying machine learning to more pages and by performing management operations at better granularity, compared to configurations of existing hybrid memory managers.more » « less
-
The demand for memory is ever increasing. Many prior works have explored hardware memory compression to increase effective memory capacity. However, prior works compress and pack/migrate data at a small - memory blocklevel - granularity; this introduces an additional block-level translation after the page-level virtual address translation. In general, the smaller the granularity of address translation, the higher the translation overhead. As such, this additional block-level translation exacerbates the well-known address translation problem for large and/or irregular workloads. A promising solution is to only save memory from cold (i.e., less recently accessed) pages without saving memory from hot (i.e., more recently accessed) pages (e.g., keep the hot pages uncompressed); this avoids block-level translation overhead for hot pages. However, it still faces two challenges. First, after a compressed cold page becomes hot again, migrating the page to a full 4KB DRAM location still adds another level (albeit page-level, instead of block-level) of translation on top of existing virtual address translation. Second, only compressing cold data require compressing them very aggressively to achieve high overall memory savings; decompressing very aggressively compressed data is very slow (e.g., > 800ns assuming the latest Deflate ASIC in industry). This paper presents Translation-optimized Memory Compression for Capacity (TMCC) to tackle the two challenges above. To address the first challenge, we propose compressing page table blocks in hardware to opportunistically embed compression translations into them in a software-transparent manner to effectively prefetch compression translations during a page walk, instead of serially fetching them after the walk. To address the second challenge, we perform a large design space exploration across many hardware configurations and diverse workloads to derive and implement in HDL an ASIC Deflate that is specialized for memory; for memory pages, it is 4X as fast as the state-of-the art ASIC Deflate, with little to no sacrifice in compression ratio. Our evaluations show that for large and/or irregular workloads, TMCC can either improve performance by 14% without sacrificing effective capacity or provide 2.2x the effective capacity without sacrificing performance compared to a stateof-the-art hardware memory compression for capacity.more » « less
-
Current state-of-the-art resource management systems leverage Machine Learning (ML) methods to enable the efficient use of heterogeneous memory hardware, deployed across emerging computing platforms. While machine intelligence can be effectively used to learn and predict complex data access patterns of modern analytics, the use of ML over the exploded data sizes and memory footprints is prohibitive for its practical system-level integration. For this reason, recent solutions use existing lightweight historical information to predict the access behavior of majority of the application pages, and train ML models over a small page subset. To maximize application performance improvements, the pages selected for machine learning-based management are identified with elaborate page selection methods. These methods involve the calculation of detailed performance estimates depending on the configuration of the hybrid memory platform. This paper aims to reduce such vast operational overheads, that further exacerbate the existing high overheads of using machine intelligence, in return for high performance and efficiency. To this end, we build Cronus, an image-based pipeline for selecting pages for ML-based management. We visualize memory access patterns and reveal spatial and temporal correlations among the selected pages, that current methods fail to leverage. We then use the created images to detect patterns and select page groups for machine learning model deployment. Cronus drastically reduces the operational costs, while preserving the effectiveness of the page selection and achieved performance of machine intelligent hybrid memory management. This work makes a case that visualization and computer vision methods can unlock new insights and reduce the operational complexity of emerging systems solutions.more » « less
-
Current state-of-the-art systems for hybrid memory management are enriched with machine intelligence. To enable the practical use of Machine Learning (ML), system-level page schedulers focus the ML model training over a small subset of the applications’ memory footprint. At the same time, they use existing lightweight historical information to predict the access behavior of majority of the pages. To maximize application performance improvements, the pages selected for machine learning-based management are identified with elaborate page selection methods. These methods involve the calculation of detailed performance estimates depending on the configuration of the hybrid memory platform. This paper explores the opportunities to reduce such operational overheads of machine learning-based hybrid memory page schedulers via use of visualization techniques to depict memory access patterns, and reveal spatial and temporal correlations among the selected pages, that current methods fail to leverage. We propose an initial version of a visualization pipeline for prioritizing pages for machine learning, that is independent of the hybrid memory configuration. Our approach selects pages whose ML-based management delivers, on average, performance levels within 5% of current solutions, while reducing by 75 × the page selection time. We discuss future directions and make a case that visualization and computer vision methods can unlock new insights and reduce the operational complexity of emerging systems solutions.more » « less
An official website of the United States government

