skip to main content


Title: Principles of Memory-Centric Programming for High Performance Computing
The memory wall challenge -- the growing disparity between CPU speed and memory speed -- has been one of the most critical and long-standing challenges in computing. For high performance computing, programming to achieve efficient execution of parallel applications often requires more tuning and optimization efforts to improve data and memory access than for managing parallelism. The situation is further complicated by the recent expansion of the memory hierarchy, which is becoming deeper and more diversified with the adoption of new memory technologies and architectures such as 3D-stacked memory, non-volatile random-access memory (NVRAM), and hybrid software and hardware caches. The authors believe it is important to elevate the notion of memory-centric programming, with relevance to the compute-centric or data-centric programming paradigms, to utilize the unprecedented and ever-elevating modern memory systems. Memory-centric programming refers to the notion and techniques of exposing hardware memory system and its hierarchy, which could include DRAM and NUMA regions, shared and private caches, scratch pad, 3-D stacked memory, non-volatile memory, and remote memory, to the programmer via portable programming abstractions and APIs. These interfaces seek to improve the dialogue between programmers and system software, and to enable compiler optimizations, runtime adaptation, and hardware reconfiguration with regard to data movement, beyond what can be achieved using existing parallel programming APIs. In this paper, we provide an overview of memory-centric programming concepts and principles for high performance computing.  more » « less
Award ID(s):
1652732
PAR ID:
10050476
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
MCHPC'17 Proceedings of the Workshop on Memory Centric Programming for HPC
Page Range / eLocation ID:
2 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ashvin Goel ; Dalit Naor (Ed.)
    Byte-addressable non-volatile memory (NVM) allows programs to directly access storage using memory interface without going through the expensive conventional storage stack. However, direct access to NVM makes the NVM data vulnerable to software bugs and hardware errors. This issue is critical because, unlike DRAM, corrupted data can persist forever, even after the system restart. Albeit the plethora of research on NVM programs and systems, there is little focus on protecting NVM data from software bugs and hardware errors. In this paper, we propose TENET, a new NVM programming framework, which guarantees memory safety and fault tolerance to protect NVM data against software bugs and hardware errors. TENET provides the popular persistent transactional memory (PTM) programming model. TENET leverages the concurrency guarantees (i.e., ACID properties) of PTM to provide performant and cost-efficient memory safety and fault tolerance. Our evaluations show that TENET offers an enhanced protection scope at a modest performance overhead and storage cost as compared to other PTMs with partial or no memory safety and fault tolerance support. 
    more » « less
  2. Current systems hide data movement from software behind the load-store interface. Software’s inability to observe and respond to data movement is the root cause of many inefficiencies, including the growing fraction of execution time and energy devoted to data movement itself. Recent specialized memory-hierarchy designs prove that large data-movement savings are possible. However, these designs require custom hardware, raising a large barrier to their practical adoption. This paper argues that the hardware-software interface is the problem, and custom hardware is often unnecessary with an expanded interface. The täkō architecture lets software observe data movement and interpose when desired. Specifically, caches in täkō can trigger software callbacks in response to misses, evictions, and writebacks. Callbacks run on reconfigurable dataflow engines placed near caches. Five case studies show that this interface covers a wide range of data-movement features and optimizations. Microarchitecturally, täkō is similar to recent near-data computing designs, adding ≈5% area to a baseline multicore. täkō improves performance by 1.4×–4.2×, similar to prior custom hardware designs, and comes within 1.8% of an idealized implementation. 
    more » « less
  3. In this paper, we propose MRIMA, as a novel MRAM-based In-Memory Accelerator for non-volatile, flexible, and efficient in-memory computing. MRIMA transforms current Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) arrays to massively parallel computational units capable of working as both non-volatile memory and in-memory logic. Instead of integrating complex logic units in cost-sensitive memory, MRIMA exploits hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a single clock cycle, overcoming the multi-cycle logic issue in contemporary Processing-In-Memory (PIM) platforms. We present practical case studies to demonstrate MRIMA’s acceleration for binary-weight and low bit-width Convolutional Neural Networks (CNN) as well as data encryption. Our device-to-architecture co-simulation results on CNN acceleration demonstrate that MRIMA can obtain 1.7× better energy-efficiency and 11.2× speed-up compared to ASICs, and, 1.8× better energy-efficiency and 2.4× speed-up over the best DRAM-based PIM solutions. As an AES in-memory encryption engine, MRIMA shows 77% and 21% lower energy consumption compared to CMOS-ASIC and recent domain wall-based design, respectively. 
    more » « less
  4. null (Ed.)
    Byte-addressable, non-volatile, random access memory (NVM) has the potential to dramatically accelerate the performance of storage-intensive workloads. For applications with irregular data access patterns, and applications that rely on ad-hoc data structures, the most promising model for interacting with NVM is a transactional model. However, the specifics of the model matter significantly. We introduce two models for programming persistent transactions. We show how to build concurrent persistent transactional memory from traditional software transactional memories. We then introduce general and model-specific optimizations that can substantially improve the performance of persistent transactions. Our evaluation shows a substantial improvement in the both the latency and scalability of persistent transactions. 
    more » « less
  5. null (Ed.)
    Non-volatile memory (NVRAM) based on phase-change memory (such as Optane DC Persistent Memory Module) is making its way into Intel servers to address the needs of emerging applications that have a huge memory footprint. These systems have both DRAM and NVRAM on the same memory channel with the smaller capacity DRAM serving as a cache to the larger capacity NVRAM in the so called 2LM mode. In this work we analyze the performance of such DRAM caches on real hardware using a broad range of synthetic and real-world benchmarks. We identify three key limitations of DRAM caches in these emerging systems which prevent large-scale, bandwidth bound applications from taking full advantage of NVRAM read and write bandwidth. We show that software based techniques are necessary for orchestrating the data movement between DRAM and PMM for such workloads to take full advantage of these new heterogeneous memory systems. 
    more » « less