skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Hardware Supported Permission Checks on Persistent Objects for Performance and Programmability
Non-Volatile Memory technologies are advancing rapidly and may augment or replace DRAM in future systems. However, a key question is how programmers will use them to construct and manipulate persistent data. One possible approach gives programmers direct access to persistent memory using relocatable persistent pools that hold persistent objects which can be accessed using persistent pointers, called ObjectIDs. Prior work has shown that hardware-supported address translation for ObjectIDs provides significant performance improvement and simplifies programming, however these works did not consider the large overheads incurred to check permissions before accessing persistent objects. In this paper, we identify permission checking in hardware as a critical mechanism that must be included when translating ObjectIDs to addresses in order to simplify programming and fully benefit from hardware translation. To support it, we add a System Persistent Object Table (SPOT) to support translation and permissions checks on ObjectIDs. The SPOT holds all known pools, their physical address, and their permissions information in memory. When a program attempts to access a persistent object, the SPOT is consulted and permissions are verified without trapping to the operating system. We have implemented our new design in a cycle accurate simulator and compared it with software only approaches and prior work. We find that our design offers a compelling 2.9x speedup on average for microbenchmarks that access pools with the RANDOM pattern and 1.4x and 1.8x speedup on TPC-C and vacation, respectively, for the SEPARATE pattern.  more » « less
Award ID(s):
1717486
PAR ID:
10077433
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Symposium on Computer Architecture
Page Range / eLocation ID:
466 to 478
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern programming languages offer abstractions that simplify software development and allow hardware to reach its full potential. These abstractions range from the well-established OpenMP language extensions to newer C++ features like smart pointers. To properly use these abstractions in an existing codebase, programmers must determine how a given source code region interacts with Program State Elements (PSEs) (i.e., the program's variables and memory locations). We call this process Program State Element Characterization (PSEC). Without tool support for PSEC, a programmer's only option is to manually study the entire codebase. We propose a profile-based approach that automates PSEC and provides abstraction recommendations to programmers. Because a profile-based approach incurs an impractical overhead, we introduce the Compiler and Runtime Memory Observation Tool (CARMOT), a PSEC-specific compiler co-designed with a parallel runtime. CARMOT reduces the overhead of PSEC by two orders of magnitude, making PSEC practical. We show that CARMOT's recommendations achieve the same speedup as hand-tuned OpenMP directives and avoid memory leaks with C++ smart pointers. From this, we argue that PSEC tools, such as CARMOT, can provide support for the rich ecosystem of modern programming language abstractions. 
    more » « less
  2. Breath-first search (BFS) is a fundamental building block in many graph-based applications. It is challenging to optimize due to its irregular memory-access pattern. Prior work, based on hardware description languages (HDLs) and high-level synthesis (HLS), address the memory-access bottleneck by using techniques such as edge-centric traversal, data alignment, and compute-unit (CU) replication. While these optimizations work well for dense graph datasets, optimizing BFS on sparse graphs remains a significant challenge due to the kernel launch overhead and poor workload distribution across processing elements. As a complement to the prior work, we present and evaluate optimizations in OpenCL for BFS on sparse graphs. Specifically, we explore application-specific and architecture-aware optimizations aimed at mitigating the irregular global-memory access bottleneck in sparse graphs. In our kernel design, we consider factors such as choice of data structure between queue and array, number of memory banks, and kernel launch configuration. We evaluate the impact of proposed optimizations on a diverse set of sparse graphs. In comparison with the state-of-the-art OpenCL implementation for FPGA, we achieve 5.7x-22.3x speedup on Stratix 10 SX 2800 FPGA for the graphs that are most sensitive to our optimization scheme. 
    more » « less
  3. Ashvin Goel; Dalit Naor (Ed.)
    Byte-addressable non-volatile memory (NVM) allows programs to directly access storage using memory interface without going through the expensive conventional storage stack. However, direct access to NVM makes the NVM data vulnerable to software bugs and hardware errors. This issue is critical because, unlike DRAM, corrupted data can persist forever, even after the system restart. Albeit the plethora of research on NVM programs and systems, there is little focus on protecting NVM data from software bugs and hardware errors. In this paper, we propose TENET, a new NVM programming framework, which guarantees memory safety and fault tolerance to protect NVM data against software bugs and hardware errors. TENET provides the popular persistent transactional memory (PTM) programming model. TENET leverages the concurrency guarantees (i.e., ACID properties) of PTM to provide performant and cost-efficient memory safety and fault tolerance. Our evaluations show that TENET offers an enhanced protection scope at a modest performance overhead and storage cost as compared to other PTMs with partial or no memory safety and fault tolerance support. 
    more » « less
  4. Persistent memory (PMem) is a low-latency storage technology connected to the processor memory bus. The Direct Access (DAX) interface promises fast access to PMem, mapping it directly to processes' virtual address spaces. However, virtual memory operations (e.g., paging) limit its performance and scalability. Through an analysis of Linux/x86 memory mapping, we find that current systems fall short of what hardware can provide due to numerous software inefficiencies stemming from OS assumptions that memory mapping is for DRAM. In this paper we propose DaxVM, a design that extends the OS virtual memory and file system layers leveraging persistent memory attributes to provide a fast and scalable DAX-mmap interface. DaxVM eliminates paging costs through pre-populated file page tables, supports faster and scalable virtual address space management for ephemeral mappings, performs unmappings asynchronously, bypasses kernel-space dirty-page tracking support, and adopts asynchronous block pre-zeroing. We implement DaxVM in Linux and the ext4 file system targeting x86-64 architecture. DaxVM mmap achieves 4.9× higher throughput than default mmap for the Apache webserver and up to 1.5× better performance than read system calls. It provides similar benefits for text search. It also provides fast boot times and up to 2.95× better throughput than default mmap for PMem-optimized key-value stores running on a fragmented ext4 image. Despite designed for direct access to byte-addressable storage, various aspects of DaxVM are relevant for efficient access to other high performant storage mediums. 
    more » « less
  5. Persistent memory (PMem) is a low-latency storage technology connected to the processor memory bus. The Direct Access (DAX) interface promises fast access to PMem, mapping it directly to processes' virtual address spaces. However, virtual memory operations (e.g., paging) limit its performance and scalability. Through an analysis of Linux/x86 memory mapping, we find that current systems fall short of what hardware can provide due to numerous software inefficiencies stemming from OS assumptions that memory mapping is for DRAM. In this paper we propose DaxVM, a design that extends the OS virtual memory and file system layers leveraging persistent memory attributes to provide a fast and scalable DAX-mmap interface. DaxVM eliminates paging costs through pre-populated file page tables, supports faster and scalable virtual address space management for ephemeral mappings, performs unmappings asynchronously, bypasses kernel-space dirty-page tracking support, and adopts asynchronous block pre-zeroing. We implement DaxVM in Linux and the ext4 file system targeting x86-64 architecture. DaxVM mmap achieves 4.9× higher throughput than default mmap for the Apache webserver and up to 1.5× better performance than read system calls. It provides similar benefits for text search. It also provides fast boot times and up to 2.95× better throughput than default mmap for PMem-optimized key-value stores running on a fragmented ext4 image. Despite designed for direct access to byte-addressable storage, various aspects of DaxVM are relevant for efficient access to other high performant storage mediums. 
    more » « less