skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on October 9, 2026

Title: Towards Verifying Crash Consistency
Compute Express Link (CXL) memory sharing, persistent memory, and other related technologies allow data to survive crash events. A key challenge is ensuring that data is consistent after crashes such that it can be safely accessed. While there has been much work on bug-finding tools for persistent memory programs, these tools cannot guarantee that a program is crash-consistent. In this paper, we present a language, CrashLang, and its type system, that together guarantee that well-typed data structure implementations written in CrashLang are crash-consistent. CrashLang leverages the well-known commit-store pattern in which a single store logically commits an entire data structure operation. In this paper, we prove that well-typed CrashLang programs are crash-consistent, and provide a prototype implementation of the CrashLang compiler. We have evaluated CrashLang on five benchmarks: the Harris linked list, the Treiber stack, the Michael–Scott queue, a Read-Copy-Update binary search tree, and a Cache-Line Hash Table. We experimentally verified that each implementation correctly survives crashes.  more » « less
Award ID(s):
2006948 2102940 2220410
PAR ID:
10659676
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
9
Issue:
OOPSLA2
ISSN:
2475-1421
Page Range / eLocation ID:
753 to 783
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Non-volatile memory (NVM) is poised to augment or replace DRAM as main memory. With the right abstraction and support, non-volatile main memory (NVMM) can provide an alternative to the storage system to host long-lasting persistent data. However, keeping persistent data in memory requires programs to be written such that data is crash consistent (i.e. it can be recovered after failure). Critical to supporting crash recovery is the guarantee of ordering of when stores become durable with respect to program order. Strict persistency, which requires persist order to coincide with program order of stores, is simple and intuitive but generally thought to be too slow. More relaxed persistency models are available but demand higher programming complexity, e.g. they require the programmer to insert persist barriers correctly in their program. We identify the source of strict persistency inefficiency as the gap between the point of visibility (PoV) which is the cache, and the point of persistency (PoP) which is the memory. In this paper, we propose a new approach to close the PoV/PoP gap which we refer to as Battery-Backed Buffer (BBB). The key idea of BBB is to provide a battery-backed persist buffer (bbPB) in each core next to the L1 data cache (L1D). A store value is allocated in the bbPB as it is written to cache, becoming part of the persistence domain. If a crash occurs, battery ensures bbPB can be fully drained to NVMM. BBB simplifies persistent programming as the programmer does not need to insert persist barriers or flushes. Furthermore, our BBB design achieves nearly identical results to eADR in terms of performance and number of NVMM writes, while requiring two orders of magnitude smaller energy and time to drain. 
    more » « less
  2. Persistent memory presents a great opportunity for crash-consistent computing in large-scale computing systems. The ability to recover data upon power outage or crash events can significantly improve the availability of large-scale systems, while improving the performance of persistent data applications (e.g., database applications). However, persistent memory suffers from high write latency and requires specific programming model (e.g., Intel’s PMDK) to guarantee crash consistency, which results in long latency to persist data. To mitigate these problems, recent standards advocate for sufficient back-up power that can flush the whole cache hierarchy to the persistent memory upon detection of an outage, i.e., extending the persistence domain to include the cache hierarchy. In the secure NVM with extended persistent domain(EPD), in addition to flushing the cache hierarchy, extra actions need to be taken to protect the flushed cache data. These extra actions of secure operation could cause significant burden on energy costs and battery size. We demonstrate that naive implementations could lead to significantly expanding the required power holdup budget (e.g., 10.3x more operations than EPD system without secure memory support). The significant overhead is caused by memory accesses of secure metadata. In this paper, we present Horus, a novel EPD-aware secure memory implementation. Horus reduces the overhead during draining period of EPD system by reducing memory accesses of secure metadata. Experiment result shows that Horus reduces the draining time by 5x, compared with the naive baseline design. 
    more » « less
  3. While developed largely for higher density and lower power, byte-addressable nonvolatile memory can also allow data to persist across program runs and system crashes without the need to flush to disk or flash. If data is to be recovered after a crash, however, care must be taken to ensure that the contents of memory are consistent at all times. This can be challenging in multithreaded applications with write-back caches. We present QSTM, a persistent word-based software transactional memory (STM) system to address this problem. Unlike past such systems, QSTM is nonblocking and does not require either the modification of target data structures or the use of a wide CAS instruction. 
    more » « less
  4. Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent memory in an inconsistent state upon a system crash. Flush and fence instructions can be used to force ordering among updates, but are expensive. This has motivated significant work studying how to write correct and efficient persistent programs for NVRAM. In this paper, we present FliT, a C++ library that facilitates writing efficient persistent code. Using the library's default mode makes any linearizable data structure durable with minimal changes to the code. FliT avoids many redundant flush instructions by using a novel algorithm to track dirty cache lines. It also allows for extra optimizations, but achieves good performance even in its default setting. To describe the FliT library's capabilities and guarantees, we define a persistent programming interface, called the P-V Interface, which FliT implements. The P-V Interface captures the expected behavior of code in which some instructions' effects are persisted and some are not. We show that the interface captures the desired semantics of many practical algorithms in the literature. We apply the FliT library to four different persistent data structures, and show that across several workloads, persistence implementations, and data structure sizes, the FliT library always improves operation throughput, by at least 2.1X over a naive implementation in all but one workload. 
    more » « less
  5. The Safe System Approach (SSA) aims to eliminate fatal and serious injury roadway crashes through a holistic view of the road system, moving away from traditional safety analysis based exclusively on historical crash data. One reason for this is the classification of crashes into broad categories (e.g., head-on, sideswipe), which does not capture crash progression or contributing factors. In this context, this paper applies crash sequence analysis to historical crash data and uses the findings to proactively identify safety issues in similar contexts, in alignment with the SSA framework. The method uses sequence-of-events information from crash data to generate clusters of crashes with similar underlying characteristics. Data from fatal and serious injury crashes from urban intersections in the state of Ohio between 2018 and 2022 were used in the analysis. The results show 12 clusters with unique characteristics that consider the sequence of events of each crash. Although derived from crash data, the clusters offer an in-depth understanding of the factors associated with each one and help identify cluster-specific countermeasures related to various SSA elements. State and local jurisdictions can use the presented methodology in transportation safety programs, by focusing on the clusters that represent local challenges or on countermeasures related to the issues of multiple clusters. Finally, the method can also be associated with site-specific analysis, providing a comprehensive toolkit for practitioners. 
    more » « less