NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Reconciling Hardware Transactional Memory and Persistent Programming with Buffered Durability

https://doi.org/10.1145/3694906.3743321

Du, Mingzhe; Su, Ziheng; Scott, Michael L (July 2025, ACM)

Hardware Transactional Memory (HTM) simplifies concurrent programming and can accelerate multithreaded execution through lock elision. Non-Volatile Memory (NVM) combines the speed and byte addressability of DRAM with the durability of storage, enabling the construction of high-performance, persistent data structures. Unfortunately, the write-back instructions typically needed to ensure post-crash consistency in NVM cause HTM transactions to abort, precluding the straightforward combination of HTM and persistent data structures. The problem goes away on machines with persistent caches, but these require special battery-backed circuitry and are far from commonplace.To combine HTM and persistent data structures, we advocate for buffered durable linearizability (BDL), a relaxed correctness criterion that enables recovery to a "recent" consistent state in the wake of a crash, allowing writes-back to occur outside transactions.Significantly, BDL retains the persistence guarantees of storage systems—such as databases backed by disks or flash—that have relied on buffering for decades.The combination of HTM and buffered durability enables three separate usage scenarios. First, we add durability to an existing HTM-based structure (a van Emde Boas tree due to Khalaji et al.); second, we use HTM to simplify an existing persistent structure (a skiplist due to Wang et al.); third, we "back port" an HTM-based structure optimized for persistent caches (a hash table due to Zhang et al.) to work well on more conventional processors. The first two scenarios yield several-fold improvements in throughput; the third sees very little slowdown.
more » « less
Free, publicly-accessible full text available July 16, 2026
Buffered Persistence in B+ Trees

https://doi.org/10.1145/3698801

Du, Mingzhe; Scott, Michael_L (December 2024, Proceedings of the ACM on Management of Data)

Non-volatile Memory (NVM) offers the opportunity to build large, durable B+ trees with markedly higher performance and faster post-crash recovery than is possible with traditional disk- or flash-based persistence. Unfortunately, cache flush and fence instructions, required for crash consistency and failure atomicity on many machines, introduce substantial overhead not present in non-persistent trees, and force additional NVM reads and writes. The overhead is particularly pronounced in workloads that benefit from cache reuse due to good temporal locality or small working sets---traits commonly observed in real-world applications. In this paper, we propose a buffered durable B+ tree (BD+Tree) that improves performance and reduces NVM traffic viarelaxedpersistence. Execution of a BD+Tree is divided intoepochsof a few milliseconds each; if a crash occurs in epoche,the tree recovers to its state as of the end of epoche-2. (The persistence boundary can always be made current with an explicit sync operation, which quickly advances the epoch by 2.) NVM writes within an epoch are aggregated for delayed persistence, thereby increasing cache reuse and reducing traffic to NVM. In comparison to state-of-the-art persistent B+ trees, our micro-benchmark experiments show that BD+Tree can improve throughput by up to 2.4x and reduce NVM writes by up to 90% when working sets are small or workloads exhibit strong temporal locality. On real-world workloads that benefit from cache reuse, BD+Tree realizes throughput improvements of 1.1--2.4x and up to a 99% decrease in NVM writes. Even on uniform workloads, with working sets that significantly exceed cache capacity, BD+Tree still improves throughput by 1--1.3x. The performance advantage of BD+Tree increases with larger caches, suggesting ongoing benefits as CPUs evolve toward gigabyte cache capacities.
more » « less
Transactional Composition of Nonblocking Data Structures

https://doi.org/10.1145/3558481.3591079

Cai, Wentao; Wen, Haosen; Scott, Michael L. (June 2023, 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))

This paper introduces nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations). Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40--170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2x. For persistent data structures, we observe that failure atomicity for transactions can be achieved "almost for free'' with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
more » « less
Full Text Available
Fat Pointers for Temporal Memory Safety of C

https://doi.org/10.1145/3586038

Zhou, Jie; Criswell, John; Hicks, Michael (April 2023, Proceedings of the ACM on Programming Languages)

Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all either incur high performance overhead and/or miss detecting certain types of temporal memory safety bugs. In this paper, we propose a temporal memory safety solution that is both efficient and comprehensive. Specifically, we extend Checked C, a spatially-safe extension to C, with temporally-safe pointers. These are implemented by combining two techniques: fat pointers and dynamic key-lock checks. We show that the fat-pointer solution significantly improves running time and memory overhead compared to the disjoint-metadata approach that provides the same level of protection. With empirical program data and hands-on experience porting real-world applications, we also show that our solution is practical in terms of backward compatibility---one of the major complaints about fat pointers.
more » « less
Full Text Available
Transactional Composition of Nonblocking Data Structures

https://doi.org/10.1145/3572848.3577503

Cai, Wentao; Wen, Haosen; Scott, Michael L. (February 2023, 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP))

We introduce nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations). Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their nonblocking liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40--170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2×. For persistent memory, we observe that failure atomicity for transactions can be achieved "almost for free" with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
more » « less
Full Text Available
Randezvous: Making Randomization Effective on MCUs

https://doi.org/10.1145/3564625.3567970

Shen, Zhuojia; Dharsee, Komail; Criswell, John (December 2022, Annual Computer Security Applications Conference)

Internet-of-Things devices such as autonomous vehicular sensors, medical devices, and industrial cyber-physical systems commonly rely on small, resource-constrained microcontrollers (MCUs). MCU software is typically written in C and is prone to memory safety vulnerabilities that are exploitable by remote attackers to launch code reuse attacks and code/control data leakage attacks. We present Randezvous, a highly performant diversification-based mitigation to such attacks and their brute force variants on ARM MCUs. Atop code/data layout randomization and an efficient execute-only code approach, Randezvous creates decoy pointers to camouflage control data in memory; code pointers in the stack are then protected by a diversified shadow stack, local-to-global variable promotion, and return address nullification. Moreover, Randezvous adds a novel delayed reboot mechanism to slow down persistent attacks and mitigates control data spraying attacks via global guards. We demonstrate Randezvous’s security by statistically modeling leakage-equipped brute force attacks under Randezvous, crafting a proof-of-concept exploit that shows Randezvous’s efficacy, and studying a real-world CVE. Our evaluation of Randezvous shows low overhead on three benchmark suites and two applications.
more » « less
Full Text Available
Holistic Control-Flow Protection on Real-Time Embedded Systems with Kage

Du, Yufei; Shen, Zhuojia; Dharsee, Komail; Zhou, Jie; Walls, Robert J.; Criswell, John (August 2022, Usenix Security Symposium)

This paper presents Kage: a system that protects the control data of both application and kernel code on microcontroller-based embedded systems. Kage consists of a Kage-compliant embedded OS that stores all control data in separate memory regions from untrusted data, a compiler that transforms code to protect these memory regions efficiently and to add forward-edge control-flow integrity checks, and a secure API that allows safe updates to the protected data. We implemented Kage as an extension to FreeRTOS, an embedded real-time operating system. We evaluated Kage’s performance using the CoreMark benchmark. Kage incurred a 5.2% average run-time overhead and 49.8% code size overhead. Furthermore, the code size overhead was only 14.2% when compared to baseline FreeRTOS with the MPU enabled. We also evaluated Kage’s security guarantees by measuring and analyzing reachable code-reuse gadgets. Compared to FreeRTOS, Kage reduces the number of reachable gadgets from 2,276 to 27, and the remaining 27 gadgets cannot be stitched together to launch a practical attack.
more » « less
Full Text Available
How Should We Think about Persistent Data Structures?

https://doi.org/10.1145/3519270.3538455

Scott, Michael L. (July 2022, 41st ACM Symposium on Principles of Distributed Computing (PODC))

Salerno, Italy
more » « less
Full Text Available
Fast Nonblocking Persistence for Concurrent Data Structures (extended abstract)

Cai, Wentao; Wen, Haosen; Maksimovski, Vladimir; Du, Mingzhe; Sanna, Rafaello; Abdallah, Shreif; Scott, Michael L. (May 2022, 13th Annual Non-Volatile Memories Workshop)

San Diego, CA
more » « less
Full Text Available
Fast Nonblocking Persistence for Concurrent Data Structures

https://doi.org/10.4230/LIPIcs.DISC.2021.14

Cai, Wentao; Wen, Haosen; Maksimovski, Vladimir; Du, Mingzhe; Sanna, Rafaello; Abdallah, Shreif; Scott, Michael L. (October 2021, 35th Intl. Symp. on Distributed Computing (DISC))
null (Ed.)
We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature – dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM.
more » « less
Full Text Available

« Prev Next »

Search for: All records