Title: VERLIB: Concurrent Versioned Pointers
Recent work has shown how to augment any CAS-based concurrent data structure to support taking a snapshot of the current memory state. Taking a snapshot, loads, and CAS (Compare and Swap) operations all take constant time. Importantly, such snapshotting can be used to easily implement linearizable queries, such as range queries, over any part of a data structure. In this paper, we make two significant improvements over this approach. The first improvement removes a subtle and hard-to-reason-about restriction that was needed to avoid a level of indirection on pointers. We introduce an approach, which we refer to as indirection-on-need, that removes the restriction yet almost always avoids indirection. The second improvement is to efficiently support snapshotting with lock-free locks. This requires supporting an idempotent CAS. We show a particularly simple solution to the problem that leverages the data structures used for snapshotting. Based on these ideas we implemented an easy-to-use C++ library, verlib, centered around a versioned pointer type. The library works with lock-based (standard or lock-free) and CAS-based algorithms, or any combination. Converting existing concurrent data structures to use the library takes minimal effort. We present results for experiments that use verlib to convert state-of-the-art data structures for ordered maps (a B-tree), radix-ordered maps (an ART-tree), and unordered maps (an optimized hash table) to be snapshottable. The snapshottable versions perform almost as well as the original versions and far outperform any previous implementations that support atomic range queries.
Award ID(s):
2119352 1919223
PAR ID:
10539480
Publisher / Repository:
ACM Symposium on Principles and Practice of Parallel Programming (PPoPP '24)
ISBN:
9798400704352
Page Range / eLocation ID:
200 to 214
Format(s):
Medium: X
Location:
Edinburgh United Kingdom
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents a new and practical approach to lock-free locks based on helping, which allows the user to write code using fine-grained locks, but run it in a lock-free manner. Although lock-free locks have been suggested in the past, they are widely viewed as impractical, have some key limitations, and, as far as we know, have never been implemented. The paper presents some key techniques that make lock-free locks practical and more general. The most important technique is an approach to idempotence—i.e. making code that runs multiple times appear as if it ran once. The idea is based on using a shared log among processes running the same protected code. Importantly, the approach can be library based, requiring very little if any change to standard code—code just needs to use the idempotent versions of memory operations (load, store, LL/SC, allocation, free). We have implemented a C++ library called Flock based on the ideas. Flock allows lock-based data structures to run in either lock-free or blocking (traditional locks) mode. We implemented a variety of tree and list-based data structures with Flock and compare the performance of the lock-free and blocking modes under a variety of workloads. The lock-free mode is almost as fast as blocking mode under almost all workloads, and significantly faster when threads are oversubscribed (more threads than processors). We also compare with several existing lock-based and lock-free alternatives. 
  2. In-memory data management systems, such as key-value stores, have become essential infrastructure in today's big-data processing and cloud computing. They rely on efficient index structures to access data. While unordered indexes, such as hash tables, can perform point search in O(1) time, they cannot be used in many scenarios where range queries must be supported. Many ordered indexes, such as the B+ tree and the skip list, have an O(log N) lookup cost, where N is the number of keys in the index. For an ordered index hosting billions of keys, a lookup may take more than 30 key comparisons, which is an order of magnitude more expensive than a lookup in a hash table. With the availability of large memory and fast networks in today's data centers, this O(log N) time is taking a heavy toll on applications that rely on ordered indexes. In this paper we introduce a new ordered index structure, named Wormhole, that takes O(log L) worst-case time to look up a key of length L. The low cost is achieved by simultaneously leveraging the strengths of three indexing structures, namely the hash table, the prefix tree, and the B+ tree, to orchestrate a single fast ordered index. Wormhole's range operations can be performed by a linear scan of a list after an initial lookup. This improvement in access efficiency does not come at the price of compromised space efficiency. Instead, Wormhole's index space is comparable to that of the B+ tree and the skip list. Experimental results show that Wormhole outperforms skip list, B+ tree, ART, and Masstree by up to 8.4x, 4.9x, 4.3x, and 6.6x in terms of key lookup throughput, respectively.
  3. The Transactional Data Structure Library (TDSL) methodology improves the programmability and performance of concurrent software by making it possible for programmers to compose multiple concurrent data structure operations into coarse-grained transactions. Like transactional memory, TDSL enables arbitrarily many operations on arbitrarily many data structures to appear to other threads as a single atomic, isolated transaction. Like concurrent data structures, the individual operations on a TDSL data structure are optimized to avoid artificial contention. We introduce techniques for reducing false conflicts in TDSL implementations. Our approach allows expressing the postconditions of operations entirely via semantic properties, instead of through low-level structural properties. Our design is general enough to support lists, deques, ordered and unordered maps, and vectors. It supports richer programming interfaces than are available in existing TDSL implementations. It is also capable of precise memory management, which is necessary in low-level languages like C++.
  4. This paper presents an extended version of our previous work on using compiler technology to automatically convert sequential C++ data abstractions, for example, queues, stacks, maps, and trees, to concurrent lock-free implementations. A key difference between our work and existing research in software transactional memory (STM) is that our compiler-based approach automatically selects the best state-of-the-practice nonblocking synchronization method for the underlying sequential implementation of the data structure. The extended material includes a broader collection of state-of-the-practice lock-free synchronization techniques, additional formal correctness proofs of the overall integration of the different synchronizations in our system, and a more comprehensive experimental study of the integrated techniques. We evaluate our compiler-generated nonblocking data structures both by using a collection of micro-benchmarks, including the Synchrobench suite, and by using a multi-threaded application, Dedup, from PARSEC. Our automatically synchronized code attains performance competitive with that of concurrent data structures manually written by experts and much better performance than the heavier-weight support provided by STM.
  5. Multiversioning is widely used in databases, transactional memory, and concurrent data structures. It can be used to support read-only transactions that appear atomic in the presence of concurrent update operations. Any system that maintains multiple versions of each object needs a way of efficiently reclaiming them. We experimentally compare various existing reclamation techniques by applying them to a multiversion tree and a multiversion hash table. Using insights from these experiments, we develop two new multiversion garbage collection (MVGC) techniques. These techniques use two novel concurrent version list data structures. Our experimental evaluation shows that our fastest technique is competitive with the fastest existing MVGC techniques, while using significantly less space on some workloads. Our new techniques provide strong theoretical bounds, especially on space usage. These bounds ensure that the schemes have consistent performance, avoiding the very high worst-case space usage of other techniques.