Title: Turning manual concurrent memory reclamation into automatic reference counting
Safe memory reclamation (SMR) schemes are an essential tool for lock-free data structures and concurrent programming. However, manual SMR schemes are notoriously difficult to apply correctly, and automatic schemes, such as reference counting, have been argued for over a decade to be too slow for practical purposes. A recent wave of work has disproved this long-held notion and shown that reference counting can be as scalable as hazard pointers, one of the most common manual techniques. Despite these tremendous improvements, there remains a gap of up to 2x or more in performance between these schemes and faster manual techniques such as epoch-based reclamation (EBR). In this work, we first advance these ideas and show that in many cases, automatic reference counting can in fact be as fast as the fastest manual SMR techniques. We generalize our previous algorithm called Concurrent Deferred Reference Counting (CDRC) to obtain a method for converting any standard manual SMR technique into an automatic reference counting technique with a similar performance profile. Our second contribution is extending this framework to support weak pointers, which are reference-counted pointers that automatically break pointer cycles by not contributing to the reference count, thus addressing a common weakness in reference-counted garbage collection. Our experiments with a C++ library implementation show that our automatic techniques perform in line with their manual counterparts, and that our weak pointer implementation outperforms the best known atomic weak pointer library by up to an order of magnitude at high thread counts. Altogether, we show that the ease of use of automatic memory management can be achieved without significant cost to practical performance or general applicability.
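To make the programming model concrete, the following is a minimal sketch of the kind of interface the work targets: nodes of a concurrent data structure are reclaimed automatically by reference counting, with no explicit retire or free calls, and reclamation is safe in the presence of concurrent readers. The sketch uses C++20's std::atomic<std::shared_ptr> purely for illustration, and the class and member names are illustrative; the paper's CDRC-based library provides pointer types with similar semantics but a far more scalable implementation.

```cpp
// Minimal sketch: a Treiber-style stack whose nodes are reclaimed by
// reference counting alone. C++20's std::atomic<std::shared_ptr> is used
// only to illustrate the semantics; it is typically lock-based, whereas the
// paper's goal is to provide such pointers with manual-SMR-like performance.
#include <atomic>
#include <memory>
#include <optional>

template <typename T>
class rc_stack {                       // illustrative name, not the paper's API
  struct node {
    T value{};
    std::shared_ptr<node> next;        // strong reference keeps successors alive
  };
  std::atomic<std::shared_ptr<node>> head_;

 public:
  void push(T v) {
    auto n = std::make_shared<node>();
    n->value = std::move(v);
    n->next = head_.load();
    // On CAS failure, n->next is refreshed with the currently observed head.
    while (!head_.compare_exchange_weak(n->next, n)) {}
  }

  std::optional<T> pop() {
    auto h = head_.load();
    // Holding h pins the node: even if another thread pops it concurrently,
    // its memory is not reclaimed until the last reference is dropped.
    while (h && !head_.compare_exchange_weak(h, h->next)) {}
    if (!h) return std::nullopt;
    return h->value;                   // node freed automatically afterwards
  }
};
```

For back-edges that would otherwise form cycles (for example, parent pointers in a doubly linked structure), the weak-pointer analogue of std::weak_ptr would be used so that the cycle does not keep nodes alive; this is the role of the weak pointers introduced in the paper.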
Award ID(s):
1919223 2119352 1901381 1910030
NSF-PAR ID:
10416275
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
Page Range / eLocation ID:
61 to 75
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper we present interval-based reclamation (IBR), a new approach to safe reclamation of disconnected memory blocks in nonblocking concurrent data structures. Safe reclamation is a difficult problem: a thread, before freeing a block, must ensure that no other threads are accessing that block; the required synchronization tends to be expensive. In contrast with epoch-based reclamation, in which threads reserve all blocks created after a certain time, or pointer-based reclamation (e.g., hazard pointers), in which threads reserve individual blocks, IBR allows a thread to reserve all blocks known to have existed in a bounded interval of time. By comparing a thread's reserved interval with the lifetime of a detached but not yet reclaimed block, the system can determine if the block is safe to free. Like hazard pointers, IBR avoids the possibility that a single stalled thread may reserve an unbounded number of blocks; unlike hazard pointers, it avoids a memory fence on most pointer-following operations. It also avoids the need to explicitly "unreserve" a no-longer-needed pointer. We describe three specific IBR schemes (one with several variants) that trade off performance, applicability, and space requirements. IBR requires no special hardware or OS support. In experiments with data structure microbenchmarks, it also compares favorably (in both time and space) to other state-of-the-art approaches, making it an attractive alternative for libraries of concurrent data structures. 
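As a rough illustration of the interval-based idea (a simplified sketch, not the paper's exact IBR schemes), the following C++ fragment tags each block with the global era at allocation and at retirement, has each thread publish the interval of eras it may still be reading from, and frees a retired block only when its lifetime interval overlaps no thread's reserved interval. The fixed thread count, the function names, and the absence of batching are simplifications chosen for brevity.

```cpp
// Simplified interval-based reclamation sketch (illustrative only): blocks
// carry birth/retire eras, threads publish a reserved era interval, and a
// retired block is freed only if its lifetime overlaps no reservation.
#include <atomic>
#include <cstdint>

constexpr int kMaxThreads = 64;                 // assumption for the sketch
std::atomic<uint64_t> global_era{1};

struct Reservation {                            // one per thread; pad in practice
  std::atomic<uint64_t> lo{UINT64_MAX};
  std::atomic<uint64_t> hi{0};
};
Reservation reservations[kMaxThreads];

struct Block {
  uint64_t birth_era;                           // era when the block was allocated
  uint64_t retire_era;                          // era when the block was retired
};

void begin_op(int tid) {                        // reserve the current era
  uint64_t e = global_era.load(std::memory_order_acquire);
  reservations[tid].lo.store(e, std::memory_order_release);
  reservations[tid].hi.store(e, std::memory_order_release);
}

void on_read(int tid) {                         // after following a pointer,
  uint64_t e = global_era.load(std::memory_order_acquire);   // extend upward
  if (e > reservations[tid].hi.load(std::memory_order_relaxed))
    reservations[tid].hi.store(e, std::memory_order_release);
}

void end_op(int tid) {                          // drop the reservation
  reservations[tid].lo.store(UINT64_MAX, std::memory_order_release);
  reservations[tid].hi.store(0, std::memory_order_release);
}

// A retired block is safe to free iff [birth, retire] intersects no [lo, hi].
bool can_free(const Block& b) {
  for (int t = 0; t < kMaxThreads; ++t) {
    uint64_t lo = reservations[t].lo.load(std::memory_order_acquire);
    uint64_t hi = reservations[t].hi.load(std::memory_order_acquire);
    if (lo <= b.retire_era && b.birth_era <= hi) return false;   // overlap
  }
  return true;
}
```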
  2. Dynamic memory managers are a crucial component of almost every modern software system. In addition to implementing efficient allocation and reclamation, memory managers provide the essential abstraction of memory as distinct objects, which underpins the properties of memory safety and type safety. Bugs in memory managers, while not common, are extremely hard to diagnose and fix. One reason is that their implementations often involve tricky pointer calculations, raw memory manipulation, and complex memory state invariants. While these properties are often documented, they are not specified in any precise, machine-checkable form. A second reason is that memory manager bugs can break the client application in bizarre ways that do not immediately implicate the memory manager at all. A third reason is that existing tools for debugging memory errors, such as Memcheck, cannot help because they rely on correct allocation and deallocation information to work. In this paper we present Permchecker, a tool designed specifically to detect and diagnose bugs in memory managers. The key idea in Permchecker is to make the expected structure of the heap explicit by associating typestates with each piece of memory. Typestate captures elements of both type (e.g., page, block, or cell) and state (e.g., allocated, free, or forwarded). Memory manager developers annotate their implementation with information about the expected typestates of memory and how heap operations change those typestates. At runtime, our system tracks the typestates and ensures that each memory access is consistent with the expected typestates. This technique detects errors quickly, before they corrupt the application or the memory manager itself, and it often provides accurate information about the reason for the error. The implementation of Permchecker uses a combination of compile-time annotation and instrumentation, and dynamic binary instrumentation (DBI). Because the overhead of DBI is fairly high, Permchecker is suitable for a testing and debugging setting and not for deployment. It works on a wide variety of existing systems, including explicit malloc/free memory managers and garbage collectors, such as those found in JikesRVM and OpenJDK. Since bugs in these systems are not numerous, we developed a testing methodology in which we automatically inject bugs into the code using bug patterns derived from real bugs. This technique allows us to test Permchecker on hundreds or thousands of buggy variants of the code. We find that Permchecker effectively detects and localizes errors in the vast majority of cases; without it, these bugs result in strange, incorrect behaviors usually long after the actual error occurs. 
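The core typestate mechanism can be sketched roughly as follows (an illustrative sketch with hypothetical names, not Permchecker's actual implementation or annotation syntax): a shadow map records a typestate per address, the memory manager's instrumentation declares transitions explicitly, and a check conceptually inserted before application accesses verifies that the touched memory is in an allocated state.

```cpp
// Illustrative typestate checker sketch (hypothetical names): a shadow map
// records a typestate per byte, the memory manager declares transitions, and
// application accesses are checked against the expected state.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

enum class Typestate { Unmapped, PageHeader, FreeCell, AllocatedCell };

std::unordered_map<uintptr_t, Typestate> shadow;     // shadow typestate per byte

// Memory-manager instrumentation: an explicit, checked typestate transition.
void set_typestate(void* p, size_t n, Typestate from, Typestate to) {
  for (size_t i = 0; i < n; ++i) {
    uintptr_t a = reinterpret_cast<uintptr_t>(p) + i;
    auto it = shadow.find(a);
    Typestate cur = (it == shadow.end()) ? Typestate::Unmapped : it->second;
    assert(cur == from && "illegal typestate transition");
    shadow[a] = to;
  }
}

// Conceptually inserted before every application load/store: the client may
// only touch memory that is currently in the AllocatedCell state.
void check_app_access(void* p) {
  auto it = shadow.find(reinterpret_cast<uintptr_t>(p));
  assert(it != shadow.end() && it->second == Typestate::AllocatedCell &&
         "application access to memory that is not an allocated cell");
}
```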
  3. Multiversioning is widely used in databases, transactional memory, and concurrent data structures. It can be used to support read-only transactions that appear atomic in the presence of concurrent update operations. Any system that maintains multiple versions of each object needs a way of efficiently reclaiming them. We experimentally compare various existing reclamation techniques by applying them to a multiversion tree and a multiversion hash table. Using insights from these experiments, we develop two new multiversion garbage collection (MVGC) techniques. These techniques use two novel concurrent version list data structures. Our experimental evaluation shows that our fastest technique is competitive with the fastest existing MVGC techniques, while using significantly less space on some workloads. Our new techniques provide strong theoretical bounds, especially on space usage. These bounds ensure that the schemes have consistent performance, avoiding the very high worst-case space usage of other techniques. 
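For context, the general multiversion reclamation setting can be sketched as follows (a simplified, sequential illustration, not the paper's concurrent version-list algorithms): each object keeps a timestamped version list, readers select the newest version no newer than their snapshot, and a version becomes reclaimable once a strictly newer version is visible to every active and future reader.

```cpp
// Simplified multiversion-GC sketch (sequential and illustrative): versions
// are kept newest-first, a reader at timestamp ts sees the newest version with
// timestamp <= ts, and collect() drops versions that no active or future
// reader can observe, given the minimum active read timestamp.
#include <cstdint>
#include <list>

struct Version {
  uint64_t timestamp;                  // commit time of the installing writer
  int payload;                         // placeholder for the object's data
};

struct MVObject {
  std::list<Version> versions;         // newest first
};

const Version* read_at(const MVObject& o, uint64_t read_ts) {
  for (const Version& v : o.versions)
    if (v.timestamp <= read_ts) return &v;     // newest version visible at read_ts
  return nullptr;
}

void collect(MVObject& o, uint64_t min_active_read_ts) {
  bool kept_snapshot = false;
  o.versions.remove_if([&](const Version& v) {
    if (v.timestamp > min_active_read_ts) return false;  // needed by newer readers
    if (!kept_snapshot) { kept_snapshot = true; return false; }  // snapshot version
    return true;                       // strictly older than the visible snapshot
  });
}
```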
  4. We present the skip vector, a novel high-performance concurrent data structure based on the skip list. The key innovation in the skip vector is to flatten the index and data layers of the skip list into vectors. This increases spatial locality, reduces synchronization overhead, and avoids much of the costly pointer chasing that skip lists incur. We evaluate a skip vector implementation in C++. Our implementation coordinates interactions among threads by utilizing optimistic traversal with sequence locks. To ensure memory safety, it employs hazard pointers; this leads to tight bounds on wasted space but, due to the skip vector's design, does not lead to high overhead. Performance of the skip vector for small data set sizes is higher than for a comparable skip list, and as the amount of data increases, the benefits of the skip vector over a skip list increase. 
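One ingredient mentioned above, optimistic traversal guarded by sequence locks, can be sketched roughly as follows. This is an illustrative, single-writer sketch over a fixed-capacity sorted segment, not the paper's layout or API, and it glosses over the memory-model subtleties that a production seqlock must handle.

```cpp
// Illustrative seqlock-guarded segment (single writer, fixed capacity, atomic
// elements). Readers scan optimistically and retry if the sequence number
// indicates that a writer ran concurrently.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

struct Segment {
  static constexpr size_t kCapacity = 8;        // illustrative constant
  std::atomic<uint64_t> seq{0};                 // even = stable, odd = writing
  std::atomic<size_t> count{0};                 // number of keys in use
  std::array<std::atomic<int>, kCapacity> keys; // sorted keys

  bool contains(int key) const {                // optimistic reader
    while (true) {
      uint64_t v1 = seq.load(std::memory_order_acquire);
      if (v1 & 1) continue;                     // writer active; retry
      bool found = false;
      size_t n = count.load(std::memory_order_acquire);
      for (size_t i = 0; i < n; ++i)
        if (keys[i].load(std::memory_order_relaxed) == key) { found = true; break; }
      std::atomic_thread_fence(std::memory_order_acquire);
      if (seq.load(std::memory_order_relaxed) == v1) return found;  // consistent
    }
  }

  bool insert(int key) {                        // single writer assumed
    size_t n = count.load(std::memory_order_relaxed);
    if (n == kCapacity) return false;           // full; a split would happen elsewhere
    seq.fetch_add(1, std::memory_order_acq_rel);            // mark unstable (odd)
    size_t i = n;
    while (i > 0 && keys[i - 1].load(std::memory_order_relaxed) > key) {
      keys[i].store(keys[i - 1].load(std::memory_order_relaxed),
                    std::memory_order_relaxed);             // shift to keep sorted
      --i;
    }
    keys[i].store(key, std::memory_order_relaxed);
    count.store(n + 1, std::memory_order_release);
    seq.fetch_add(1, std::memory_order_release);            // stable again (even)
    return true;
  }
};
```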
  5. This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional log n-bit pointers can be replaced with o(log n)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an element in an array filled to load factor 1 − δ, then the optimal tiny-pointer size is Θ(log log log n + log δ^-1) bits in the fixed-size case, and Θ(log δ^-1) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. Using tiny pointers, we revisit five classic data-structure problems. We show that:
     • A data structure storing n v-bit values for n keys with constant-time modifications/queries can be implemented to take space nv + O(n log^(r) n) bits, for any constant r > 0, as long as the user stores a tiny pointer of expected size O(1) with each key; here, log^(r) n is the r-th iterated logarithm.
     • Any binary search tree can be made succinct with constant-factor time overhead, and can even be made to be within O(n) bits of optimal if we allow for O(log* n)-time modifications; this holds even for rotation-based trees such as the splay tree and the red-black tree.
     • Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-time overhead and 1 + o(1) space overhead.
     • Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-time overhead and an additional space consumption of log^(r) n + O(log j) bits per j-bit value, for an arbitrary constant r > 0 of our choice.
     • Given an external-memory array A of size (1 + ε)n containing a dynamic set of up to n key-value pairs, it is possible to maintain an internal-memory stash of size O(n log ε^-1) bits so that the location of any key-value pair in A can be computed in constant time (and with no I/Os).
     These are all well-studied, classic problems, and in each case tiny pointers allow us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free. 
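The basic contract of a tiny pointer can be sketched as follows (an illustrative toy, not the paper's optimal constructions): a dereference table is partitioned into buckets, an allocation for a key finds a free slot in the bucket the key hashes to, and the returned "pointer" is only the slot's offset within that bucket, since the bucket itself can be recomputed from the key. All names and constants here are hypothetical.

```cpp
// Toy dereference table illustrating the tiny-pointer contract: the returned
// "pointer" is only an offset inside the bucket the key hashes to, so it needs
// O(log kBucketSize) bits instead of log n bits, at the cost of a small chance
// that allocation fails when the bucket is full.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

class DereferenceTable {
  static constexpr size_t kBucketSize = 16;     // illustrative constant
  struct Slot { bool used = false; int value = 0; };
  std::vector<Slot> slots_;
  size_t buckets_;

  size_t bucket_of(const std::string& key) const {
    return std::hash<std::string>{}(key) % buckets_;
  }

 public:
  // n is assumed to be a positive multiple of kBucketSize.
  explicit DereferenceTable(size_t n) : slots_(n), buckets_(n / kBucketSize) {}

  // Returns the tiny pointer: an offset in [0, kBucketSize), or nullopt when
  // the key's bucket is full (the small failure probability in the abstract).
  std::optional<uint8_t> allocate(const std::string& key, int value) {
    size_t base = bucket_of(key) * kBucketSize;
    for (size_t i = 0; i < kBucketSize; ++i) {
      if (!slots_[base + i].used) {
        slots_[base + i] = {true, value};
        return static_cast<uint8_t>(i);
      }
    }
    return std::nullopt;
  }

  // Dereferencing needs both the key and the tiny pointer, because the bucket
  // is recomputed from the key rather than stored in the pointer.
  int& dereference(const std::string& key, uint8_t tiny) {
    return slots_[bucket_of(key) * kBucketSize + tiny].value;
  }

  void deallocate(const std::string& key, uint8_t tiny) {
    slots_[bucket_of(key) * kBucketSize + tiny].used = false;
  }
};
```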