Garbage collection (GC) support for unmanaged languages can reduce programming burden in reasoning about liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks.
In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out-of-thin-air, and they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data-flow and implicit control-flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations, that can be employed during compilation aided with profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standards. Pointer provenance based sound GC invocation is also 13% faster and reclaims 6% more memory on average, compared to an unsound value-based GC.
more » « less- Award ID(s):
- 1703931
- PAR ID:
- 10493944
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 4
- Issue:
- OOPSLA
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 1 to 28
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests the possibility of keeping data in pointer-rich “in-memory” format across program runs and even crashes. For full generality, such data requires dynamic memory allocation. Toward this end, we introduce _recoverability_, a correctness criterion for persistent allocators, together with a nonblocking allocator, _Ralloc_, that satisfies this criterion. Ralloc is based on _LRMalloc_, with three key innovations. First, we persist just enough information during normal operation to permit reconstruction of the heap after a full-system crash. Our reconstruction mechanism performs garbage collection (GC) to identify and remedy any failure-induced memory leaks. Second, in support of GC, we introduce the notion of _filter functions_, which identify the locations of pointers within persistent blocks. Third, to allow persistent regions to be mapped at an arbitrary address, we employ the position-independent pointer representation of Chen et al., both in data and in allocator metadata. Experiments show that Ralloc provides scalable performance competitive to that of both _Makalu_, the state-of-the-art lock-based persistent allocator, and the best transient allocators (e.g.,_JEMalloc_).more » « less
-
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich “in-memory” format across program runs and even process and system crashes. For full generality, such data requires dynamic memory allocation, and while the allocator could in principle be “rolled into” each data structure, it is desirableto make it a separate abstraction. Toward this end, we introduce _recoverability_, a correctness criterion for persistent allocators, together with a nonblocking allocator, _Ralloc,_ that satisfies this criterion. Ralloc is based on the _LRMalloc_ of Leite and Rocha, with four key innovations: First, we persist just enough information during normal operation to permit a garbage collection (GC) pass to correctly reconstruct the heap in the wake of a full-system crash. Second, we introduce the notion of _filter functions_, which identify the locations of pointers within persistent blocks to mitigate the limitations of conservative GC. Third, we reorganize the layout of the heap to facilitate the incremental allocation of physical space. Fourth, we employ position-independent (offset-based) pointers to allow persistent regions to be mapped at an arbitrary address. Experiments show Ralloc to be performance-competitive with both _Makalu_, the state-of-the-art lock-based persistent allocator, and such transient allocators as LRMalloc and JEMalloc. In particular, reliance on GC and offline metadata reconstruction allows Ralloc to pay almost nothing for persistence during normal operation.more » « less
-
Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all either incur high performance overhead and/or miss detecting certain types of temporal memory safety bugs.
In this paper, we propose a temporal memory safety solution that is both efficient and comprehensive. Specifically, we extend Checked C, a spatially-safe extension to C, with temporally-safe pointers. These are implemented by combining two techniques: fat pointers and dynamic key-lock checks. We show that the fat-pointer solution significantly improves running time and memory overhead compared to the disjoint-metadata approach that provides the same level of protection. With empirical program data and hands-on experience porting real-world applications, we also show that our solution is practical in terms of backward compatibility---one of the major complaints about fat pointers.
-
This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional log n-bit pointers can be replaced with o(log n)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an element in an array filled to load factor 1 — δ, then the optimal tiny-pointer size is Θ(log log log n + log δ-1) bits in the fixed-size case, and Θ(log δ-1) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. Using tiny pointers, we revisit five classic data-structure problems. We show that: • A data structure storing n v-bit values for n keys with constant-time modifications/queries can be implemented to take space nv + O(n log(r) n) bits, for any constant r > 0, as long as the user stores a tiny pointer of expected size O(1) with each key—here, log(r) n is the r-th iterated logarithm. • Any binary search tree can be made succinct with constant-factor time overhead, and can even be made to be within O(n) bits of optimal if we allow for O(log* n)-time modifications—this holds even for rotation-based trees such as the splay tree and the red-black tree. • Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-time overhead and 1 + o(1) space overhead. • Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-time overhead and with an additional space consumption of log(r) n + O(log j) bits per j-bit value for an arbitrary constant r > 0 of our choice. • Given an external-memory array A of size (1 + ε)n containing a dynamic set of up to n key-value pairs, it is possible to maintain an internal-memory stash of size O(n log ε-1) bits so that the location of any key-value pair in A can be computed in constant time (and with no IOs). These are all well studied and classic problems, and in each case tiny pointers allow for us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.more » « less
-
null (Ed.)Use-after-free (UAF) vulnerabilities, in which dangling pointers remain after memory is released, remain a persistent problem for applications written in C and C++. In order to protect legacy code, prior work has attempted to track pointer propagation and invalidate dangling pointers at deallocation time, but this work has gaps in coverage, as it lacks support for tracking program variables promoted to CPU registers. Moreover, we find that these gaps can significantly hamper detection of UAF bugs: in a preliminary study with OSS-Fuzz, we found that more than half of the UAFs in real-world programs we examined (10/19) could not be detected by prior systems due to register promotion. In this paper, we introduce HeapExpo, a new system that fills this gap in coverage by parsimoniously identifying potential dangling pointer variables that may be lifted into registers by the compiler and marking them as volatile. In our experiments, we find that HeapExpo effectively detects UAFs missed by other systems with an overhead of 35% on the majority of SPEC CPU2006 and 66% when including two benchmarks that have high amounts of pointer propagation.more » « less