The skip list is an elegant dictionary data structure that is com monly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(logN) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p.
A seemingly natural way to generalize the skip list to external memory with block size B is to “promote” with probability 1/B, rather than 1/2. However, there are practical and theoretical obsta cles to getting the skip list to retain its efficient performance, space bounds, and highprobability guarantees.
We give an externalmemory skip list that achieves write optimized bounds. That is, for 0 < ε < 1, range queries take O(logBε N + K/B) I/Os w.h.p. and insertions and deletions take O((logBε N)/B1−ε) amortized I/Os w.h.p.
Our writeoptimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior writeoptimized data structures such as Bε trees or LSM trees. These data structures are deployed in highperformance databases and file systems.
more »
« less
Optimal Hashing in External Memory
Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions.
Iacono and Pătraşu established an update/query tradeoff curve for externalhash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires Ω(logλ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for λ that is Ω(loglogM +logM N).
In this paper, we present a simpler externalmemory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of λ. The simplicity of BOAs allows them to be readily modified to achieve the following results:
A new externalmemory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
The CacheOblivious Bundle of Trees Hash Table (COBOT), the first cacheoblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.
more »
« less
 Award ID(s):
 1637458
 NSFPAR ID:
 10192331
 Date Published:
 Journal Name:
 Leibniz international proceedings in informatics
 Volume:
 107
 ISSN:
 18688969
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


First introduced in 1954, the linearprobing hash table is among the oldest data structures in computer science, and thanks to its unrivaled data locality, linear probing continues to be one of the fastest hash tables in practice. It is widely believed and taught, however, that linear probing should never be used at high load factors; this is because of an effect known as primary clustering which causes insertions at a load factor of 1  1/x to take expected time O(x2) (rather than the intuitive running time of ⇥(x)). The dangers of primary clustering, first discovered by Knuth in 1963, have now been taught to generations of computer scientists, and have influenced the design of some of the most widely used hash tables in production. We show that primary clustering is not the foregone conclusion that it is reputed to be. We demonstrate that seemingly small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions: if these design decisions are made correctly, then even if a hash table operates continuously at a load factor of 1  (1/x), the expected amortized cost per insertion/deletion is O(x). This is because the tombstones left behind by deletions can actually cause an anticlustering effect that combats primary clustering. Interestingly, these design decisions, despite their remarkable effects, have historically been viewed as simply implementationlevel engineering choices. We also present a new variant of linear probing (which we call graveyard hashing) that completely eliminates primary clustering on any sequence of operations: if, when an operation is performed, the current load factor is 1 1/x for some x, then the expected cost of the operation is O(x). Thus we can achieve the data locality of traditional linear probing without any of the disadvantages of primary clustering. One corollary is that, in the externalmemory model with a data blocks of size B, graveyard hashing offers the following remarkably strong guarantee: at any load factor 1 1/x satisfying x = o(B), graveyard hashing achieves 1 + o(1) expected block transfers per operation. In contrast, past externalmemory hash tables have only been able to offer a 1 + o(1) guarantee when the block size B is at least O(x2). Our results come with actionable lessons for both theoreticians and practitioners, in particular, that well designed use of tombstones can completely change the asymptotic landscape of how the linear probing behaves (and even in workloads without deletions).more » « less

Dictionaries remain the most well studied class of data structures. A dictionary supports insertions, deletions, membership queries, and usually successor, predecessor, and extractmin. In a RAM, all such operations take O(log n) time on n elements. Dictionaries are often crossreferenced as follows. Consider a set of tuples {〈ai,bi,ci…〉}. A database might include more than one dictionary on such a set, for example, one indexed on the a ‘s, another on the b‘s, and so on. Once again, in a RAM, inserting into a set of L crossreferenced dictionaries takes O(L log n) time, as does deleting. The situation is more interesting in external memory. On a Disk Access Machine (DAM), Btrees achieve O(logB N) I/Os for insertions and deletions on a single dictionary and Kelement range queries take optimal O(logB N + K/B) I/Os. These bounds are also achievable by a Btree on crossreferenced dictionaries, with a slowdown of an L factor on insertion and deletions. In recent years, both the theory and practice of external memory dictionaries has been revolutionized by write optimization techniques. A dictionary is write optimized if it is close to a Btree for query time while beating Btrees on insertions. The best (and optimal) dictionaries achieve a substantially improved insertion and deletion cost of amortized I/Os on a single dictionary while maintaining optimal O(log1+B∊ N + K/B) I/O range queries. Although write optimization still helps for insertions into crossreferenced dictionaries, its value for deletions would seem to be greatly reduced. A deletion into a cross referenced dictionary only specifies a key a. It seems to be necessary to look up the associated values b, c … in order to delete them from the other dictionaries. This takes Ω(logB N) I/Os, well above the perdictionary writeoptimization budget of So the total deletion cost is In short, for deletions, write optimization offers an advantage over Btrees in that L multiplies a lower order term, but when L = 2, write optimization seems to offer no asymptotic advantage over Btrees. That is, no known query optimal solution for pairs of crossreferenced dictionaries seem to beat Btrees for deletions. In this paper, we show a lower bound establishing that a pair of crossreferenced dictionaries that are optimal for range queries and that supports deletions cannot match the write optimization bound available to insertonly dictionaries. This result thus establishes a limit to the applicability of writeoptimization techniques on which many new databases and file systems are based. Read More: http://epubs.siam.org/doi/10.1137/1.9781611974782.99more » « less

Inmemory data management systems, such as keyvalue stores, have become an essential infrastructure in today's bigdata processing and cloud computing. They rely on efficient index structures to access data. While unordered indexes, such as hash tables, can perform point search with O(1) time, they cannot be used in many scenarios where range queries must be supported. Many ordered indexes, such as B+ tree and skip list, have a O(log N) lookup cost, where N is number of keys in an index. For an ordered index hosting billions of keys, it may take more than 30 keycomparisons in a lookup, which is an order of magnitude more expensive than that on a hash table. With availability of large memory and fast network in today's data centers, this O(log N) time is taking a heavy toll on applications that rely on ordered indexes. In this paper we introduce a new ordered index structure, named Wormhole, that takes O(log L) worstcase time for looking up a key with a length of L. The low cost is achieved by simultaneously leveraging strengths of three indexing structures, namely hash table, prefix tree, and B+ tree, to orchestrate a single fast ordered index. Wormhole's range operations can be performed by a linear scan of a list after an initial lookup. This improvement of access efficiency does not come at a price of compromised space efficiency. Instead, Wormhole's index space is comparable to those of B+ tree and skip list. Experiment results show that Wormhole outperforms skip list, B+ tree, ART, and Masstree by up to 8.4x, 4.9x, 4.3x, and 6.6x in terms of key lookup throughput, respectively.more » « less

For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. Stateoftheart hash tables offer the following guarantee: If keys/values are Θ(logn) bits each, then it is possible to achieve constanttime insertions/deletions/queries while wasting only O(loglogn) bits of space per key when compared to the informationtheoretic optimum. Even prior to this bound being achieved, the target of O(log log n) wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamicallyresized filters). This paper shows that O(log log n) wasted bits per key is not the end of the line for hashing. In fact, for any k ∈ [log∗ n], it is possible to achieve O(k)time insertions/deletions, O(1)time queries, and O(log(k) n) = Ologlog···logn k wasted bits per key (all with high probability in n). This means that, each time we increase inser tion/deletion time by an additive constant, we reduce the wasted bits per key exponentially. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constanttime hash tables succinct. Our results hold not just for fixedcapacity hash tables, but also for hash tables that are dynamically resized (this is a fundamental departure from what is possible for filters); and for hash tables that store very large keys/values, each of which can be up to no(1) bits (this breaks with the conventional wisdom that larger keys/values should lead to more wasted bits per key). For very small keys/values, we are able to tighten our bounds to o(1) wasted bits per key, even when k = O(1). Building on this, we obtain a constanttime dynamic filter that uses nlogε−1+nloge+o(n) bits of space for a wide choice ofmore » « less