NSF-PAR ID: 10298570
Date Published:
Journal Name: Annual Symposium on Foundations of Computer Science
ISSN: 0272-5428
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: if keys/values are Θ(log n) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(log log n) bits of space per key when compared to the information-theoretic optimum. Even prior to this bound being achieved, the target of O(log log n) wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(log log n) wasted bits per key is not the end of the line for hashing. In fact, for any k ∈ [log* n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and O(log^(k) n) = O(log log ··· log n) wasted bits per key, where log^(k) denotes the k-times iterated logarithm (all with high probability in n). This means that, each time we increase insertion/deletion time by an additive constant, we reduce the wasted bits per key exponentially. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our results hold not just for fixed-capacity hash tables, but also for hash tables that are dynamically resized (this is a fundamental departure from what is possible for filters), and for hash tables that store very large keys/values, each of which can be up to n^o(1) bits (this breaks with the conventional wisdom that larger keys/values should lead to more wasted bits per key). For very small keys/values, we are able to tighten our bounds to o(1) wasted bits per key, even when k = O(1). Building on this, we obtain a constant-time dynamic filter that uses n log ε^(-1) + n log e + o(n) bits of space for a wide choice of ε.
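To make the tradeoff curve above concrete, the following minimal sketch computes the k-times iterated logarithm log^(k) n and the iterated-log count log* n, showing how quickly the wasted-bits bound O(log^(k) n) collapses as the time budget k grows. The function names are illustrative, not from the paper:

```python
import math

def iter_log(n: float, k: int) -> float:
    """log^(k) n: apply log2 to n exactly k times (clamping at 0 once n <= 1)."""
    for _ in range(k):
        if n <= 1:
            return 0.0
        n = math.log2(n)
    return n

def log_star(n: float) -> int:
    """log* n: how many times log2 must be applied before the value drops to <= 1."""
    k = 0
    while n > 1:
        n = math.log2(n)
        k += 1
    return k

n = 2 ** 64
for k in range(1, log_star(n) + 1):
    # Each +1 to insertion/deletion time shrinks wasted bits per key exponentially.
    print(f"k = {k}: O(log^({k}) n) = O({iter_log(n, k):.3f}) wasted bits per key")
```

For n = 2^64 the sequence is 64, 6, ≈2.58, ≈1.37, ≈0.45, so k never needs to exceed log* n = 5 before the bound bottoms out.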

Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşcu established an update/query tradeoff curve for external-memory hash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires Ω(log_λ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for λ that is Ω(log log M + log_M N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of λ. The simplicity of BOAs allows them to be readily modified to achieve the following results: a new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table while retaining some of the simplicity of the BOAs; and the Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.
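The update/query curve described above can be sketched numerically: the two functions below evaluate the asymptotic costs on the curve (dropping constant factors), so sweeping λ shows how cheaper insertions force more expensive queries. This is a toy evaluation of the stated bounds, not a model from either paper:

```python
import math

def insert_ios(lam: float, B: int) -> float:
    # Amortized IOs per insertion on the tradeoff curve: O(lambda / B).
    return lam / B

def query_ios(lam: float, N: int) -> float:
    # Matching query lower bound: Omega(log_lambda N) expected IOs.
    return math.log(N) / math.log(lam)

# Sweep the tuning parameter: larger lambda buys cheaper queries
# at the cost of more IOs per insertion (illustrative values).
N, B = 2 ** 30, 1024
for lam in (4, 64, 1024):
    print(f"lambda = {lam}: insert ~ {insert_ios(lam, B):.4f} IOs, "
          f"query ~ {query_ios(lam, N):.2f} IOs")
```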

RDMA has been an important building block for many high-performance distributed key-value stores in recent prior work. To sustain millions of operations per second per node, many of these works use hashing schemes, such as cuckoo hashing, that guarantee that an existing key can be found in a small, constant number of RDMA operations. In this paper, we explore whether linear probing is a compelling design alternative to cuckoo-based hashing schemes. Instead of reading a fixed number of bytes per RDMA request, this paper introduces a mathematical model that picks the optimal read size by balancing the cost of performing an RDMA operation against the probability of encountering a probe sequence of a certain length. The model can further leverage optimization hints about the location of clusters of keys, which commonly occur in skewed key distributions. We extensively evaluate the performance of linear probing with a variable read size in a modern cluster to understand the tradeoffs between cuckoo hashing and linear probing. We find that cuckoo hashing outperforms linear probing only in very highly loaded hash tables (load factors at 90% or higher) that would be prohibitively expensive to maintain in practice. Overall, linear probing with variable-sized reads using our model has up to 2.8× higher throughput than cuckoo hashing, with throughput as high as 50M lookup operations per second per node.
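The read-size model described above can be illustrated with a toy cost function. The sketch below assumes a geometric tail Pr[probe sequence exceeds i slots] = q^i (standing in for the real cluster-length distribution) and a linear wire cost per read; both the assumption and all names are hypothetical, not the paper's actual model:

```python
def expected_cost(s: int, q: float, c_op: float, c_slot: float) -> float:
    """
    Expected wire cost of a lookup when each RDMA read fetches s slots.
    One read costs c_op (fixed per-operation overhead) + c_slot * s (bytes).
    With tail Pr[probe length > i] = q**i, a follow-up read is needed with
    probability q**(j*s) after j reads, so E[reads] = 1 / (1 - q**s).
    """
    return (c_op + c_slot * s) / (1.0 - q ** s)

def optimal_read_size(q: float, c_op: float, c_slot: float, max_s: int = 512) -> int:
    # Balance per-operation overhead against extra bytes on the wire.
    return min(range(1, max_s + 1),
               key=lambda s: expected_cost(s, q, c_op, c_slot))
```

When bytes are nearly free relative to the per-operation overhead, the model favors large reads that cover long probe sequences in one shot; when bytes dominate, it favors reading a single slot at a time.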

We revisit the problem of building static hash tables on the GPU and present an efficient implementation of bucketed hash tables. By decoupling the probing scheme from the hash table's in-memory representation, we offer an implementation where the number of probes and the bucket size are the only factors limiting performance. Our analysis sweeps through the hash table parameter space for two probing schemes: cuckoo and iceberg hashing. We show that a bucketed cuckoo hash table (BCHT) that uses three hash functions outperforms alternative methods that use iceberg hashing and a cuckoo hash table that uses a bucket size of one. At load factors as high as 0.99, BCHT enjoys an average probe count of 1.43 during insertion. Using only three hash functions, positive and negative queries require at most 1.39 and 2.8 average probes per key, respectively.
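A bucketed cuckoo hash table of the kind described above can be sketched in a few lines: each key has d candidate buckets of several slots each, and a full set of candidates triggers an eviction. This is a toy CPU-side illustration of the scheme (the paper's BCHT is a GPU implementation); all names and parameters here are illustrative:

```python
import random

class BCHT:
    """Toy bucketed cuckoo hash table with d hash functions (the paper uses d = 3)."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 16,
                 d: int = 3, seed: int = 0):
        self.buckets = [[] for _ in range(num_buckets)]
        self.bucket_size = bucket_size
        self.rng = random.Random(seed)
        self.seeds = [self.rng.randrange(1 << 30) for _ in range(d)]

    def _bucket(self, key, i: int) -> int:
        # Seeded tuple hash stands in for d independent hash functions.
        return hash((self.seeds[i], key)) % len(self.buckets)

    def insert(self, key, max_kicks: int = 500) -> bool:
        for _ in range(max_kicks):
            # Probe each candidate bucket; the first with a free slot wins.
            for i in range(len(self.seeds)):
                b = self.buckets[self._bucket(key, i)]
                if len(b) < self.bucket_size:
                    b.append(key)
                    return True
            # All candidates full: evict a random victim and re-insert it.
            b = self.buckets[self._bucket(key, self.rng.randrange(len(self.seeds)))]
            victim = b[self.rng.randrange(len(b))]
            b.remove(victim)
            b.append(key)
            key = victim
        return False  # give up after too many kicks (table effectively full)

    def lookup(self, key) -> bool:
        # A query touches at most d buckets, each fetchable in one wide memory read.
        return any(key in self.buckets[self._bucket(key, i)]
                   for i in range(len(self.seeds)))
```

Wide buckets are what keep the probe counts low at high load factors: most insertions land in the first candidate bucket, and a query never inspects more than d buckets.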

Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different tradeoffs have been proposed.
This article introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to 1 − o(1) while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are Θ(log n) bits, the space guarantee that Iceberg hashing offers, namely that it uses at most \(\log \binom{U}{n} + O(n \log \log n)\) bits to store n items from a universe U, matches a lower bound by Demaine et al. that applies to any stable hash table.
Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high-probability guarantees, can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash table design.
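The front-yard/backyard paradigm mentioned above can be sketched minimally: most keys land in a small fixed bucket determined by their hash (the front yard), and the rare overflows spill into a shared secondary structure (the backyard). The layout, names, and sizes below are illustrative only, not Iceberg hashing's actual design:

```python
class FrontBackTable:
    """Toy front-yard/backyard layout. Items placed in the front yard
    never move afterwards, which is the essence of stability."""

    def __init__(self, num_buckets: int = 256, bucket_size: int = 8):
        self.front = [[] for _ in range(num_buckets)]  # fixed-width buckets
        self.bucket_size = bucket_size
        self.back = {}  # stands in for a small secondary structure

    def insert(self, key, value) -> None:
        b = self.front[hash(key) % len(self.front)]
        if len(b) < self.bucket_size:
            b.append((key, value))  # common case: stable front-yard slot
        else:
            self.back[key] = value  # rare overflow path

    def get(self, key):
        for k, v in self.front[hash(key) % len(self.front)]:
            if k == key:
                return v
        return self.back.get(key)  # fall back to the backyard
```

The design intuition is that the front yard absorbs almost all items with one cache-friendly probe, so the backyard can afford to be slower and smaller; the techniques in the article are about making this split space-optimal and stable at load factors approaching 1.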