This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional \(\log n\)-bit pointers can be replaced with \(o(\log n)\)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an item in an array filled to load factor \(1-\delta\), then the optimal tiny-pointer size is \(\Theta(\log\log\log n+\log\delta^{-1})\) bits in the fixed-size case, and \(\Theta(\log\delta^{-1})\) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. We then apply tiny pointers to five classic data-structure problems. We show that:
- A data structure storing \(n\) \(v\)-bit values for \(n\) keys with constant-factor time modifications/queries can be implemented to take space \(nv+O(n\log^{(r)}n)\) bits, for any constant \(r>0\), as long as the user stores a tiny pointer of expected size \(O(1)\) with each key—here, \(\log^{(r)}n\) is the \(r\)-th iterated logarithm.
- Any binary search tree can be made succinct, meaning that it achieves \((1+o(1))\) times the optimal space, with constant-factor time overhead, and can even be made to be within \(O(n)\) bits of optimal if we allow for \(O(\log^{*}n)\)-time modifications—this holds even for rotation-based trees such as the splay tree and the red-black tree.
- Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-factor time overhead and a \((1+o(1))\)-factor space overhead.
- Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-factor time overhead and an additional space consumption of \(\log^{(r)}n+O(\log j)\) bits per \(j\)-bit value, for an arbitrary constant \(r>0\) of our choice.
- Given an external-memory array \(A\) of size \((1+\varepsilon)n\) containing a dynamic set of up to \(n\) key-value pairs, it is possible to maintain an internal-memory stash of size \(O(n\log\varepsilon^{-1})\) bits so that the location of any key-value pair in \(A\) can be computed in constant time (and with no I/Os).

In each case, tiny pointers allow us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.
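To make the object concrete, here is a minimal illustrative sketch (ours, not the paper's construction) of the tiny-pointer interface: a dereference table that, given a key, hands back a small bucket-local slot index usable in place of a full \(\log n\)-bit address. The class and method names are hypothetical, and the single-hash bucket allocation shown is the naive strategy; the paper's optimal schemes achieve the \(\Theta(\log\log\log n+\log\delta^{-1})\)-bit bounds quoted above.

```python
class DereferenceTable:
    """Toy sketch of the tiny-pointer interface (create / dereference / free).

    Each key is hashed to a bucket of size b; the tiny pointer returned by
    create() is just the slot index inside that bucket, so it needs only
    log2(b) bits instead of a full log2(n)-bit address.  This naive scheme
    is for illustration only and can fail when a bucket fills up.
    """

    def __init__(self, num_buckets, bucket_size, seed=0):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.seed = seed
        self.slots = [[None] * bucket_size for _ in range(num_buckets)]

    def _bucket(self, key):
        return hash((self.seed, key)) % self.num_buckets

    def create(self, key, value):
        """Store value; return a tiny pointer (slot index), or None on failure."""
        bucket = self.slots[self._bucket(key)]
        for slot, occupant in enumerate(bucket):
            if occupant is None:
                bucket[slot] = value
                return slot          # the tiny pointer: only log2(bucket_size) bits
        return None                  # allocation failed (bucket full)

    def dereference(self, key, tiny_ptr):
        return self.slots[self._bucket(key)][tiny_ptr]

    def free(self, key, tiny_ptr):
        self.slots[self._bucket(key)][tiny_ptr] = None


# Example: the caller keeps the small slot index instead of a full address.
table = DereferenceTable(num_buckets=1024, bucket_size=8)
p = table.create("alice", 42)
assert p is None or table.dereference("alice", p) == 42
```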
                    
                            
                            Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once
                        
                    
    
Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed. This article introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to 1 - o(1) while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are Θ(log n) bits, the space guarantee that Iceberg hashing offers, namely that it uses at most \(\log \binom{|U|}{n} + O(n \log \log n)\) bits to store n items from a universe U, matches a lower bound by Demaine et al. that applies to any stable hash table. Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high-probability guarantees can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash-table design.
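As a rough, hedged illustration of the front-yard/backyard paradigm mentioned above (not of Iceberg hashing itself, whose bin sizes, stability mechanism, and waterfall addressing are not modeled here), the following sketch stores most keys in a fixed-capacity front-yard bin chosen by hashing and sends the rare overflows to a small backyard structure. All names are ours.

```python
class FrontBackyardTable:
    """Schematic front-yard/backyard hash table (not Iceberg hashing itself).

    Most items land in a fixed-size front-yard bin chosen by hashing the key;
    the few that overflow go to a small backyard dictionary.  Iceberg hashing
    builds on this paradigm with carefully chosen bin sizes, stability, and
    waterfall addressing, none of which are modeled here.
    """

    def __init__(self, num_bins, bin_capacity):
        self.bins = [dict() for _ in range(num_bins)]
        self.bin_capacity = bin_capacity
        self.backyard = {}           # overflow structure (placeholder)

    def _bin(self, key):
        return hash(key) % len(self.bins)

    def insert(self, key, value):
        front = self.bins[self._bin(key)]
        if key in front or (key not in self.backyard and len(front) < self.bin_capacity):
            front[key] = value       # common case: stays in the front yard
        else:
            self.backyard[key] = value

    def query(self, key):
        front = self.bins[self._bin(key)]
        if key in front:
            return front[key]
        return self.backyard.get(key)


# Example usage.
t = FrontBackyardTable(num_bins=256, bin_capacity=8)
t.insert("k1", "v1")
assert t.query("k1") == "v1"
```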
        
    
    
- PAR ID: 10485534
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Journal of the ACM
- Volume: 70
- Issue: 6
- ISSN: 0004-5411
- Page Range / eLocation ID: 1 to 51
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: if keys/values are Θ(log n) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(log log n) bits of space per key when compared to the information-theoretic optimum. Even prior to this bound being achieved, the target of O(log log n) wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(log log n) wasted bits per key is not the end of the line for hashing. In fact, for any k ∈ [log* n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and O(log^{(k)} n) = O(log log ··· log n) (with k iterated logs) wasted bits per key (all with high probability in n). This means that, each time we increase insertion/deletion time by an additive constant, we reduce the wasted bits per key exponentially (a small numerical illustration of this iterated-logarithm tradeoff appears after this list). We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our results hold not just for fixed-capacity hash tables, but also for hash tables that are dynamically resized (this is a fundamental departure from what is possible for filters); and for hash tables that store very large keys/values, each of which can be up to n^{o(1)} bits (this breaks with the conventional wisdom that larger keys/values should lead to more wasted bits per key). For very small keys/values, we are able to tighten our bounds to o(1) wasted bits per key, even when k = O(1). Building on this, we obtain a constant-time dynamic filter that uses n log ε^{-1} + n log e + o(n) bits of space for a wide choice of …
- 
In this paper we provide an O(m log log^{O(1)} n log(1/ϵ))-expected time algorithm for solving Laplacian systems on n-node m-edge graphs, improving upon the previous best expected runtime of \(O(m \sqrt{\log n} \log\log^{O(1)} n \log (1/\epsilon))\) achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of low spectral stretch graph approximations with improved stretch and sparsity bounds. As motivation for this work, we show that for every set of vectors in \(\mathbb {R}^d\) (not just those induced by graphs) and every integer k > 1 there exists an ultrasparsifier with d − 1 + O(d/k) re-weighted vectors of relative condition number at most k^2. For small k, this improves upon the previous best known multiplicative factor of \(k \cdot \tilde{O}(\log d)\), which is only known for the graph case. Additionally, in the graph case we employ our low-stretch subgraph construction to obtain (n − 1 + O(n/k))-edge ultrasparsifiers of relative condition number k^{1+o(1)} for k = ω(log^δ n) for any δ > 0; this improves upon the previous work for k = o(exp(log^{1/2−δ} n)).
- 
In this paper, we focus on constructing unique-decodable and list-decodable codes for the recently studied (t, e)-composite-asymmetric error-correcting codes ((t, e)-CAECCs). Let $$\mathcal {X}$$ be an $$m \times n$$ binary matrix in which each row has Hamming weight w. If at most t rows of $$\mathcal {X}$$ contain errors, and in each erroneous row there are at most e occurrences of $$1 \rightarrow 0$$ errors, we say that a (t, e)-composite-asymmetric error occurs in $$\mathcal {X}$$. For general values of m, n, w, t, and e, we propose new constructions of (t, e)-CAECCs with redundancy at most $$(t-1)\log (m) + O(1)$$, where O(1) is independent of the code length m. In particular, this yields a class of (2, e)-CAECCs that are optimal in terms of redundancy. When m is a prime power, the redundancy can be further reduced to $$(t-1)\log (m) - O(\log (m))$$. To further increase the code size, we introduce a combinatorial object called a weak $$B_e$$-set. When $$e = w$$, we present an efficient encoding and decoding method for our codes. Finally, we explore potential improvements by relaxing the requirement of unique decoding to list-decoding. We show that when the list size is t! or an exponential function of t, there exist list-decodable (t, e)-CAECCs with constant redundancy. When the list size is two, we construct list-decodable (3, 2)-CAECCs with redundancy $$\log (m) + O(1)$$.
- 
We state a general-purpose algorithm for quickly finding primes in evenly divided sub-intervals. Legendre's conjecture claims that for every positive integer n, there exists a prime between $$n^2$$ and $$(n+1)^2$$. Oppermann's conjecture subsumes Legendre's conjecture by claiming there are primes between $$n^2$$ and $$n(n+1)$$ and also between $$n(n+1)$$ and $$(n+1)^2$$. Using Cramér's conjecture as the basis for a heuristic run-time analysis, we show that our algorithm can verify Oppermann's conjecture, and hence also Legendre's conjecture, for all $$n\le N$$ in time $$O(N \log N \log \log N)$$ and space $$N^{O(1/\log \log N)}$$. We implemented a parallel version of our algorithm and improved the empirical verification of Oppermann's conjecture from the previous $$N = 2\cdot 10^{9}$$ up to $$N = 7.05\cdot 10^{13} > 2^{46}$$, so we were finding 27-digit primes. The computation ran for about half a year on each of two platforms: four Intel Xeon Phi 7210 processors using a total of 256 cores, and a 192-core cluster of Intel Xeon E5-2630 2.3 GHz processors.
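Regarding the last result above (verification of Oppermann's conjecture), the following brute-force sketch, entirely ours and nothing like the paper's sieve-based parallel algorithm, checks the conjecture for small n by searching both sub-intervals directly.

```python
def is_prime(x):
    """Simple trial division; adequate for the small n used in this sketch."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True

def has_prime_between(lo, hi):
    """True if some prime p satisfies lo < p < hi."""
    return any(is_prime(p) for p in range(lo + 1, hi))

def check_oppermann(n):
    """Oppermann's conjecture for one n: primes in (n^2, n(n+1)) and (n(n+1), (n+1)^2)."""
    return (has_prime_between(n * n, n * (n + 1)) and
            has_prime_between(n * (n + 1), (n + 1) * (n + 1)))

# Verify the conjecture for a tiny range of n > 1 (the paper reaches
# N = 7.05e13 using a far more sophisticated, parallel sub-interval algorithm).
assert all(check_oppermann(n) for n in range(2, 500))
```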
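Returning to the first related result above (the time-space tradeoff for hash tables), this small numerical illustration, ours rather than the paper's, shows how quickly the iterated logarithm log^{(k)} n in the wasted-bits bound shrinks, e.g. for n = 2^64.

```python
import math

def iterated_log(n, k):
    """Apply log2 k times: the log^(k) n appearing in the wasted-bits bound."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x

n = 2 ** 64
for k in range(1, 5):
    print(k, iterated_log(n, k))
# k=1 -> 64.0   (plain log n)
# k=2 -> 6.0    (log log n, the previous target for wasted bits per key)
# k=3 -> ~2.58
# k=4 -> ~1.37
```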