 Award ID(s):
 1637458
 NSFPAR ID:
 10027801
 Date Published:
 Journal Name:
 Proceedings of the TwentyEighth Annual ACMSIAM Symposium on Discrete Algorithms
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

The skip list is an elegant dictionary data structure that is com monly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(logN) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p. A seemingly natural way to generalize the skip list to external memory with block size B is to “promote” with probability 1/B, rather than 1/2. However, there are practical and theoretical obsta cles to getting the skip list to retain its efficient performance, space bounds, and highprobability guarantees. We give an externalmemory skip list that achieves write optimized bounds. That is, for 0 < ε < 1, range queries take O(logBε N + K/B) I/Os w.h.p. and insertions and deletions take O((logBε N)/B1−ε) amortized I/Os w.h.p. Our writeoptimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior writeoptimized data structures such as Bε trees or LSM trees. These data structures are deployed in highperformance databases and file systems.more » « less

Dyckreachability is a fundamental formulation for program analysis, which has been widely used to capture properlymatchedparenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyckreachability is a relaxation of Dyckreachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyckreachability formulation. Bidirected Dyckreachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyckreachability algorithm computes allpairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyckreachability. In particular, we consider the problem of maintaining allpairs Dyckreachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyckreachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. Allpairs Dyckreachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worstcase guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any allpairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a contextsensitive datadependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )time optimal bidirected Dyckreachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches.more » « less

Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşu established an update/query tradeoff curve for externalhash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires Ω(logλ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for λ that is Ω(loglogM +logM N). In this paper, we present a simpler externalmemory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of λ. The simplicity of BOAs allows them to be readily modified to achieve the following results: A new externalmemory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs. The CacheOblivious Bundle of Trees Hash Table (COBOT), the first cacheoblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.more » « less

The ranked (or top
k ) document retrieval problem is defined as follows: preprocess a collection{T_{1},T_{2},… ,T_{d}} ofd strings (called documents) of total lengthn into a data structure, such that for any given query(P,k) , whereP is a string (called pattern) of lengthp ≥ 1 andk ∈ [1,d] is an integer, the identifiers of thosek documents that are most relevant toP can be reported, ideally in the sorted order of their relevance. The seminal work by Hon et al. [FOCS 2009 and Journal of the ACM 2014] presented anO(n) space (in words) data structure withO(p+k logk) query time. The query time was later improved toO(p+k) [SODA 2012] and further toO(p/ log_{σn+k}) [SIAM Journal on Computing 2017] by Navarro and Nekrich, whereσ is the alphabet size. We revisit this problem in the external memory model and present three data structures. The first one takesO(n) space and answer queries inO(p/B + log_{B}n + k/B+ log^{*}(n/B) ) I/Os, whereB is the block size. The second one takesO(n log^{*}(n/B) ) space and answer queries in optimalO(p/B + log_{B}n + k/B) I/Os. In both cases, the answers are reported in the unsorted order of relevance. To handle sorted topk document retrieval, we present anO(n log(d/B)) space data structure with optimal query cost. 
We present efficient dynamic data structures for maintaining the union of unit discs and the lower envelope of pseudolines in the plane. More precisely, we present three main results in this paper: (i) We present a linearsize data structure to maintain the union of a set of unit discs under insertions. It can insert a disc and update the union in O (( k +1)log 2 n ) time, where n is the current number of unit discs and k is the combinatorial complexity of the structural change in the union due to the insertion of the new disc. It can also compute, within the same time bound, the area of the union after the insertion of each disc. (ii) We propose a linearsize data structure for maintaining the lower envelope of a set of x monotone pseudolines. It can handle insertion/deletion of a pseudoline in O (log 2 n ) time; for a query point x 0 ∈ ℝ, it can report, in O (log n ) time, the point on the lower envelope with x coordinate x 0 ; and for a query point q ∈ ℝ 2 , it can return all k pseudolines lying below q in time O (log n + k log 2 n ). (iii) We present a linearsize data structure for storing a set of circular arcs of unit radius (not necessarily on the boundary of the union of the corresponding discs), so that for a query unit disc D , all input arcs intersecting D can be reported in O ( n 1/2+ɛ + k ) time, where k is the output size and ɛ > 0 is an arbitrarily small constant. A unitcircle arc can be inserted or deleted in O (log 2 n ) time.more » « less