skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: History-Independent Dynamic Partitioning: Operation-Order Privacy in Ordered Data Structures
A data structure is history independent if its internal representation reveals nothing about the history of operations beyond what can be determined from the current contents of the data structure. History independence is typically viewed as a security or privacy guarantee, with the intent being to minimize risks incurred by a security breach or audit. Despite widespread advances in history independence, there is an important data-structural primitive that previous work has been unable to replace with an equivalent history-independent alternative---dynamic partitioning. In dynamic partitioning, we are given a dynamic set S of ordered elements and a size-parameter B, and the objective is to maintain a partition of S into ordered groups, each of size Θ(B). Dynamic partitioning is important throughout computer science, with applications to B-tree rebalancing, write-optimized dictionaries, log-structured merge trees, other external-memory indexes, geometric and spatial data structures, cache-oblivious data structures, and order-maintenance data structures. The lack of a history-independent dynamic-partitioning primitive has meant that designers of history-independent data structures have had to resort to complex alternatives. In this paper, we achieve history-independent dynamic partitioning. Our algorithm runs asymptotically optimally against an oblivious adversary, processing each insert/delete with O(1) operations in expectation and O(B log N/loglog N) with high probability in set size N.  more » « less
Award ID(s):
2247577 2106827 2212129
PAR ID:
10514506
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data/47th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS)
Volume:
2
Issue:
2
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 27
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In-memory data management systems, such as key-value stores, have become an essential infrastructure in today's big-data processing and cloud computing. They rely on efficient index structures to access data. While unordered indexes, such as hash tables, can perform point search with O(1) time, they cannot be used in many scenarios where range queries must be supported. Many ordered indexes, such as B+ tree and skip list, have a O(log N) lookup cost, where N is number of keys in an index. For an ordered index hosting billions of keys, it may take more than 30 key-comparisons in a lookup, which is an order of magnitude more expensive than that on a hash table. With availability of large memory and fast network in today's data centers, this O(log N) time is taking a heavy toll on applications that rely on ordered indexes. In this paper we introduce a new ordered index structure, named Wormhole, that takes O(log L) worst-case time for looking up a key with a length of L. The low cost is achieved by simultaneously leveraging strengths of three indexing structures, namely hash table, prefix tree, and B+ tree, to orchestrate a single fast ordered index. Wormhole's range operations can be performed by a linear scan of a list after an initial lookup. This improvement of access efficiency does not come at a price of compromised space efficiency. Instead, Wormhole's index space is comparable to those of B+ tree and skip list. Experiment results show that Wormhole outperforms skip list, B+ tree, ART, and Masstree by up to 8.4x, 4.9x, 4.3x, and 6.6x in terms of key lookup throughput, respectively. 
    more » « less
  2. The skip list is an elegant dictionary data structure that is com- monly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(logN) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p. A seemingly natural way to generalize the skip list to external memory with block size B is to “promote” with probability 1/B, rather than 1/2. However, there are practical and theoretical obsta- cles to getting the skip list to retain its efficient performance, space bounds, and high-probability guarantees. We give an external-memory skip list that achieves write- optimized bounds. That is, for 0 < ε < 1, range queries take O(logBε N + K/B) I/Os w.h.p. and insertions and deletions take O((logBε N)/B1−ε) amortized I/Os w.h.p. Our write-optimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior write-optimized data structures such as Bε trees or LSM trees. These data structures are deployed in high-performance databases and file systems. 
    more » « less
  3. Motivated by fairness concerns, we study existence and computation of portfolios, defined as: given anoptimization problem with feasible solutions D, a class C of fairness objective functions, a set X of feasible solutions is an α-approximate portfolio if for each objective f in C, there is an α-approximation for f in X. We study the trade-off between the size |X| of the portfolio and its approximation factor αfor various combinatorial problems, such as scheduling, covering, and facility location, and choices of C as top-k, ordered and symmetric monotonic norms. Our results include: (i) an α-approximate portfolio of size O(log d*log(α/4))for ordered norms and lower bounds of size \Omega( log dlog α+log log d)for the problem of scheduling identical jobs on d unidentical machines, (ii) O(log n)-approximate O(log n)-sized portfolios for facility location on n points for symmetric monotonic norms, and (iii) logO(d)^(r^2)-size O(1)-approximate portfolios for ordered norms and O(log d)-approximate for symmetric monotonic norms for covering polyhedra with a constant rnumber of constraints. The latter result uses our novel OrderAndCount framework that obtains an exponentialimprovement in portfolio sizes compared to current state-of-the-art, which may be of independent interest. 
    more » « less
  4. We present efficient dynamic data structures for maintaining the union of unit discs and the lower envelope of pseudo-lines in the plane. More precisely, we present three main results in this paper: (i) We present a linear-size data structure to maintain the union of a set of unit discs under insertions. It can insert a disc and update the union in O (( k +1)log 2 n ) time, where n is the current number of unit discs and k is the combinatorial complexity of the structural change in the union due to the insertion of the new disc. It can also compute, within the same time bound, the area of the union after the insertion of each disc. (ii) We propose a linear-size data structure for maintaining the lower envelope of a set of x -monotone pseudo-lines. It can handle insertion/deletion of a pseudo-line in O (log 2 n ) time; for a query point x 0 ∈ ℝ, it can report, in O (log n ) time, the point on the lower envelope with x -coordinate x 0 ; and for a query point q ∈ ℝ 2 , it can return all k pseudo-lines lying below q in time O (log n + k log 2 n ). (iii) We present a linear-size data structure for storing a set of circular arcs of unit radius (not necessarily on the boundary of the union of the corresponding discs), so that for a query unit disc D , all input arcs intersecting D can be reported in O ( n 1/2+ɛ + k ) time, where k is the output size and ɛ > 0 is an arbitrarily small constant. A unit-circle arc can be inserted or deleted in O (log 2 n ) time. 
    more » « less
  5. The ranked (or top-k) document retrieval problem is defined as follows: preprocess a collection{T1,T2,… ,Td}ofdstrings (called documents) of total lengthninto a data structure, such that for any given query(P,k), wherePis a string (called pattern) of lengthp ≥ 1andk ∈ [1,d]is an integer, the identifiers of thosekdocuments that are most relevant toPcan be reported, ideally in the sorted order of their relevance. The seminal work by Hon et al. [FOCS 2009 and Journal of the ACM 2014] presented anO(n)-space (in words) data structure withO(p+klogk)query time. The query time was later improved toO(p+k)[SODA 2012] and further toO(p/logσn+k)[SIAM Journal on Computing 2017] by Navarro and Nekrich, whereσis the alphabet size. We revisit this problem in the external memory model and present three data structures. The first one takesO(n)-space and answer queries inO(p/B+ logBn + k/B+log*(n/B)) I/Os, whereBis the block size. The second one takesO(nlog*(n/B)) space and answer queries in optimalO(p/B+ logBn + k/B)I/Os. In both cases, the answers are reported in the unsorted order of relevance. To handle sorted top-kdocument retrieval, we present anO(nlog(d/B))space data structure with optimal query cost. 
    more » « less