Title: Orthogonal Range Reporting and Rectangle Stabbing for Fat Rectangles
In this paper we study two geometric data structure problems in the special case when input objects or queries are fat rectangles. We show that in this case a significant improvement compared to the general case can be achieved. We describe data structures that answer two- and three-dimensional orthogonal range reporting queries in the case when the query range is a fat rectangle. Our two-dimensional data structure uses O(n) words and supports queries in O(loglog U + k) time, where n is the number of points in the data structure, U is the size of the universe and k is the number of points in the query range. Our three-dimensional data structure needs O(n log^ε U) words of space and answers queries in O(loglog U + k) time. We also consider the rectangle stabbing problem on a set of three-dimensional fat rectangles. Our data structure uses O(n) space and answers stabbing queries in O(log U loglog U + k) time.
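To make the query interface concrete, here is a brute-force Python baseline for the two-dimensional case (an illustration only, not the paper's O(loglog U + k)-time structure; the aspect-ratio bound c used below to call a rectangle "fat" is an assumed constant):

```python
# Brute-force baseline for 2-D orthogonal range reporting (illustration only;
# the paper's data structure answers such queries in O(loglog U + k) time).
# A query rectangle [x1, x2] x [y1, y2] is "fat" if its aspect ratio is
# bounded by a constant; c = 2 below is an assumed, illustrative bound.

def is_fat(x1, x2, y1, y2, c=2.0):
    """Return True if the rectangle's aspect ratio is at most c."""
    w, h = x2 - x1, y2 - y1
    return max(w, h) <= c * min(w, h)

def report(points, x1, x2, y1, y2):
    """Report all points inside the axis-aligned query rectangle: O(n) time."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]

pts = [(1, 1), (3, 4), (5, 2), (8, 9)]
assert is_fat(0, 4, 0, 5)          # a 4 x 5 rectangle: aspect ratio 1.25
print(report(pts, 0, 4, 0, 5))     # [(1, 1), (3, 4)]
```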
Award ID(s): 1814026
NSF-PAR ID: 10141614
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Proc. Algorithms and Data Structures Symposium (WADS)
Page Range / eLocation ID: 283-295
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
1. The ranked (or top-k) document retrieval problem is defined as follows: preprocess a collection {T_1, T_2, …, T_d} of d strings (called documents) of total length n into a data structure, such that for any given query (P, k), where P is a string (called pattern) of length p ≥ 1 and k ∈ [1, d] is an integer, the identifiers of those k documents that are most relevant to P can be reported, ideally in sorted order of their relevance. The seminal work by Hon et al. [FOCS 2009 and Journal of the ACM 2014] presented an O(n)-space (in words) data structure with O(p + k log k) query time. The query time was later improved to O(p + k) [SODA 2012] and further to O(p/log_σ n + k) [SIAM Journal on Computing 2017] by Navarro and Nekrich, where σ is the alphabet size. We revisit this problem in the external memory model and present three data structures. The first one takes O(n) space and answers queries in O(p/B + log_B n + k/B + log*(n/B)) I/Os, where B is the block size. The second one takes O(n log*(n/B)) space and answers queries in optimal O(p/B + log_B n + k/B) I/Os. In both cases, the answers are reported in unsorted order of relevance. To handle sorted top-k document retrieval, we present an O(n log(d/B))-space data structure with optimal query cost.
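As a point of reference for the problem statement, a naive in-memory Python sketch (scoring relevance by occurrence count, one common relevance measure; the indexes above exist precisely to avoid this linear scan per query):

```python
# Brute-force top-k document retrieval (illustration only; the data
# structures above achieve this with compact indexes and near-optimal
# I/O bounds). Relevance is illustratively the occurrence count of P.

def topk_docs(documents, P, k):
    """Return identifiers of the k documents most relevant to pattern P,
    in sorted order of relevance (here: number of occurrences of P)."""
    scores = [(doc.count(P), i) for i, doc in enumerate(documents)]
    scores.sort(reverse=True)                 # highest score first
    return [i for (score, i) in scores[:k] if score > 0]

docs = ["abracadabra", "banana", "cabana", "abacus"]
print(topk_docs(docs, "ab", 2))               # [0, 3]
```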

     
2. For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: if keys/values are Θ(log n) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(log log n) bits of space per key when compared to the information-theoretic optimum. Even prior to this bound being achieved, the target of O(log log n) wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(log log n) wasted bits per key is not the end of the line for hashing. In fact, for any k ∈ [log* n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and O(log^(k) n) = O(log log ··· log n) (with k logarithms) wasted bits per key (all with high probability in n). This means that, each time we increase insertion/deletion time by an additive constant, we reduce the wasted bits per key exponentially. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our results hold not just for fixed-capacity hash tables, but also for hash tables that are dynamically resized (this is a fundamental departure from what is possible for filters); and for hash tables that store very large keys/values, each of which can be up to n^{o(1)} bits (this breaks with the conventional wisdom that larger keys/values should lead to more wasted bits per key). For very small keys/values, we are able to tighten our bounds to o(1) wasted bits per key, even when k = O(1). Building on this, we obtain a constant-time dynamic filter that uses n⌈log ε^{-1}⌉ + n log e + o(n) bits of space for a wide choice of false-positive rates ε.
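A quick way to see how strong the O(log^(k) n) bound is: computing the iterated logarithm numerically (a small Python illustration of the arithmetic, not of the hash table itself):

```python
# Each extra O(1) of insertion/deletion time drops the wasted bits per key
# from log^(k) n to log^(k+1) n, i.e. one more application of log.
import math

def iterated_log(n, k):
    """Apply log2 to n exactly k times (the log^(k) n in the space bound)."""
    for _ in range(k):
        n = math.log2(n)
    return n

n = 2 ** 64
for k in range(1, 5):
    print(k, iterated_log(n, k))   # 64.0, 6.0, ~2.585, ~1.370
```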
3. We present the first near-linear-time algorithm that computes a (1+ε)-approximation of the diameter of a weighted unit-disk graph of n vertices. Our algorithm requires O(n log^2 n) time for any constant ε>0, so we considerably improve upon the near-O(n^{3/2})-time algorithm of Gao and Zhang (2005). Using similar ideas we develop (1+ε)-approximate distance oracles with O(1) query time and a likewise improvement in the preprocessing time, specifically from near O(n^{3/2}) to O(n log^3 n). We also obtain similar new results for a number of related problems in the weighted unit-disk graph metric, such as the radius and the bichromatic closest pair. As a further application we employ our distance oracle, along with additional ideas, to solve the (1+ε)-approximate all-pairs bounded-leg shortest paths (apBLSP) problem for a set of n planar points. Our data structure requires O(n^2 log n) space, O(loglog n) query time, and nearly O(n^{2.579}) preprocessing time for any constant ε>0, and is the first that breaks the near-cubic preprocessing time bound given by Roditty and Segal (2011).
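For contrast with the (1+ε)-approximation above, a minimal exact baseline in Python that builds the weighted unit-disk graph explicitly and runs Dijkstra from every vertex (Θ(n^2) edges in the worst case, so this is an illustration only):

```python
# Exact-diameter baseline for a weighted unit-disk graph (illustration only;
# far slower than the O(n log^2 n)-time approximation described above).
import heapq, math

def unit_disk_graph(points):
    """Edges connect points at Euclidean distance <= 1, weighted by distance."""
    adj = [[] for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= 1.0:
                adj[i].append((j, d))
                adj[j].append((i, d))
    return adj

def diameter(points):
    """Largest shortest-path distance, via one Dijkstra pass per source."""
    adj = unit_disk_graph(points)
    best = 0.0
    for s in range(len(points)):
        dist = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, math.inf):
                continue
            for v, w in adj[u]:
                if d + w < dist.get(v, math.inf):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        best = max(best, max(dist.values()))
    return best

print(diameter([(0, 0), (0.8, 0), (1.5, 0.2)]))   # ~1.528
```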
4. We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that, for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of O(n + k log k), i.e., the best known complexity for equi-joins, and even of O(n + k), i.e., the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is n^ℓ for a join of ℓ relations. The key ingredient is a novel O(n polylog n)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.
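A minimal sketch of the ranked-enumeration idea for the simplest possible case, a two-relation join with a monotone ranking function, using a priority queue to emit answers in score order without materializing all results up front (this simplification is not the paper's factorized representation):

```python
# Lazy ranked enumeration of join pairs (r, s) in increasing score order.
# Assumes R and S are sorted so that the score is monotone in each index.
import heapq

def ranked_join(R, S, score):
    pq = [(score(R[0], S[0]), 0, 0)]   # frontier of candidate index pairs
    seen = {(0, 0)}
    while pq:
        sc, i, j = heapq.heappop(pq)
        yield sc, R[i], S[j]
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(R) and nj < len(S) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(pq, (score(R[ni], S[nj]), ni, nj))

R, S = [1, 3, 5], [2, 4]
for sc, r, s in ranked_join(R, S, lambda r, s: r + s):
    print(sc, r, s)   # 3 1 2, then 5 1 4, 5 3 2, ...
```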
5. Given a set of points P and axis-aligned rectangles R in the plane, a point p ∈ P is called exposed if it lies outside all rectangles in R. In the max-exposure problem, given an integer parameter k, we want to delete k rectangles from R so as to maximize the number of exposed points. We show that the problem is NP-hard and, assuming plausible complexity conjectures, is also hard to approximate even when rectangles in R are translates of two fixed rectangles. However, if R only consists of translates of a single rectangle, we present a polynomial-time approximation scheme. For general rectangle range space, we present a simple O(k) bicriteria approximation algorithm; that is, by deleting O(k^2) rectangles, we can expose at least Ω(1/k) of the optimal number of points.
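Since the problem is NP-hard, any exact method is exponential in general; a tiny exhaustive Python baseline makes the objective precise (illustration only, feasible just for small inputs):

```python
# Exhaustive baseline for max-exposure: try every way to delete k of the
# m rectangles, so O(C(m, k) * n * m) time overall (illustration only).
from itertools import combinations

def exposed(points, rects):
    """Count points lying outside every rectangle (x1, y1, x2, y2)."""
    return sum(
        all(not (x1 <= px <= x2 and y1 <= py <= y2)
            for (x1, y1, x2, y2) in rects)
        for (px, py) in points
    )

def max_exposure(points, rects, k):
    """Best exposed-point count over all ways to delete k rectangles."""
    best = 0
    for deleted in combinations(range(len(rects)), k):
        remaining = [r for i, r in enumerate(rects) if i not in deleted]
        best = max(best, exposed(points, remaining))
    return best

pts = [(1, 1), (2, 2), (5, 5)]
rs = [(0, 0, 3, 3), (4, 4, 6, 6)]
print(max_exposure(pts, rs, 1))   # 2: deleting (0,0,3,3) exposes two points
```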