skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A General Technique for Searching in Implicit Sets via Function Inversion
Given a function f from the set [N] to a d-dimensional integer grid, we consider data structures that allow efficient orthogonal range searching queries in the image of f, without explicitly storing it. We show that, if f is of the form [N] → [2w]d for some w = polylog(N) and is computable in constant time, then, for any 0 < α < 1, we can obtain a data structure using Õ(N1-α/3) words of space such that, for a given d-dimensional axis-aligned box B, we can search for some x ∈ [N] such that f (x) ∈ B in time Õ(Nα). This result is obtained simply by combining integer range searching with the Fiat-Naor function inversion scheme, which was already used in data-structure problems previously. We further obtain • data structures for range counting and reporting, predecessor, selection, ranking queries, and combinations thereof, on the set f ([N]), • data structures for preimage size and preimage selection queries for a given value of f, and • data structures for selection and ranking queries on geometric quantities computed from tuples of points in d-space. These results unify and generalize previously known results on 3SUM-indexing and string searching, and are widely applicable as a black box to a variety of problems. In particular, we give a data structure for a generalized version of gapped string indexing, and show how to preprocess a set of points on an integer grid in order to efficiently compute (in sublinear time), for points contained in a given axis-aligned box, their Theil-Sen estimator, the kth largest area triangle, or the induced hyperplane that is the kth furthest from the origin.  more » « less
Award ID(s):
2008551
PAR ID:
10496291
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
SIAM
Date Published:
ISBN:
978-1-61197-793-6
Format(s):
Medium: X
Location:
Alexandria, VA
Sponsoring Org:
National Science Foundation
More Like this
  1. Keil, Mark; Mondal, Debajyoti (Ed.)
    Finding the kth nearest neighbor to a query point is a ubiquitous operation in many types of metric computations, especially those in unsupervised machine learning. In many such cases, the distance to k sample points is used as an estimate of the local density of the sample. In this paper, we give an algorithm that takes a finite metric (P,d) and an integer k and produces a subset S ⊆ P with the property that for any q ∈ P, the distance to the second nearest point of S to q is a constant factor approximation to the distance to the kth nearest point of P to q. Thus, the sample S may be used in lieu of P. In addition to being much smaller than P, the distance queries on S only require finding the second nearest neighbor instead of the kth nearest neighbor. This is a significant improvement, especially because theoretical guarantees on kth nearest neighbor methods often require k to grow as a function of the input size n. 
    more » « less
  2. Mulzer, Wolfgang; Phillips, Jeff M (Ed.)
    We present the first fully dynamic connectivity data structures for geometric intersection graphs achieving constant query time and sublinear amortized update time for many classes of geometric objects in 2D . Our data structures can answer connectivity queries between two objects, as well as "global" connectivity queries (e.g., deciding whether the entire graph is connected). Previously, the data structure by Afshani and Chan (ESA'06) achieved such bounds only in the special case of axis-aligned line segments or rectangles but did not work for arbitrary line segments or disks, whereas the data structures by Chan, Pătraşcu, and Roditty (FOCS'08) worked for more general classes of geometric objects but required n^{Ω(1)} query time and could not handle global connectivity queries. Specifically, we obtain new data structures with O(1) query time and amortized update time near n^{4/5}, n^{7/8}, and n^{20/21} for axis-aligned line segments, disks, and arbitrary line segments respectively. Besides greatly reducing the query time, our data structures also improve the previous update times for axis-aligned line segments by Afshani and Chan (from near n^{10/11} to n^{4/5}) and for disks by Chan, Pătraşcu, and Roditty (from near n^{20/21} to n^{7/8}). 
    more » « less
  3. Real-time decision making in IoT applications relies upon space-efficient evaluation of queries over streaming data. To model the uncertainty in the classification of data being processed, we consider the model of probabilistic strings --- sequences of discrete probability distributions over a finite set of events, and initiate the study of space complexity of streaming computation for different classes of queries over such probabilistic strings. We first consider the problem of computing the probability that a word, sampled from the distribution defined by the probabilistic string read so far, is accepted by a given deterministic finite automaton. We show that this regular pattern matching problem can be solved using space that is only poly-logarithmic in the string length (and polynomial in the size of the DFA) if we are allowed a multiplicative approximation error. Then we show how to generalize this result to quantitative queries specified by additive cost register automata --- these are automata that map strings to numerical values using finite control and registers that get updated using linear transformations. Finally, we consider the case when updates in such an automaton involve tests, and in particular, when there is a counter variable that can be either incremented or decremented but decrements only apply when the counter value is non-zero. In this case, the desired answer depends on the probability distribution over the set of possible counter values that can range from 0 to n for a string of length n. Under a mild assumption, namely probabilities of the individual events are bounded away from 0 and 1, we show that there is an algorithm that can compute all n entries of this probability distribution vector to within additive 1/poly(n) error using space that is only Õ(n). In establishing these results, we introduce several new technical ideas that may prove useful for designing space-efficient algorithms for other query models over probabilistic strings. 
    more » « less
  4. We introduce and study the notion of *an outer bi-Lipschitz extension* of a map between Euclidean spaces. The notion is a natural analogue of the notion of *a Lipschitz extension* of a Lipschitz map. We show that for every map f there exists an outer bi-Lipschitz extension f′ whose distortion is greater than that of f by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems, described next. We prove a *prioritized* variant of the Johnson–Lindenstrauss lemma: given a set of points X⊂ ℝd of size N and a permutation (”priority ranking”) of X, there exists an embedding f of X into ℝO(logN) with distortion O(loglogN) such that the point of rank j has only O(log3 + ε j) non-zero coordinates – more specifically, all but the first O(log3+ε j) coordinates are equal to 0; the distortion of f restricted to the first j points (according to the ranking) is at most O(loglogj). The result makes a progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions. We prove that given a set X of N points in ℜd, there exists a *terminal* dimension reduction embedding of ℝd into ℝd′, where d′ = O(logN/ε4), which preserves distances ||x−y|| between points x∈ X and y ∈ ℝd, up to a multiplicative factor of 1 ± ε. This improves a recent result by Elkin, Filtser, and Neiman. The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary. 
    more » « less
  5. null (Ed.)
    We present approximation and exact algorithms for piecewise regression of univariate and bivariate data using fixed-degree polynomials. Specifically, given a set S of n data points (x1, y1), . . . , (xn, yn) ∈ Rd × R where d ∈ {1, 2}, the goal is to segment xi’s into some (arbitrary) number of disjoint pieces P1, . . . , Pk, where each piece Pj is associated with a fixed-degree polynomial fj : Rd → R, to minimize the total loss function λk+􏰄ni=1(yi −f(xi))2, where λ ≥ 0 is a regularization term that penalizes model complexity (number of pieces) and f : 􏰇kj=1 Pj → R is the piecewise polynomial function defined as f|Pj = fj. The pieces P1,...,Pk are disjoint intervals of R in the case of univariate data and disjoint axis-aligned rectangles in the case of bivariate data. Our error approximation allows use of any fixed-degree polynomial, not just linear functions. Our main results are the following. For univariate data, we present a (1 + ε)-approximation algorithm with time complexity O(nε log1ε), assuming that data is presented in sorted order of xi’s. For bivariate data, we √ present three results: a sub-exponential exact algorithm with running time nO( n); a polynomial-time constant- approximation algorithm; and a quasi-polynomial time approximation scheme (QPTAS). The bivariate case is believed to be NP-hard in the folklore but we could not find a published record in the literature, so in this paper we also present a hardness proof for completeness. 
    more » « less