Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the dD-ERS problem, or to the latter database problem, have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework to dD-ERS on RVs that have Gaussian or Poisson distribution. Our dD-ERS solutions are the first ones that have polylogarithmic time complexities. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to have both high computational efficiencies and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.more » « less
-
In this work, we formulate and solve a new type of approximate nearest neighbor search (ANNS) problems called ANNS after linear transformation (ALT). In ANNS-ALT, we search for the vector (in a dataset) that, after being linearly transformed by a user-specified query matrix, is closest to a query vector. It is a very general mother problem in the sense that a wide range of baby ANNS problems that have important applications in databases and machine learning can be reduced to and solved as ANNS-ALT, or its dual that we call ANNS-ALTD. We propose a novel and computationally efficient solution, called ONe Index for All Kernels (ONIAK), to ANNS-ALT and all its baby problems when the data dimension 𝑑 is not too large (say 𝑑 ≤ 200). In ONIAK, a universal index is built, once and for all, for answering all future ANNS-ALT queries that can have distinct query matrices. We show by experiments that, when 𝑑 is not too large, ONIAK has better query performance than linear scan on the mother problem (of ANNS-ALT), and has query performances comparable to those of the state-of-the-art solutions on the baby problems. However, the algorithmic technique behind this universal index approach suffers from a so-called dimension blowup problem that can make the indexing time prohibitively long for a large dataset. We propose a novel algorithmic technique, called fast GOE quadratic form (FGoeQF), that completely solves the (prohibitively long indexing time) fallout of the dimension blowup problem. We also propose a Johnson-Lindenstrauss transform (JLT) based ANNS- ALT (and ANNS-ALTD) solution that significantly outperforms any competitor when 𝑑 is large.more » « less
-
Dan Olteanu and Nils Vortmeier (Ed.)Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem that has applications to three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and develop three novel ERS solutions, namely Gaussian-dyadic simulation tree (DST), Cauchy-DST and Random Walk-DST, using it. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiencies and strong provable independence guarantees.more » « less