<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Approximate Nearest Neighbors in Limited Space</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2018</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10065203</idno>
					<idno type="doi">10.1137/1.9781611975031</idno>
					<title level='j'>Conference on Learning Theory</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Piotr Indyk</author><author>Tal Wagner</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We consider the (1+ϵ)-approximate nearest neighbor search problem: given a set X of n points in a d-dimensional space, build a data structure that, given any query point y, finds a point x∈X whose distance to y is at most (1+ϵ)minx∈X ‖x−y‖ for an accuracy parameter ϵ∈(0,1). Our main result is a data structure that occupies only O(ϵ^−2 n log(n)log(1/ϵ)) bits of space, assuming all point coordinates are integers in the range {−n^O(1)…n^O(1)}, i.e., the coordinates have O(logn) bits of precision. This improves over the best previously known space bound of O(ϵ^−2 n log(n)^2), obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points X, and provide almost tight upper and lower bounds for the space complexity of this problem.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The nearest neighbor search problem is defined as follows: given a set X of n points in a ddimensional space, build a data structure that, given any query point y, returns the point in X closest to y. For efficiency reasons, the problem is often relaxed to approximate nearest neighbor search, where the goal is to find a point x &#8712; X whose distance to y is at most c min x&#8712;X x -y for some approximation factor c &gt; 1. Both problems have found numerous applications in machine learning, computer vision, information retrieval and other areas. In machine learning in particular, nearest neighbor classifiers are popular baseline methods whose classification error often comes close to that of the best known techniques <ref type="bibr">(Efros (2017)</ref>).</p><p>Developing fast approximate nearest neighbor search algorithms have been a subject of extensive research efforts over the last last two decades, see e.g., <ref type="bibr">Shakhnarovich et al. (2006)</ref>; <ref type="bibr">Andoni and Indyk (2017)</ref> for an overview. More recently, there has been increased focus on designing nearest neighbor methods that use a limited amount of space. This is motivated by the need to fit the data set in the main memory <ref type="bibr">(Johnson et al. (2017b,a)</ref>) or an Internet of Things device <ref type="bibr">(Gupta et al. (2017)</ref>). Furthermore, even a simple linear scan over the data is more time-efficient if the data is compressed. The data set compression is most often achieved by developing compact representations of data that approximately preserve the distances between the points (see <ref type="bibr">Wang et al. (2016)</ref> for a survey). Such representations are smaller than the original (uncompressed) representation of the data set, while approximately preserving the distances between points.</p><p>Most of the approaches in the literature are only validated empirically. The currently best known theoretical tradeoffs between the representation size and the approximation quality are summarized in Table <ref type="table">1</ref>, together with their functionalities and constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bits per point</head><p>Comments No compression d log n Johnson and Lindenstrauss (1984)</p><p>-2 log 2 (n) Estimates distances between any y and all x &#8712; X Kushilevitz et al. ( <ref type="formula">2000</ref>)</p><p>-2 log(n) log(R) Estimates distances between any y and all x &#8712; X, assuming x -y &#8712; [r, Rr] Indyk and Wagner (2017)</p><p>-2 log(n) log(1/ ) Estimates distances between all x, y &#8712; X, does not provably support out-of-sample queries This paper -2 log(n) log(1/ ) Returns an approximate nearest neighbor of y in X Table <ref type="table">1</ref>: Comparison of Euclidean metric sketches with distortion 1 &#177; . We assume that all point coordinates are represented using log &#934; bits, or alternatively that each coordinate is an integer in the range {-&#934; . . . &#934;}. For the sake of exposition, the results depicted in the table assume &#934; = n O(1) . Furthermore, the compression algorithm can be randomized, and the compressed representation must enable approximating distances up to a factor of 1 &#177; with probability 1/n O(1) .</p><p>Unfortunately, in the context of approximate nearest neighbor search, the above representations lead to sub-optimal results. The result from the last row of the table (from <ref type="bibr">Indyk and Wagner (2017)</ref>) cannot be used to obtain provable bounds for nearest neighbor search, because the distance preservation guarantees hold only for pairs of points in the pointset X. <ref type="foot">1</ref> The second-to-last result (from <ref type="bibr">Kushilevitz et al. (2000)</ref>) only estimates distances in a certain range; extending this approach to all distances would multiply the storage by a factor of log &#934;. Finally, the representations obtained via a direct application of randomized dimensionality reduction <ref type="bibr">(Johnson and Lindenstrauss (1984)</ref>) are also larger than the bound from <ref type="bibr">Indyk and Wagner (2017)</ref> by almost a factor of log &#934;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Our results</head><p>In this paper we show that it is possible to overcome the limitations of the previous results and design a compact representation that supports (1 + )-approximate nearest neighbor search, with a space bound essentially matching that of <ref type="bibr">Indyk and Wagner (2017)</ref>. This constitutes the first reduction in the space complexity of approximate nearest neighbor below the "Johnson-Lindenstrauss bound". Specifically, we show the following. Suppose that we want the data structure to answer q approximate nearest neighbor queries in a d-dimensional dataset of size n, in which coordinates are represented by log &#934; bits each. All q queries must be answered correctly with probability 1 -&#948;. (See Section 2 for the formal problem definition).</p><p>Theorem 1.1 For the all-nearest-neighbors problem, there is a sketch of size</p><p>The proof is given in Section 4. We also give a lower bound of &#8486;(n log(n)/ 2 ) for q = 1 and &#948; = 1/n O(1) (Section B), which shows that the first term in the above theorem is almost tight. Interestingly, the representation by itself does not return the (approximate) distance between the query point and the returned neighbor. Thus, we also consider the problem of estimating distances from a query point to all data points. In this setting, a result of <ref type="bibr">Molinaro et al. (2013)</ref> shows that the Johnson-Lindenstrauss space bound is optimal when the number of queries is equal to the number of data points. However, in many settings, the number of queries is often substantially smaller than the dataset size. We give nearly tight upper and lower bounds (up to a factor of log(1/ )) for this problem, showing it is possible to smoothly interpolate between <ref type="bibr">Indyk and Wagner (2017)</ref>, which does not support out-of-sample distance queries, and the Johnson-Lindenstrauss bound.</p><p>Specifically, we show the following. Suppose that we want the data structure to estimate all cross-distances between a set of q queries and all points in X, all of which must be estimated correctly with probability 1 -&#948; (see Section 2 for the formal problem definition).</p><p>Theorem 1.2 For the all-cross-distances problem, there is a sketch of size</p><p>Note that the dependence per point on &#934; is logarithmic, as opposed to doubly logarithmic in Theorem 1.1. We show this dependence is necessary, as per the following theorem.</p><p>Theorem 1.3 Suppose that d 1-&#961; &#8805; -2 log(nq/&#948;), &#934; &#8805; 1/ , and 1/n 0.5-&#961; &#8804; &#8804; 0 for some constants &#961;, &#961; &gt; 0 and a sufficiently small constant 0 . Then, for the all-cross-distances problem, any sketch must use at least</p><p>The proofs are given in Section 5 and Appendix C, respectively.</p><p>Practical variant <ref type="bibr">Indyk et al. (2017)</ref> presented a simplified version of <ref type="bibr">Indyk and Wagner (2017)</ref>, which has slightly weaker size guarantees, but on the other hand is practical to implement and was shown to work well empirically. However, it did not provably support out-of-sample queries.</p><p>Our techniques in this paper can be adapted to their algorithm and endow it with such provable guarantees, while retaining its simplicity and practicality. We elaborate on this in Appendix D.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Our techniques</head><p>The starting point of our representation is the compressed tree data structure from <ref type="bibr">Indyk and Wagner (2017)</ref>. The structure is obtained by constructing a hierarchical clustering of the data set, forming a tree of clusters. The position of each point corresponding to a node in the tree is then represented by storing a (quantized) displacement vector between the point and its "ancestor" in the tree. The resulting tree is further compressed by identifying and post-processing "long" paths in the tree. The intuition is that a subtree at the bottom of such a path corresponds to a cluster of points that is "sufficiently separated" from the rest of the points (see Figure <ref type="figure">1</ref>). This means that the data structure does not need to know the exact position of this cluster in order to estimate the distances between the points in the cluster and the rest of the data set. Thus the data structure replaces each long path by a quantized displacement vector, where the quantization error does not depend on the length of the path. This ensures that the tree does not have long paths, which bounds its total size. Unfortunately, this reasoning breaks down if one of the points is not known in advance, as it is the case for the approximate nearest neighbor problem. In particular, if the query point y lies in the vicinity of the separated cluster, then small perturbations to the cluster position can dramatically affect which points in the cluster are closest to y (see Figure <ref type="figure">2</ref> for an illustration).</p><p>In this paper we overcome this issue by maintaining extra information about the geometry of the point set. First, for each long path, we store not only the quantized displacement vector (which preserves the "global" position of the subtree with respect to the rest of the tree) but also the suffix of the path. Intuitively, this allows us to recover both the most significant bits and the least significant bits of points in the subtree corresponding to the "separated" clusters, which allows us to avoid cases as depicted in Figure <ref type="figure">2</ref>. However, this intuition breaks down when the diameter of the cluster is much larger than the amount of "separation". Thus we also need to store extra information about the position of the subtree points. This is accomplished by storing a hashed representation of a representative point of the subtree (called "the center"). We note that this modification makes our data structure inherently randomized; in contrast, the data structure of <ref type="bibr">Indyk and Wagner (2017)</ref> was deterministic.</p><p>Given the above information, the approximate nearest neighbor search is performed top down, as follows. In each step, we recover and enumerate points in the current subtree, some of which could be centers of "separated" clusters as described above. The "correct" center, guaranteed to contain an approximate nearest neighbor of the query point, is identified by its hashed value (if no hash match is found, then any center is equally good). Note that our data structure does not allow us to compute all distances from the query point y to all points in X (in fact, as mentioned earlier, this task is not possible to achieve within the desired space bound). Instead, it stores just enough information to ensure that the procedure never selects a "wrong" subtree to iterate on.</p><p>Lastly, suppose we also wish to estimate all distances from y to X. To this end, we augment each subtree with the distance sketches due to <ref type="bibr">Kushilevitz et al. (2000)</ref> and <ref type="bibr">Johnson and Lindenstrauss (1984)</ref>. The former allows us to identify the cluster of all approximate nearest neighbors of y (whereas the above algorithm was only guaranteed to return one approximate nearest neighbor). The latter stores the approximate distance from that cluster. These are the smallest distances from y to X, which are the most challenging to estimate; the remaining distances can be estimated based on the hierarchical partition into well-separated clusters, which is already present in the sketch. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Formal Problem Statements</head><p>We formalize the problems in terms of one-way communication complexity. The setting is as follows. Alice has n data points, X = {x 1 , . . . , x n } &#8834; {-&#934; . . . &#934;} d , while Bob has q query points, Y = {y 1 , . . . , y q } &#8834; {-&#934; . . . &#934;} d , where 1 &#8804; q &#8804; n. Distances are Euclidean, and we can assume w.l.o.g. that d &#8804; n.<ref type="foot">foot_2</ref> Let , &#948; &#8712; (0, 1) be given parameters. In the one-way communication model, Alice computes a compact representation (called a sketch) of her data points and sends it to Bob, who then needs to report the output. We define two problems in this model (with private randomness), each parameterized by n, q, d, &#934;, , &#948;:<ref type="foot">foot_3</ref> </p><p>Problem 1 -All-nearest-neighbors: Bob needs to report a (1 + )-approximate nearest neighbor in X for all his points simultaneously, with probability 1 -&#948;. That is, for every j &#8712; [q], Bob reports an index i j &#8712; [n] such that</p><p>Our upper bound for this problem is stated in Theorem 1.1.</p><p>Problem 2 -All-cross-distancess: Bob needs to estimate all distances x i -y j up to distortion (1 &#177; ) simultaneously, with probability 1 -&#948;. That is, for every i &#8712; [n] and j &#8712; [q], Bob reports an estimate E(i, j) such that</p><p>Our upper and lower bounds for this problem are stated in Theorems 1.2 and 1.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Basic Sketch</head><p>In this section we describe the basic data structure (generated by Alice) used for all of our results.</p><p>The data structure augments the representation from <ref type="bibr">Indyk and Wagner (2017)</ref>, which we will now reproduce. For the sake of readability, the notions from the latter paper (tree construction via hierarchical clustering, centers, ingresses and surrogates) are interleaved with the new ideas introduced in this paper (top-out compression, grid quantization and surrogate hashing). Proofs in this section are deferred to Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Hierarchical Clustering Tree</head><p>The sketch consists of an annotated hierarchical clustering tree, which we now describe with our modified "top-out compression" step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tree construction</head><p>We construct the inter-link hierarchical clustering tree of X: In the bottom level (numbered 0) every point is a singleton cluster, and level &gt; 0 is formed from level -1 by recursively merging any two clusters whose distance is at most 2 , until no two such clusters are present. We repeat this until level log(2 &#8730; d&#934;) , even if all points in X are already joined in one cluster at a lower level. The following observation is immediate.</p><p>Notation Let T * denote the tree. For every tree node v, we denote its level by (v), its associated cluster by C(v) &#8834; X, and its cluster diameter by &#8710;(v). For a point x i &#8712; X, let leaf(x i ) denote the tree leaf whose associated cluster is {x i }.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Top-out compression</head><p>The degree of a node in T * is its number of children. A 1-path with k edges in T * is a downward path u 0 , u 1 , . . . , u k , such that (i) each of the nodes u 0 , . . . , u k-1 has degree 1, (ii) u k has degree either 0 or more than 1, (iii) if u 0 is not the root of T * , then its ancestor has degree more than 1.</p><p>For every node v denote &#923;(v) := log(&#8710;(v)/(2 (v) )). If v is the bottom of a 1-path with more than &#923;(v) edges, we replace all but the bottom &#923;(v) edges with a long edge, and annotate it by the length of the path it represents. More precisely, if the downward 1-path is u 0 , . . . , u k = v and k &gt; &#923;(v), then we connect u 0 directly to u k-&#923;(v) by the long edge, and the nodes u 1 , . . . , u k-&#923;(v)-1 are removed from the tree, and the long edge is annotated with length k -&#923;(v).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 3.2</head><p>The compressed tree has O(n log(1/ )) nodes.</p><p>We henceforth refer only to the compressed tree, and denote it by T . However, for every node v in T , (v) continues to denote its level before compression (i.e., the level where the long edges are counted according to their lengths). We partition T into subtrees by removing the long edges. Let F (T ) denote the set of subtrees.</p><p>Lemma 3.3 Let v be the bottom node of a long edge, and x, x &#8712; C(v). Then x -x &#8804; 2 (v) . Lemma 3.4 Let u be a leaf of a subtree in F (T ), and x, x &#8712; C(u). Then x -x &#8804; 2 (u) .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Surrogates</head><p>The purpose of annotating the tree is to be able to recover a list of surrogates for every point in X. A surrogate is a point whose location approximates x. Since we will need to compare x to a new query point, which is unknown during sketching, we define the surrogates to encompass a certain amount information about the absolute point location, by hashing a coarsened grid quantization of a representative point in each subtree.</p><p>Centers With every tree node v we associate an index c(v) &#8712; [n] such that x c(v) &#8712; C(v), and we call x c(v) the center of C(v). The centers are chosen bottom-up in T as follows. For a leaf v, C(v) contains a single point x i &#8712; X, and we set c(v) = i. For a non-leaf v with children u</p><p>Ingresses Fix a subtree T &#8712; F (T ). To every node u in T , except the root, we will now assign an ingress node, denoted in(u). Intuitively this is a node in the same subtree whose center is close to u, and the purpose is to store the location of u by its quantized displacement from that center (whose location will have been already stored, by induction).</p><p>We will now assign ingresses to all children of a given node v. (Doing this for every v in T defines ingresses for all nodes in T except its root.) Let u 1 , . . . , u k be the children of v, and w.l.o.g. c(v) = c(u 1 ). Consider the graph H v whose nodes are u 1 , . . . , u k , and u i , u j are neighbors if there are points x &#8712; C(u i ) and x &#8712; C(u j ) such that x -x &#8804; 2 (v) . By the tree construction, H v is connected. We fix an arbitrary spanning tree &#964; (v) of H v which is rooted at u 1 .</p><p>For u 1 we set in(u 1 ) := v. For u i with i &gt; 1, let u j be its (unique) direct ancestor in the tree &#964; (v). Let x &#8712; C(u j ) be the closest point to C(u i ) in C(u j ). Note that in T there is a downward path from u j to leaf(x). Let u x be the bottom node in that path that belongs to T . (Equivalently, u x is the bottom node on that downward path that is reachable from u without traversing a long edge.) We set in(u i ) := u x .</p><p>Grid net quantization Assume w.l.o.g. that &#934; is a power of 2. We define a hierarchy of grids aligned with {-&#934; . . . &#934;} d as follows. We begin with the single hypercube whose corners are (&#177;&#934;, . . . , &#177;&#934;) d . We generate the next grid by halving along each dimension, and so on. For every &#947; &gt; 0, let N &#947; be the coarsest grid generated, whose cell side is at most &#947;/ &#8730; d. Note that every cell in N &#947; has diameter at most &#947;. For a point x &#8712; R d , we denote by N &#947; [x] the closest corner of the grid cell containing it.</p><p>We will rely on the following fact about the intersection size of a grid and a ball; see, for example, <ref type="bibr">Har-Peled et al. (2012)</ref>.</p><p>Claim 3.5 For every &#947; &gt; 0, the number of points in N &#947; at distance at most 2&#947; from any given point, is at most O(1) d .</p><p>Surrogates Fix a subtree T &#8712; F (T ). With every node v in T we will now associate a surrogate s * (v) &#8712; R d . Define the following for every node v in T :</p><p>otherwise.</p><p>The surrogates are defined by induction on the ingresses. Induction base: For the root v of T we set s * (v</p><p>Induction step: For a non-root v we denote the quantized displacement of c(v) from its ingress by &#951;</p><p>Hash functions For every level in the tree, we pick a hash function H : N 2 &#8594; [m], from a universal family <ref type="bibr">(Carter and Wegman (1979))</ref>, where</p><p>The O(1) term is the same constant from Claim 3.5 above. For every subtree root v, we store its hashed surrogate</p><p>). We also store the description of each hash function H for every level .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Sketch Size</head><p>The sketch contains the tree T , with each node v annotated by its center c(v), ingress in(u), precision &#947;(v) and quantized displacement &#951;(v) (if applicable). For subtree roots we store their hashed surrogate, and for long edges we store their length. We also store the hash functions {H }.</p><p>Lemma 3.7 The total sketch size is</p><p>As a preprocessing step, Alice can reduce the dimension of her points to O( -2 log(qn/&#948;)) by a Johnson-Lindenstrauss projection. She then augments the sketch with the projection, in order for Bob to be able to project his points as well. By <ref type="bibr">Kane et al. (2011)</ref>, the projection can be stored with O(log d + log(q/&#948;) &#8226; log log((q/&#948;)/ )) bits. This yields the sketch size stated in Theorem 1.1.</p><p>Remark Both the hash functions and the projection map can be sampled using public randomness. If one is only interested in the communication complexity, one can use the general reduction from public to private randomness due to <ref type="bibr">Newman (1991)</ref>, which replaces the public coins by augmenting O(log(nd&#934;)) bits to the sketch (since Alice's input has size O(nd&#934;) bits). The bound in Theorem 1.1 then improves to O n log n&#8226;log(1/ ) 2 + log log &#934; + log q &#948; + log &#934; bits, and the bound in Theorem 1.2 improves to O n 2 log n &#8226; log(1/ ) + log(d&#934;) log q &#948; bits. However, that reduction is non-constructive; we state our bounds so as to describe explicit sketches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Approximate Nearest Neighbor Search</head><p>We now describe our approximate nearest neighbor search query procedure, and prove Theorem 1.1. Suppose Bob wants to report a (1 + )-approximate nearest neighbor in X for a point y &#8712; Y .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm Report Nearest Neighbor:</head><p>1. Start at the subtree T &#8712; F (T ) that contains the root of T .</p><p>2. Recover all surrogates {s * (v) : v &#8712; T }, by the subroutine below.</p><p>3. Let v be the leaf of T that minimizes y -s * (v) .</p><p>4. If v is the head of a long edge, recurse on the subtree under that long edge. Otherwise v is a leaf in T , and in that case return c(v).</p><p>Subroutine Recover Surrogates: This is a subroutine that attempts to recover all surrogates {s * (v) : v &#8712; T } in a given subtree T &#8712; F (T ), using both Alice's sketch and Bob's point y.</p><p>Observe that to this end, the only information missing from the sketch is the root surrogate s * (r), which served as the induction base for defining the rest of the surrogates. The induction steps are fully defined by (v), in(v), &#947;(v), and &#951;(v), which are stored in the sketch for every node v = r in the subtree. The missing root surrogate was defined as s * (r) = N 2 (r) [x c(r) ]. Instead, the sketch stores its hashed value H (r) (N 2 (r) [x c(r) ]) and the hash function H (r) . <ref type="foot">4</ref>The subroutine attempts to reverse the hash. It enumerates over all points p &#8712; N 2 (r) such that p -y &#8804; 2 &#8226; 2 (r) . For each p it computes H (r) (p). If H (r) (x c(r) ) = H (r) (p) then it sets s * (r) = p and recovers all surrogates accordingly. If either no p, or more than one p, satisfy H (r) (x c(r) ) = H (r) (p), then it proceeds with s * (r) set to an arbitrary point (say, the origin in R d ).</p><p>Analysis. Let r 0 , r 1 , . . . be the roots of the subtrees traversed on the algorithm. Note that they reside on a downward path in T .</p><p>Let t be the smallest such that r t satisfies x c(rt) -y &gt; 2 (rt) . (The algorithm does not identify t, but we will use it for the analysis.) Lemma 4.2 With probability 1 -&#948;/q, for every i = 0, . . . , t -1 simultaneously, the subroutine recovers s * (r i ) correctly as N 2 (r) [x c(r) ]. (Consequently, all surrogates in the subtree rooted by r i are also recovered correctly.)</p><p>Proof Fix a subtree T &#8712; F (T ) rooted in r, that satisfies y -x c(r) &#8804; 2 (r) . Since x c(r)s * (r) &#8804; 2 (r) (by Lemma 3.6), we have y -s * (r) &#8804; 2 &#8226; 2 (r) . Hence the surrogate recovery subroutine tries s * (r) as one of the hash pre-image candidates, and will identify that H (r) (s * (r)) matches the hash stored in the sketch. Furthermore, by Claim 3.5, the number of candidates is at most O(1) d . Since the range of</p><p>&#8226; q/&#948;, then with probability 1 -&#948;/(q log(2 &#8730; d&#934;)) there are no collisions, and s * (r) is recovered correctly. The lemma follows by taking a union bound over the first t subtrees traversed by the algorithm, i.e. those rooted by r i for i = 0, 1, . . . , t -1. Noting that t is upper-bounded by the number of levels in the tree, log(2 &#8730; d&#934;), we get that all the s * (r i )'s are recovered correctly simultaneously with probability 1 -&#948;/q.</p><p>From now on we assume that the event in Lemma 4.2 succeeds, meaning in steps 0, 1, . . . , t -1, the algorithm recovers all surrogates correctly. We henceforth prove that under this event, the algorithm returns a (1 + )-approximate nearest neighbor of y. In what follows, let x * &#8712; X be a fixed true nearest neighbor of y in X.</p><p>Lemma 4.3 Let T &#8712; F (T ) be a subtree rooted in r, such that x * &#8712; C(r). Let v a leaf of T that minimizes y -s * (v) . Then either x * &#8712; C(v), or every z &#8712; C(v) is a (1 + O( ))-approximate nearest neighbor of y.</p><p>Proof Suppose w.l.o.g. by scaling that &lt; 1/6. If x * &#8712; C(v) then we are done. Assume now that x * &#8712; C(u) for a leaf u = v of T . Let := max{ (v), (u)}. We start by showing that y -x * &gt; 1 4 &#8226; 2 . Assume by contradiction this is not the case. Since u is a subtree leaf and x * &#8712; C(u), we have x * -x c(u) &#8804; 2 by Lemma 3.4. We also have x c(u) -s * (u) &#8804; 2 by Lemma 3.6. Together, y -s * (u) &#8804; ( 1 4 + 2 )2 . On the other hand, by the triangle inequality, y -s * (v) &#8805; x * -x c(v) -y -x * -x c(v) -s * (v) . Noting that x * -x c(v) &#8805; 2 (by Lemma 3.1, since x * and x c(v) are separated at level ), y -x * &#8804; 1 4 &#8226; 2 (by the contradiction hypothesis) and x c(v) -s * (v) &#8804; 2 (by Lemma 3.6), we get y -s</p><p>. This contradicts the choice of v.</p><p>The lemma now follows because for every z &#8712; C(v),</p><p>where ( <ref type="formula">1</ref>) and ( <ref type="formula">3</ref>) are by the triangle inequality, ( <ref type="formula">2</ref>) is since y -s * (v) &#8804; y -s * (u) by choice of v, (4) is by Lemmas 3.4 and 3.6, and ( <ref type="formula">5</ref>) is since we have shown that y -x * &gt; 1 4 &#8226; 2 . Therefore z is a (1 + 16 )-approximate nearest neighbor of y.</p><p>Proof of Theorem 1.1. We may assume w.l.o.g. that is smaller than a sufficiently small constant. Suppose that the event in Lemma 4.2 holds, hence all surrogates in the subtrees rooted by r 0 , r 1 , . . . , r t-1 are recovered correctly. We consider two cases. In the first case, x * / &#8712; C(r t ). Let i &#8712; {1, . . . , t} be the smallest such that x * / &#8712; C(r i ). By applying Lemma 4.3 on r i-1 , we have that every point in C(r i ) is a (1 + O( ))-approximate nearest neighbor of y. After reaching r i , the algorithm would return the center of some leaf reachable from r i , and it would be a correct output.</p><p>In the second case, x * &#8712; C(r t ). We will show that every point in C(r t ) is a (1 + O( ))approximate nearest neighbor of y, so once again, once the algorithm arrives at r t it can return anything. By Lemma 3.3, every x &#8712; C(r t ) satisfies</p><p>x -x * &#8804; 2 (rt) .</p><p>(6)</p><p>In particular, x c(rt) -x * &#8804; 2 (rt) . By definition of t we have x c(rt) -y &gt; 2 (rt) . Combining the two yields y - rt) . Combining this with eq. ( <ref type="formula">6</ref>), we find that every x &#8712; C(r t ) satisfies x -x * &#8804; 1-y -x * , and hence y -x &#8804; (1 + 2 ) y -x * (for &#8804; 1/2). Hence x is a (1 + 2 )-nearest neighbor of y.</p><p>The proof assumes the event in Lemma 4.2, which occurs with probability 1 -&#948;/q. By a union bound, the simultaneous success probability of the q query points of Bob is 1 -&#948; as required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Distance Estimation</head><p>We now prove theorem 1.2. To this end, we augment the basic sketch from Section 3 with additional information, relying on the following distance sketches due to Achlioptas (2001) (following <ref type="bibr">Johnson and Lindenstrauss (1984)</ref>) and <ref type="bibr">Kushilevitz et al. (2000)</ref>. Lemma 5.2 <ref type="bibr">(Kushilevitz et al. (2000)</ref>) Let R &gt; 0 be fixed and let , &#948; &gt; 0. There is a randomized map sk R of vectors in R d into O( -2 log(1/&#948; )) bits, with the following guarantee. For every x, y &#8712; R d , given sk R (x) and sk R (y), one can output the following with probability 1 -&#948; :</p><p>We augment the basic sketch from Section 3 as follows. We sample a matrix M from Lemma 5.1, with &#948; = &#948;/q. In addition, for every level in the tree T , we sample a map sk 2 from Lemma 5.2, with &#948; = &#948;/(q log(2 &#8730; d&#934;)). For every subtree root r in T , we store M x c(r) and sk 2 (r) (x c(r) ) in the sketch. Let us calculate the added size to the sketch:</p><p>Since there are O(n) subtree roots (cf. Lemma 3.7), storing M x c(r) for every r adds O(nd d&#934;) = O( -2 n log(q/&#948;) log(d&#934;)) bits to the sketch. In addition we store the matrix M , which takes O(d d) bits to store, which is dominated by the previous term.</p><p>&#8226; By Lemma 5.2, each sk 2 (r) (x c(r) ) adds O( -2 log(q log(2 &#8730; d&#934;)/&#948;)) bits to the sketch, and as above there are O(n) of these. In addition we store the map sk 2 (r) for every . Each map takes poly(d, log &#934;, log(q/&#948;), 1/ ) bits to store.</p><p>In total, we get the sketch size stated in Theorem 1.2. Next we show how to compute all distances from a new query point y.</p><p>Query algorithm. Given the sketch, an index k &#8712; [n] of a point in X, and a new query point y, the algorithm needs to estimate y -x k up to 1 &#177; O( ) distortion. It proceeds as follows.</p><p>1. Perform the approximate nearest neighbor query algorithm from Section 4. Let r 0 , r 1 , . . . be the downward sequence of subtree roots traversed by it.</p><p>2. For each r j , estimate from the sketch whether y -x c(r j ) &#8804; 2 (r j ) . This can be done by Lemma 5.2, since the sketch stores sk 2 (r j ) (x c(r j ) ) and also the map sk 2 (r j ) , with which we can compute sk 2 (r j ) (y).</p><p>3. Let t be the smallest j that satisfies y-x c(r j ) &gt; 2 (r j ) according the estimates of Lemma 5.2. (This attempts to recover from the sketch the same t as defined in the analysis in Section 4.)</p><p>4. Let t k &#8712; {0, . . . , t} be the maximal such that x k &#8712; C(r t k ).</p><p>(In words, r t k is the root of the subtree in which x k and y "part ways".)</p><p>5. If t k = t, return M y -M x c(rt) . Note that M and M x c(rt) are stored in the sketch.</p><p>6. If t k &lt; t, let v k be the bottom node on the downward path from r t k to leaf(x k ) that does not traverse a long edge. Return y -s * (v k ) .</p><p>Analysis. Fix a query point y. Define the "good event" A(y) as the intersection of the following:</p><p>1. For every subtree root r j traversed by the query algorithm above, the invocation of Lemma 5.2 on sk 2 (r j ) (x c(r j ) ) and sk 2 (r j ) (y) succeeds in deciding whether y -x c(r j ) &#8804; 2 (r j ) . Specifically, this ensures that y -x c(r j ) &#8804; 2 (r j ) for every j &lt; t, and y -x c(rt) &#8805; (1 -)2 (rt) .</p><p>Recalling that we invoked the lemma with &#948; = &#948;/(q log(2 &#8730; d&#934;)), we can take a union bound and succeed in all levels simultaneously with probability 1 -&#948;/q.</p><p>2. M y -M x c(rt) = (1 &#177; ) y -x c(rt) . By Lemma 5.1 this holds with probability 1 -&#948;/q.</p><p>Altogether, A(y) occurs with probability 1 -O(&#948;/q). Lemma 5.3 Conditioned on A(y) occuring, with probabiliy 1-&#948;/q, Lemma 4.2 holds. Namely, the query algorithm correctly recovers all surrogrates in the subtrees rooted by r j for j = 0, 1, . . . , t-1.</p><p>Proof The proof of Lemma 4.2 in Section 4 relied on having y -x c(r j ) &#8804; 2 (r j ) for every j &lt; t. Conditioning on A(y) ensures this holds.</p><p>Proof of Theorem 1.2. Let A * (y) denote the event in which both A(y) occurs and the conclusion of Lemma 4.2 occurs. By the above lemma, A * (y) happens with probability 1 -O(&#948;/q). From now on we will assume that A * (y) occurs, and conditioned on this, we will show that the distance from y to any data point can be deterministically estimated correctly. To this end, fix k &#8712; [n] and suppose our goal is to estimate y -x k . Let t k and v k be as defined by the distance query algorithm above. We handle the two cases of the algorithm separately.</p><p>Case I: t k = t. This means x k &#8712; C(r t ). By Lemma 3.3 we have x k -x c(rt) &#8804; 2 (rt) . By the occurance of A * (y) we have y -x c(rt) &gt; (1 -)2 (rt) . Together, y -</p><p>. This means that y -x c(rt) is a good estimate for y -x k . Since A * (y) occurs, it holds that M y -M x c(rt) = (1 &#177; ) y -x c(rt) , hence M y -M x c(rt) is also a good estimate for y -x k , and this is what the algorithm returns.</p><p>Case II: t k &lt; t. Let T t k be the subtree rooted by r t k . By the occurance of A * (y), all surrogates in T t k are recovered correctly, and in particular s * (v k ) is recovered correctly. By Lemma 3.6 we have</p><p>, and by Lemma 3.4 (noting that</p><p>Let v be the leaf in T t k that minimizes y -s * (v) (over all leaves of T t k ). Equivalently, v is the top node of the long edge whose bottom node is r t k +1 . Let := max{ (v), (v k )}. By choice of t k we have v = v k , hence the centers of these two leaves are separated already at level , hence x c(v k ) -x c(v) &#8805; 2 by Lemma 3.1. By two applications of Lemma 3.6 we have</p><p>, which was shown above, yields</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>and this is what the algorithm returns.</head><p>Conclusion: Combining both cases, we have shown that for any query point y, all distances from y to X can be estimated correctly with probability 1 -O(&#948;/q). Taking a union bound over q queries, and scaling &#948; and appropriately by a constant, yields the theorem.</p><p>&#948; &lt; 1/n 2 . By a union bound over all i, j &#8712; [n] we can recover all of x 1 , . . . , x n simultaneously, and the theorem follows.</p><p>Definition C.3 (Augmented Indexing) In the Augmented Indexing problem AugInd(k, &#948;), Alice gets a vector A with k entries, whose elements are entries of a universe of size 20/&#948;. Bob gets an index i &#8712; [k], an element e, and the elements A(i ) for every i &lt; i. Bob needs to decide whether e = A(i), and succeed with probability 1 -&#948;.</p><p>Jayram and Woodruff <ref type="bibr">(2013)</ref> give a one-way communication lower bound of &#8486;(k log(1/&#948;)) for this problem. The main component in <ref type="bibr">Molinaro et al. (2013)</ref> is a modified one-way communication model, in which the protocol is allowed to abort with a substantially larger (constant) probability than it is allowed to err. We will refer to it simply as the abortion model and refer to <ref type="bibr">Molinaro et al. (2013)</ref> for the exact definition (which we will not require). They prove the same lower bound for Augmented Indexing.</p><p>Lemma C.4 (informal) In the abortion model, the one-way communication complexity of AugInd(k, &#948;) is &#8486;(k log(1/&#948;)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2.2. Variants of Augmented Indexing</head><p>We start by defining a variant of augmented indexing that will be suitable for out purpose.</p><p>Definition C.5 In the Matrix Augmented Indexing problem M atAugInd(k, m, &#948;), Alice gets a matrix A of order k &#215; m, whose entries are elements of a universe of size 1/&#948;. Bob gets indices i &#8712; [k] and j &#8712; [m], an element e, and the elements A(i, j ) for every j &lt; j . Bob needs to decide whether e = A(i, j), and succeed with probability 1 -&#948;.</p><p>This problem is clearly at least as difficult as AugInd(km, &#948;) from Definition C.3, since in the latter Bob gets more information (namely, if we arrange the vector A in AugInd(km, &#948;) as a k &#215; m matrix, then Bob gets all entries of A which lexicographically precede A(i, j)). We get the following immediate corollary from Lemma C.4. <ref type="bibr">Corollary C.6</ref> In the abortion model, the one-way communication complexity of M atAugInd(k, m, &#948;) is &#8486;(km log(1/&#948;)). <ref type="bibr">Molinaro et al. (2013)</ref> reformulate Augmented Indexing so that Alice's input is a set instead of vector. Similary, we reformulate Matrix Augmented Indexing as follows.</p><p>Definition C.7 Let m &gt; 0 and k &gt; 0 be integers, and &#948; &#8712; (0, 1). Partition the interval [m/&#948;] into m intervals I 1 , . . . , I m of size 1/&#948; each.</p><p>In the Augmented Set List problem AugSetList(k, m, &#948;), Alice gets a list of subsets S 1 , . . . , S k &#8834; [m/&#948;], such that each S i has size exactly m and contains exactly one element from each interval I 1 , . . . , I m . Bob gets an index i &#8712; [k], an element e &#8712; [m/&#948;] and a subset T of S i that contains exactly the elements of S i that are smaller than e. Bob needs to decide whether e &#8712; S i , and succeed with probability at least 1 -&#948;.</p><p>The equivalence to Matrix Augmented Indexing is not hard to show; the details are similar to <ref type="bibr">Molinaro et al. (2013)</ref> and we omit them here. By the equivalence, we get the following corollary from <ref type="bibr">Corollary C.6.</ref> this pruning rule to the quadtree of <ref type="bibr">Indyk et al. (2017)</ref> (instead of their "bottom-out" rule) is sufficient to obtain a sketch that provably supports out-of-sample approximate nearest neighbor queries. Thus, the sketching algorithm is nearly unchanged.</p><p>We remark that in Section 3 we introduced two additional modifications: grid-net quantization and surrogate hashing. These were required in order to prove Theorems 1.1 and 1.2, but in the framework of <ref type="bibr">Indyk et al. (2017)</ref> they turn out to be unnecessary: grid-net quantization is already organically built into the quadtree approach of <ref type="bibr">Indyk et al. (2017)</ref>, and surrogate hashing only served to avoid a O(log log n) factor in the sketch size (see footnote 4), but in <ref type="bibr">Indyk et al. (2017)</ref> this factor is tolerated anyway.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.1. Sketching Algorithm Recap</head><p>For completeness, let us briefly describe the sketching algorithm of <ref type="bibr">Indyk et al. (2017)</ref> (the reader is referred to that paper for more formal details), with our modification. To this end, set &#923; = log 16d 1.5 log &#934; &#948; .</p><p>Suppose w.l.o.g. that &#934; is a power of 2. The sketching algorithm proceeds in three steps:</p><p>1. Random shifted grids: Impose a randomly shifted enclosing hypercube on the data points X.</p><p>More precisely, choose a uniformly random shift &#963; &#8712; {-&#934;, . . . , &#934;} d , and set the enclosing hypercube to be H</p><p>Since X &#8834; {-&#934;, . . . , &#934;} d , it is indeed enclosed by H. We then half H along every dimension to create a finer grid with 2 d cells, and proceed so (recursively halving every cell along every dimension) to create a hierarchy of nested grids, with log(4&#934;) + &#923; hierarchy levels. The top level is numbered &#934; + 2, which is the log the side length of H, and the next levels are decrementing, so that the grid cells in level have side length 2 .</p><p>2. Quadtree construction: Construct the quadtree which is naturally associated with the nested grids: the root corresponds to H, its children correspond to the non-empty cells of the next grid in the hierarchy (a cell is non-empty if it contains a point in X), and so on. Each tree edge is annotated by a bitstring of length d, that marks whether the child cell coincides with the bottom half (bit 0) or the top half (bit 1) of the parent cell in each dimension.</p><p>3. Middle-out compression: For every path of degree-1 tree nodes whose length is more than 2&#923;, we keep its top &#923; and bottom &#923;, and replace its remaining middle portion by a long edge. This removes the edge annotations of the middle section (this achieving compression). We label each long edge with the length of the path it replaces.</p><p>In the remainder of this section we prove the following.</p><p>Theorem D.1 The above algorithm, with the above setting of &#923;, runs in time &#213;(nd(log &#934; + &#923;)) and produces a sketch for the all-nearest-neighbors problem, whose size in bits is</p><p>The sketch size is the same as in <ref type="bibr">Indyk et al. (2017)</ref>, except that we keep at most 2&#923; instead of &#923; nodes per 1-path, which increases the sketch size by only a factor of 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.2. Basic lemmas</head><p>We start with some useful properties of the above sketch, which are analogous to lemmas from in Section 3. In the notation below, for a node v in the quadtree, C(v) denotes the subset of points in X that are contained in the grid cell associated with v. As in Section 3, the quadtree is partitioned into a set F (T ) of subtrees by removing the long edges.</p><p>Lemma D.2 (analog of Lemma 3.1) For every point x &#8712; X, with probability 1 -&#948;, the following holds. If z &#8712; R d is any point outside the grid cell that contains x in level of the quadtree, then</p><p>Proof The setting of &#923; is such that with probability 1 -&#948;, in every level of the quadtree, the grid cell that contains x also contains the ball at radius 8 -1 &#8226; 2 -&#923; &#8730; d around x. (This property is known as "padding".) The lemma is just a restatement of this property. See Lemma 1 and Equation (1) in <ref type="bibr">Indyk et al. (2017)</ref> for details.</p><p>Lemma D.3 (analog of Lemmas 3.3, 3.4, 3.6) Let v be a node in the quadtree, and x, x &#8712; R d points contained in the grid cell associated with v. Then x -x &#8804; 2 (v) &#8730; d.</p><p>Proof The grid cell associated with v is a hypercube with side 2 (v) and diameter 2 (v) &#8730; d.</p><p>Before proceeding let us make the following point about the quadtree.</p><p>Claim D.4 For every leaf v of the quadtree, C(v) contains a single point of X, and v is the bottom of a 1-path of length at least &#923;.</p><p>Proof Refining the quadtree grid hierarchy for log(4&#934;) levels ensures that each grid cell contains at most one point from X, and refining for &#923; additional levels ensures that each leaf is the bottom of a 1-path of length at least &#923;.</p><p>Claim D.5 Every subtree leaf in the quadtree is the bottom of a 1-path of length at least &#923;.</p><p>Proof If v is a leaf of the quadtree, this follows from Claim D.4. Otherwise this follows from middle-out compression.</p><p>Next we define centers and surrogates. Centers c(v) are chosen similarly to Section 3. The surrogate s * (v) of every tree node v is simply defined to be the "bottom-left" (i.e. minimal in all dimensions) corner of the grid cell associated with v.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.3. Approximate Nearest Neighbor Search</head><p>Finally, we can describe the query algorithm and complete its analysis. Let y be a query point for which we need to report an approximate nearest neighbor from the sketch. The query algorithm is the same as in Section 4: starting with the subtree that contains the quadtree root, it recovers the surrogates in the current subtree and chooses the subtree v whose surrogate is the closest to y. If v is a quadtree leaf, its center is returned as the approximate nearest neighbor. Otherwise, the algorithm proceeds by recursion on the subtree under v.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Surrogate recovery</head><p>The difference is in the way we recover the surrogates of a given subtree. In Section 4 this was done using the surrogate hashes. Here we will use a simpler, deterministic surrogate recovery subroutine. Let s * (H) &#8712; R d the surrogate of the quadtree root. (We store this point explicitly in the sketch, and it will be convenient to think of it w.l.o.g. as the the origin in R d .) As observed in <ref type="bibr">Indyk et al. (2017)</ref>, for every tree node v, if we concatenate the bits annotating the edges on the path from the root to v, we get the binary expansion of the point s * (H) + s * (v). Therefore, we can recover s * (v) from the sketch, as long as the path from the root to v does not traverse a long edge.</p><p>If the path to v contains long edges (and thus missing bits in the binary expansion of s * (v)), the algorithm completes these bits from the binary expansion of y. Let r 0 , r 1 , . . . be the subtree roots traversed by the algorithm, and let T 0 , T 1 , . . . be the corresponding subtrees. Let t be the smallest such that the algorithm does not recover the surrogates in T t correctly (because the bits missing on the long edge connecting T t-1 to T t are not truly equal to those of y). As in Section 4, the query algorithm does not know t (it simply always assumes that the bits of y are the correct missing ones), but we will use it for analysis. Note that by definition of t, all surrogates in the subtrees rooted at r 0 , . . . , r t-1 are recovered correctly. Thus, the event from Lemma 4.2 holds deterministically.</p><p>Proof of Theorem D.1 Let x * &#8712; X be a fixed true nearest neighbor of y in X (chosen arbitrarily if there is more than one). We shall assume that the event in Lemma D.2 occurs for x * .  <ref type="bibr">(11)</ref> where ( <ref type="formula">7</ref>) and ( <ref type="formula">9</ref>) are by the triangle inequality, ( <ref type="formula">8</ref>) is since y -s * (v) &#8804; y -s * (u) by choice of v, (10) is by applying Lemma D.3 to each of the last four summands, and ( <ref type="formula">11</ref>) is since we have shown that y -x * &gt; -1 2 &#8730; d. Therefore z is a (1 + 4 )-approximate nearest neighbor of y.</p><p>Now we prove that the query algorithm returns an approximate nearest neighbor for y. We may assume w.l.o.g. that is smaller than a sufficiently small constant. We consider two cases. In the first case, x * / &#8712; C(r t ). Let i &#8712; {1, . . . , t} be the smallest such that x * / &#8712; C(r i ). By applying Lemma D.6 on r i-1 , we have that every point in C(r i ) is a (1 + O( ))-approximate nearest neighbor of y. After reaching r i , the algorithm would return the center of some leaf reachable from r i , and it would be a correct output.</p><p>In the second case, x * &#8712; C(r t ) contains a true nearest neighbor of y. We will show that every point in C(r t ) is a (1 + O( ))-approximate nearest neighbor of y, so once again, once the algorithm arrives at r t it can return anything. By definition of t, we know that y does not reside in the grid cell associated with r t . Since y does reside in that cell, we have y -x * &#8805; 8 -1 2 (rt)-&#923; &#8730; d by Lemma D.2. On the other hand, by Claim D.5, r t is the bottom of a 1-path of length at least &#923;, and therefore any two points in C(r t ) are contained in the same grid cell at level (r t ) -&#923;, whose diameter is 2 (rt)-&#923; &#8730; d. In particular, for every x &#8712; C(r t ) we have x -x * &#8804; 2 (rt)-&#923; &#8730; d &#8804; 1 8 y -x * . Altogether we get y -x &#8804; y -x * + x * -x &#8804; (1 + 1 8 ) y -x * , so x &#8712; C(r t ) is a (1 + )-approximate nearest neighbor of y in X.</p><p>The proof assumes the event in Lemma D.2 holds for x * , which happens with probability 1 -&#948;. To handle q queries, we can scale &#948; down to &#948;/q and take a union bound over the q nearest neighbors of the q query points</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>c 2018 P. Indyk &amp; T. Wagner.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>We note, however, that a simplified version of this method, described inIndyk et al. (2017), was shown to have good empirical performance for nearest neighbor search.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>Any N -point Euclidean metric can be embedded into N -1 dimensions.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>Throughout we use [m] to denote {1, . . . , m}, for an integer m &gt; 0.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4"><p>Note that fully storing the root surrogates is prohibitive: N 2 (r) has &#920;(2&#8730;d&#934;/2 (r) ) d cells, hence storing a cell ID takes &#8486;(d log d) bits, and since there can be &#8486;(n) subtree roots, this would bring the total sketch size to &#8486;(nd log d).</p></note>
		</body>
		</text>
</TEI>
