<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10378567</idno>
					<title level='j'>26th International Conference on Database Theory (ICDT 2023)</title>

					<author>Jingfan Meng</author><author>Huayi Wang</author><author>Jun Xu</author><author>Mitsunori Ogihara</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the dD-ERS problem, or to the latter database problem, have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework to dD-ERS on RVs that have Gaussian or Poisson distribution. Our dD-ERS solutions are the first ones that have polylogarithmic time complexities. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to have both high computational efficiencies and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Efficient range-summability (ERS) of random variables (RVs) is a fundamental algorithmic problem that has been studied for nearly two decades <ref type="bibr">[5,</ref><ref type="bibr">22,</ref><ref type="bibr">6,</ref><ref type="bibr">16]</ref>. This problem has so far been defined only in one dimension (1D) as follows. Let X 0 , X 1 , &#8226; &#8226; &#8226; , X &#8710;-1 be a list of underlying RVs each of which has the same target distribution X. Here, the (index) universe size &#8710; is typically a large number (say &#8710; = 2^64). A 1D-ERS problem calls for the following oracle for answering range-sum queries over (realizations of) these underlying RVs. At initialization, the oracle chooses a random outcome &#969; from the sample space &#8486;, which mathematically determines the (values of the) realizations X 0 (&#969;), X 1 (&#969;), &#8226; &#8226; &#8226; , X &#8710;-1 (&#969;); here the phrase "mathematically determines" emphasizes that (an implementation of) the oracle does not actually realize these RVs (and pay the O(&#8710;) time cost) at initialization. Thereafter,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>XX:2 On Efficient Range-Summability of IID RVs in Multiple Dimensions</head><p>given any query range [l, u) &#8796; {l, l + 1, &#8226; &#8226; &#8226; , u - 1} that lies in the universe [0, &#8710;), the oracle is required to return S[l, u) &#8796; X l (&#969;) + X l+1 (&#969;) + &#8226; &#8226; &#8226; + X u-1 (&#969;), the sum of the realizations of all underlying RVs in the range. This requirement is called the consistency requirement, which is one of the two essential requirements for the ERS oracle. We will show that such an ERS oracle can be efficiently implemented using hash functions. With such an implementation, the outcome &#969; corresponds to the seeds of these hash functions.</p><p>The other essential requirement is correct distribution, which has two aspects. The first aspect is that the underlying RVs X 0 , X 1 , &#8226; &#8226; &#8226; , X &#8710;-1 each have the same target (marginal) distribution X. The second aspect is that these RVs should satisfy certain independence guarantees. Ideally, these RVs would be mutually independent, but this comes at a high storage cost, as we will elaborate shortly. In practice, another type of independence guarantee, namely k-wise independence (in the sense that any subset of k underlying RVs are independent), is good enough for most applications when k &#8805; 4. We will show that our solution for ERS in d &gt; 1 dimensions can provide a k-wise independence guarantee at a small storage cost of O(log^d &#8710;) for an arbitrarily large k.</p><p>A straightforward but naive way to answer a range-sum query, say over [l, u), is simply to sum up the realization of every underlying RV X l (&#969;), X l+1 (&#969;), &#8226; &#8226; &#8226; , X u-1 (&#969;) in the query range. This solution, however, has a time complexity of O(&#8710;) when u - l is O(&#8710;). In contrast, an efficient solution should be able to do so with only O(polylog(&#8710;)) time complexity. 
Indeed, all existing ERS solutions <ref type="bibr">[2,</ref><ref type="bibr">5,</ref><ref type="bibr">22,</ref><ref type="bibr">6,</ref><ref type="bibr">16]</ref> have O(log &#8710;) time complexity.</p></div>
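As a purely illustrative contrast to the O(polylog(&#8710;)) goal, the following sketch shows the naive oracle: a keyed hash plays the role of the outcome &#969;, so repeated queries see the same realizations (the consistency requirement), but each range-sum still costs O(u - l) time. The names and the hash-to-uniform mapping are our own choices for illustration, not the paper's construction.

```python
import hashlib
import struct

def underlying_rv(seed: bytes, i: int) -> float:
    # Realize X_i(omega) on demand: the keyed-hash seed plays the role of the
    # outcome omega, so the same index always yields the same realization.
    # Here the target distribution X is simply Uniform[0, 1) for illustration.
    h = hashlib.blake2b(struct.pack(">Q", i), key=seed, digest_size=8).digest()
    return int.from_bytes(h, "big") / 2**64

def naive_range_sum(seed: bytes, l: int, u: int) -> float:
    # The naive oracle: O(u - l) work per query -- the baseline that an
    # efficient ERS solution must beat with O(polylog(Delta)) work.
    return sum(underlying_rv(seed, i) for i in range(l, u))

seed = b"omega-0001"
whole = naive_range_sum(seed, 10, 20)
halves = naive_range_sum(seed, 10, 15) + naive_range_sum(seed, 15, 20)
assert abs(whole - halves) < 1e-9  # consistency across overlapping queries
```

The seed is chosen once at "initialization"; any query thereafter is deterministic given that seed, which is exactly the consistency requirement.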
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Related Work on 1D-ERS</head><p>There are in general two families of solutions to the ERS problem in 1D, following two different approaches. The first approach is based on error-correcting codes (ECC). Solutions taking this approach include BCH3 <ref type="bibr">[22]</ref>, EH3 <ref type="bibr">[5]</ref>, and RM7 <ref type="bibr">[2]</ref>. This approach has two drawbacks. First, it works only when the target distribution X is Rademacher. Second, although it guarantees 3-wise (in the case of BCH3 and EH3) or 7-wise (in the case of RM7) independence among the underlying RVs, almost all empirical independence beyond that is destroyed. In addition, RM7 is very slow in practice <ref type="bibr">[22]</ref>.</p><p>The second approach is based on a data structure called the dyadic simulation tree (DST), which we will describe in Subsection 3.1. The DST-based approach was first briefly mentioned in <ref type="bibr">[6]</ref> and later fully developed in <ref type="bibr">[16]</ref>. The DST-based approach is better than the ECC-based approach in two aspects. First, it supports a wider range of target distributions including Gaussian, Cauchy, Rademacher <ref type="bibr">[16]</ref>, and Poisson (see Appendix C). Second, it provides stronger independence guarantees at a low computational cost. For example, when implemented using the tabulation hashing scheme <ref type="bibr">[25]</ref>, it guarantees 5-wise independence at a much lower computational cost than RM7 <ref type="bibr">[16]</ref>. We will describe a nontrivial generalization of this result to 2D in Section 4.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">The dD-ERS Problem</head><p>Consider two points &#8407; l and &#8407; u in the universe [0, &#8710;)^d such that l j &lt; u j for each dimension j = 1, 2, &#8226; &#8226; &#8226; , d. We define [ &#8407; l, &#8407; u) as the dD rectangular range "cornered" by these two points, in the sense that [ &#8407; l, &#8407; u) &#8796; [l 1 , u 1 ) &#215; [l 2 , u 2 ) &#215; &#8226; &#8226; &#8226; &#215; [l d , u d ), where &#215; is the Cartesian product.</p><p>A dD-ERS problem calls for the following oracle. 
At initialization, the oracle chooses an outcome &#969; that mathematically determines the realization X &#8407; i (&#969;) for each &#8407; i &#8712; [0, &#8710;)^d. Thereafter, given any dD range [ &#8407; l, &#8407; u), the oracle needs to return in O(polylog(&#8710;)) time S[ &#8407; l, &#8407; u) &#8796; &#8721; &#8407; i&#8712;[ &#8407; l, &#8407; u) X &#8407; i (&#969;), the sum of the realizations of all underlying RVs in this dD range. Unless otherwise stated, the vectors that appear in the sequel are assumed to be column vectors. We write them in boldface and with a rightward arrow on the top, as in "&#8407; x".</p><p>Several 1D-ERS solutions have been proposed as essential building blocks for efficient solutions to several database problems. For two such database problems that we will describe in Section 2, their 1D solutions, both proposed in <ref type="bibr">[7]</ref>, can be readily generalized to dD if their underlying 1D-ERS oracles can be generalized to dD. In fact, the authors of <ref type="bibr">[17]</ref> stated explicitly that the only missing component for their solutions of the 1D database problems to be generalized to 2D was an efficient 2D-ERS oracle where X is the Rademacher distribution (Pr[X = 1] = Pr[X = -1] = 0.5, a.k.a. single-step random walk). However, until this paper, no solution to any dD-ERS problem for d &gt; 1 had been proposed.</p></div>
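To make the dD range structure concrete, the following sketch (our own illustration, not the paper's code) decomposes a dD rectangular range into dyadic boxes as the Cartesian product of per-dimension 1D dyadic decompositions; each dimension contributes at most 2 log_2 &#8710; pieces, which is where O(log^d &#8710;)-type terms per query come from.

```python
from itertools import product

def dyadic_decompose(l, u):
    # Split the 1D range [l, u) into at most 2*log2(Delta) dyadic ranges,
    # each of the form [j * 2^m, (j + 1) * 2^m).
    out = []
    while l < u:
        # largest dyadic block starting at l that still fits inside [l, u)
        size = (l & -l) if l else 1 << (u.bit_length() - 1)
        while size > u - l:
            size //= 2
        out.append((l, l + size))
        l += size
    return out

def dyadic_decompose_dd(lo, hi):
    # A dD rectangular range is the Cartesian product of d 1D ranges, so it
    # splits into the Cartesian product of the per-dimension decompositions.
    per_dim = [dyadic_decompose(l, u) for l, u in zip(lo, hi)]
    return list(product(*per_dim))

boxes = dyadic_decompose_dd((3, 1), (11, 6))
for box in boxes:  # every piece is dyadic in every dimension
    for l, u in box:
        assert (u - l) & (u - l - 1) == 0 and l % (u - l) == 0
# and the pieces exactly tile the queried rectangle
area = sum((u1 - l1) * (u2 - l2) for (l1, u1), (l2, u2) in boxes)
assert area == (11 - 3) * (6 - 1)
```

An oracle that can answer dyadic range-sums quickly can then answer any rectangular query by summing the answers over these O((2 log_2 &#8710;)^d) boxes.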
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Our dD-ERS Solutions</head><p>In this paper, we propose novel solutions to the two dD-ERS problems wherein the target distributions are Gaussian and Poisson; we refer to these two problems as dD Gaussian-ERS and dD Poisson-ERS, respectively. Both solutions generalize the corresponding DST-based 1D-ERS solutions to higher dimensions and have a low time complexity of O(log^d &#8710;) per range-sum query. Our dD Gaussian-ERS solution, in particular, is based on the Haar wavelet transform (HWT), since DST is equivalent to HWT when (and only when) the target distribution X is Gaussian, as will be shown in Subsection 3.2.</p><p>Furthermore, we identify a sufficient condition that, if satisfied by the target distribution X, guarantees that the corresponding DST-based 1D-ERS solution can be generalized to a dD-ERS solution. We will prove in Appendix A that Gaussian and Poisson are two "nice" distributions that satisfy this sufficient condition. We will also show that, for all such "nice" distributions (including those we might discover in the future), this generalization process (from 1D to dD) follows a universal algorithmic framework that can be characterized as the Cartesian product of d DSTs. We will also provide strong evidence that X "being nice" is likely necessary for this DST generalization (from 1D to dD) to be feasible (see <ref type="bibr">Section 5)</ref>.</p><p>Unfortunately, so far we have not found any "nice" distribution other than Gaussian and Poisson. Hence dD-ERS for other target distributions remains an open problem, and is likely not solvable by the (generalized) DST approach. We emphasize that this is not a shortcoming of the DST approach: That we have obtained computationally efficient solutions in the cases of Gaussian and Poisson is already a pleasant surprise, as the dD-ERS problem has been open for nearly two decades. 
Furthermore, we will show that our dD Gaussian-ERS solution leads to computationally efficient solutions to both aforementioned database problems (to be described in Section 2), by answering their calls for a dD Gaussian-ERS or equivalent oracle.</p><p>Our dD Gaussian-ERS and Poisson-ERS solutions both support two different types of independence guarantees, at different storage costs. The first type is the ideal case in which the &#8710;^d underlying RVs are mutually independent. As will be shown in Section 3, we can achieve this ideal case by paying O(T log^d &#8710;) storage cost, where T is the total number of range-sum queries to be answered (i.e., O(log^d &#8710;) storage cost per range query). The second type is also quite strong: The &#8710;^d underlying RVs are k-wise independent, where the constant</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>k can be arbitrarily large. In Section 4, we propose a k-wise independence scheme that can provide the second type of guarantee by employing O(log^d &#8710;) k-wise independent hash functions. Its storage cost is quite small: only O(log^d &#8710;) for storing the seeds of these hash functions. We emphasize that the issue of how strong this independence guarantee (among the underlying RVs) needs to be affects only the storage cost of our Gaussian-ERS and Poisson-ERS solutions, and is orthogonal to all other issues described in earlier paragraphs, such as the O(log^d &#8710;) time complexity of both solutions and the sufficient and likely necessary condition for a DST-based dD-ERS solution to exist.</p><p>This k-wise independence scheme makes our dD-ERS solutions very practically useful for two reasons. First, such a k-wise independent hash function in practice requires only a very short seed (no longer than a few kilobytes), and each hash operation can be computed in nanoseconds <ref type="bibr">[3,</ref><ref type="bibr">20]</ref>. Second, most applications of ERS only require the underlying RVs to be 4-wise independent <ref type="bibr">[7,</ref><ref type="bibr">17]</ref>.</p><p>The contributions of this work can be summarized as follows. First, we provide the first set of answers to the long-standing open question of whether there is an efficient solution to any dD-ERS problem for d &gt; 1. Second, our Gaussian-ERS solution solves a long-standing open problem in data streaming that we will describe next. Third, our k-wise independence theory and hashing scheme make our dD-ERS solutions very practically useful.</p><p>The rest of the paper is organized as follows. In Section 2, we describe two applications of our dD Gaussian-ERS solutions. In Section 3, we first describe our HWT-based Gaussian-ERS scheme in 1D, and then generalize it to 2D and dD. 
In Section 4, we describe our k-wise independence theory and scheme. In Section 5, we propose a sufficient and likely necessary condition on the target distribution for the DST approach to be generalized to dD. Finally, we conclude the paper in Section 6.</p></div>
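The k-wise independent hash functions with short seeds mentioned above can be instantiated, for example, by the classic random-polynomial construction sketched below. This is a textbook construction offered for intuition, not necessarily the hashing scheme used in the paper.

```python
import random

# Evaluating a random polynomial of degree k-1 over a prime field yields
# k-wise independent hash values; the "seed" is just the k coefficients.
P = (1 << 61) - 1  # a Mersenne prime, comfortably above a 2^32 universe

def make_kwise_hash(k, rng):
    coeffs = [rng.randrange(P) for _ in range(k)]  # the short seed
    def h(x):
        acc = 0
        for c in coeffs:  # Horner's rule evaluation of the random polynomial
            acc = (acc * x + c) % P
        return acc
    return h

rng = random.Random(42)
h = make_kwise_hash(4, rng)  # 4-wise independent, as most ERS applications need
assert h(12345) == h(12345)  # deterministic given the seed
assert 0 <= h(12345) < P
```

Note that, as the text stresses, k-wise independence among such hash values is easy; the hard part addressed in Section 4 is k-wise independence among the underlying RVs, each of which may combine many hash values.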
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Applications of dD Gaussian-ERS</head><p>In this section, we introduce two important applications of our dD Gaussian-ERS solution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Fast Approximate Wavelet Tracking</head><p>The first application is to the problem of fast approximate wavelet tracking (FAWT) on data streams <ref type="bibr">[7,</ref><ref type="bibr">4]</ref>. We first introduce the FAWT problem in 1D <ref type="bibr">[7]</ref>, or 1D-FAWT for short. In this problem, the precise system state comprises a &#8710;-dimensional vector &#8407; s, each scalar of which is a counter. The precise system state at any moment of time is determined by a data stream, in which each data item is an update to one such counter (called a point update) or to all counters in a 1D range (called a range update). In 1D-FAWT, &#8407; s is considered a &#8710;-dimensional signal vector that is constantly "on the move", driven by the updates in the data stream. Let &#8407; r be the (&#8710;-dimensional) vector of HWT coefficients of &#8407; s. Clearly, &#8407; r is also a "moving target". We denote by &#8407; r t the snapshot of &#8407; r at time t. In 1D-FAWT, the goal is to closely track (the precise value of) &#8407; r over time using a sketch, in the sense that at moment t, we can recover from the sketch an estimate &#8407; r &#8242; t of &#8407; r t , such that &#8741; &#8407; r t - &#8407; r &#8242; t &#8741; 2 is small. An acceptable solution should use a sketch whose size (space complexity) is only O(polylog(&#8710;)), and be able to maintain the sketch with a computation time cost of O(polylog(&#8710;)) per point or range update.</p><p>The first solution to 1D-FAWT was proposed in <ref type="bibr">[7]</ref>. It requires the efficient computation of an arbitrary scalar in H &#8407; x, where H is the &#8710; &#215; &#8710; Haar matrix (to be defined in Subsubsection 3.2.1) and &#8407; x is a &#8710;-dimensional vector of 4-wise independent Rademacher RVs. 
A key step of this computation is to compute a range-sum of 4-wise independent Rademacher RVs (in 1D), which is used therein as a Tug-of-War (ToW) sketch <ref type="bibr">[1]</ref> for "sketching" the L 2 difference (approximation error) between the signal vector and its FAWT approximation. One of the aforementioned ECC-based ERS solutions is used therein to tackle this Rademacher-ERS problem. The authors of <ref type="bibr">[17]</ref> stated that if they could find a solution to this Rademacher-ERS problem in dD, then the 1D-FAWT solution in <ref type="bibr">[7]</ref> would become a dD-FAWT solution. The first solution to dD-FAWT, proposed in <ref type="bibr">[4]</ref>, explicitly bypassed this ERS problem.</p><p>We note that the 1D-FAWT solution above continues to work, and its time and space complexities remain the same, if we replace the &#8407; x with a &#8710;-dimensional vector of 4-wise independent standard Gaussian RVs. This is because, with this replacement, the aforementioned ToW sketch becomes a Gaussian Tug-of-War (GToW) sketch (which maps a data item to a Gaussian RV instead of a Rademacher RV) <ref type="bibr">[10]</ref>, and ToW and GToW are known to have the same (&#1013;, &#948;) accuracy bound <ref type="bibr">[1,</ref><ref type="bibr">10]</ref> for sketching the L 2 norm of a data stream (used here for sketching the aforementioned L 2 difference). Based on this insight, our dD Gaussian-ERS solution can be used to construct a dD-FAWT solution as follows. We simply change, in the contingent dD-FAWT solution proposed in <ref type="bibr">[7]</ref>, the distribution of all &#8710;^d underlying 4-wise independent RVs from Rademacher to Gaussian. With this replacement, this contingent solution will finally work, provided we can solve the resulting dD Gaussian-ERS problem. The latter problem is solved by our k-wise (with k = 4 here) independence scheme, to be described in Section 4. 
The resulting dD-FAWT solution has the same time and space complexities of O(log^d &#8710;) as the solution proposed in <ref type="bibr">[4]</ref> for achieving the same accuracy guarantee.</p></div>
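The Rademacher-by-Gaussian replacement works because a (G)ToW counter A = &#8721; s_i X_i satisfies E[A^2] = &#8741;s&#8741;_2^2 whenever the X_i's are standard Gaussian. The sketch below checks this numerically; for brevity it draws fresh i.i.d. Gaussians per counter and plain-averages many counters, rather than using 4-wise independent RVs and the usual median-of-means combination.

```python
import random

def gtow_counter(s, rng):
    # One Gaussian Tug-of-War counter: A = sum_i s_i * X_i with X_i i.i.d.
    # standard Gaussian, so E[A^2] = ||s||_2^2.
    return sum(v * rng.gauss(0.0, 1.0) for v in s)

rng = random.Random(7)
s = [3.0, -1.0, 4.0, 1.0, -5.0]
true_sq = sum(v * v for v in s)  # ||s||_2^2 = 52
est = sum(gtow_counter(s, rng) ** 2 for _ in range(20000)) / 20000
assert abs(est - true_sq) / true_sq < 0.1  # concentrates around ||s||_2^2
```

The Rademacher version of the same counter (X_i uniform on {-1, +1}) enjoys the same second-moment identity, which is why ToW and GToW share the same (&#1013;, &#948;) bound.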
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Range-Sum Queries over Data Cube</head><p>Our second application is to the problem of approximately answering range-sum queries over a data cube <ref type="bibr">[8]</ref> that is similarly "on the move", propelled by the (point or range) updates that arrive in a stream. This problem can be formulated as follows. The precise system state comprises &#8710;^d counters, namely &#963; &#8407; i for &#8407; i &#8712; [0, &#8710;)^d, that are "on the move". Given a range [ &#8407; l, &#8407; u) at moment t, the goal is to approximately compute the sum of counter values in this range C[ &#8407; l, &#8407; u) &#8796; &#8721; &#8407; i&#8712;[ &#8407; l, &#8407; u) &#963; &#8407; i (t), where &#963; &#8407; i (t) is the value of the counter &#963; &#8407; i at moment t. A desirable solution to this problem in dD should satisfy three requirements (in which multiplicative terms related to the desired (&#1013;, &#948;) accuracy bound are ignored). First, any range-sum query is answered in O(polylog(&#8710;)) time. Second, its space complexity is O(polylog(&#8710;)). Third, every point or range update to the system state is processed in O(polylog(&#8710;)) time. It has been a long-standing open question whether there is a solution to this problem that satisfies all three requirements when d &gt; 1. For example, solutions producing exact answers (to the range queries) <ref type="bibr">[9,</ref><ref type="bibr">23,</ref><ref type="bibr">11]</ref> all require O(&#8710;^d log &#8710;) space and hence do not satisfy the second requirement; and the Haar+ tree <ref type="bibr">[13]</ref> works only on static data, and hence does not satisfy the third requirement.</p><p>In 1D, a solution that satisfies all three requirements (with d = 1) was proposed in <ref type="bibr">[7,</ref><ref type="bibr">6]</ref>. 
It involves 1D-ERS computations on 4-wise independent underlying RVs where the target distribution is either Gaussian or Rademacher, which are tackled using a DST-based (in <ref type="bibr">[6]</ref>) or an ECC-based (in <ref type="bibr">[7]</ref>) 1D-ERS solution, respectively. As shown in <ref type="bibr">[7,</ref><ref type="bibr">6]</ref>, this range-sum query solution can be readily generalized to dD if the ERS computations above can be performed in dD. This gap is again filled by our k-wise (k = 4) independence scheme for dD Gaussian-ERS, resulting in the first dD solution that satisfies all three requirements, all with O(log^d &#8710;) (time or space) complexity (ignoring &#1013; and &#948; terms).</p><p>In the resulting dD solution, we maintain O(log(1/&#948;)/&#1013;^2) (independent instances of) sketches that each "sketches" the content (counter values) of the data cube. Here we describe only one such sketch, which we denote as A, since these sketches are statistically and
<div xmlns="http://www.tei-c.org/ns/1.0"><p>functionally identical. At any time t, A(t) should track the current system state, namely the (&#963; &#8407; i (t))'s, as follows: A(t) &#8796; &#8721; &#8407; i&#8712;[0,&#8710;)^d &#963; &#8407; i (t) X &#8407; i (&#969;), where the X &#8407; i 's are a set of &#8710;^d 4-wise independent standard Gaussian underlying RVs that are in one-to-one correspondence with the set of &#8710;^d counters: each X &#8407; i is associated with the counter &#963; &#8407; i . If we implement these &#8710;^d RVs using (an instance of) our dD Gaussian-ERS solution, then we can keep the value of A(t) up-to-date with a time complexity of O(log^d &#8710;) per point or range update (to the system state). Then, given a query range [ &#8407; l, &#8407; u) at time t, we estimate the range-sum of counters C[ &#8407; l, &#8407; u) from the sketch A(t) using A(t) &#8226; S[ &#8407; l, &#8407; u) as the estimator. These O(log(1/&#948;)/&#1013;^2) estimators, one obtained from each sketch, are then combined to produce a final estimation that has the following accuracy guarantee (the same as in the 1D case): with probability at least 1 - &#948;, the final estimation deviates from the actual value of C[ &#8407; l, &#8407; u) by at most &#1013; &#8730;n &#8226; &#8741;&#963;(t)&#8741; 2 , where n is the number of counters in the query range, and &#8741;&#963;(t)&#8741; 2 is the L 2 norm of the system state. Since each sketch uses an independent dD Gaussian-ERS scheme instance, our dD solution satisfies all three aforementioned requirements, all with O(log^d &#8710; &#8226; log(1/&#948;)/&#1013;^2) time and space complexity.</p></div>
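The estimator A(t) &#8226; S[ &#8407; l, &#8407; u) is unbiased because the sketch and the range-sum share the same realization of the Gaussian RVs. The toy sketch below checks this on a 1D "data cube" with &#8710; = 8, realizing the i.i.d. Gaussians explicitly and plain-averaging many independent sketch instances instead of the paper's O(log(1/&#948;)/&#1013;^2) combination; all names are our own.

```python
import random

def sketch_and_estimate(sigma, l, u, rng):
    # One sketch instance: A = sum_i sigma_i * X_i and S[l, u) share the SAME
    # realization of the X_i's, so E[A * S[l, u)] = C[l, u).
    x = [rng.gauss(0.0, 1.0) for _ in sigma]
    A = sum(c * xi for c, xi in zip(sigma, x))
    S = sum(x[l:u])
    return A * S

rng = random.Random(1)
sigma = [5.0, 2.0, 7.0, 1.0, 3.0, 8.0, 4.0, 6.0]  # toy counters, Delta = 8
l, u = 2, 6
true_sum = sum(sigma[l:u])  # C[l, u) = 19
est = sum(sketch_and_estimate(sigma, l, u, rng) for _ in range(50000)) / 50000
assert abs(est - true_sum) < 3.0  # unbiased; averaging narrows the error
```

The role of the ERS oracle in the real solution is precisely to supply S[ &#8407; l, &#8407; u) in O(log^d &#8710;) time without realizing all &#8710;^d Gaussians, which this brute-force toy does not attempt.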
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">A Closer Comparison with Related Work</head><p>In this section, per referees' requests, we provide an in-depth comparison of this work with prior works on 1D-FAWT <ref type="bibr">[7,</ref><ref type="bibr">6]</ref>, on dD-FAWT <ref type="bibr">[4]</ref>, and on 1D data cube <ref type="bibr">[6]</ref>.</p><p>We start by explaining how the dD-FAWT solution proposed in <ref type="bibr">[4]</ref> manages to avoid confronting the dD-ERS problem. The dD-FAWT solution <ref type="bibr">[4]</ref> maintains ToW sketches for groups of wavelet coefficients in the wavelet domain. As explained earlier, each ToW sketch "measures" the L 2 norm (and hence, by squaring, the total energy) of such a group. By the property of HWT, each point or range update to the system state in the time domain translates into O(log^d &#8710;) updates to the sketches in the wavelet domain; we also use this property in our solution to keep its time complexity at O(log^d &#8710;), as shown in Subsection 3.4. To solve the dD-FAWT problem using these sketches in the wavelet domain, we need only to identify the groups that are (hierarchical) "L 2 heavy hitters" <ref type="bibr">[4]</ref>. In <ref type="bibr">[4]</ref>, a binary search tree built on these sketches is used to search for such "L 2 heavy hitters" efficiently. Since this dD-FAWT solution <ref type="bibr">[4]</ref> does not involve computing the range sums of the Rademacher RVs underlying the ToW sketches, it does not need to formulate or solve any ERS problem.</p><p>In comparison, should we try to extend the 1D-FAWT solution proposed in <ref type="bibr">[7]</ref>, which maintains the ToW sketches in the time domain, to dD without the aforementioned Rademacher-by-Gaussian replacement, the underlying Rademacher RVs would have to be efficiently range-summable to keep the time complexity of each point or range update to the sketches low. 
However, this appears to be a tall order for now: For d &gt; 1, no ECC-based Rademacher-ERS solution has ever been found, as explained earlier, and a DST-based Rademacher-ERS solution is unlikely to exist, as we will show in Section 5 and Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Indeed, one can "avoid" ERS by moving everything, including the input streams and the query ranges, into the wavelet domain; that is exactly what the algorithm of <ref type="bibr">[4]</ref> does. Although this idea might sound obvious, it had not previously been observed that such a wavelet-based solution is, in effect, also an ERS solution. A referee asked whether the 1D data cube solution proposed in <ref type="bibr">[7,</ref><ref type="bibr">6]</ref> can be extended to dD using the same ERS avoidance strategy of maintaining the sketches in the wavelet domain as used in <ref type="bibr">[4]</ref>. The answer to this question is negative for the following reason. The ToW or GToW sketches are used in all three FAWT and data cube solutions <ref type="bibr">[7,</ref><ref type="bibr">6,</ref><ref type="bibr">4]</ref>, whether the sketches are maintained in the time domain or in the wavelet domain. As explained in Subsection 2.2, to use such a sketch A(t) for answering a data cube query, we have to compute A(t) &#8226; S[ &#8407; l, &#8407; u), which involves computing the range sum S[ &#8407; l, &#8407; u).</p><p>Now we highlight a key difficulty that we believe has prevented the authors of <ref type="bibr">[7,</ref><ref type="bibr">6,</ref><ref type="bibr">17]</ref> from solving the dD-ERS problem and extending their FAWT and data cube solutions from 1D to dD: The Rademacher or Gaussian RVs underlying the sketches need to be both 4-wise independent and efficiently range-summable. The authors of <ref type="bibr">[7,</ref><ref type="bibr">17]</ref> tried to extend a "magic" hash function, which induces such Rademacher RVs in 1D, to dD. However, as explained earlier, a dD Rademacher-ERS solution is unlikely to exist. 
The authors of <ref type="bibr">[6]</ref> proposed the 1D-DST that laid the foundation of this work and our prior work <ref type="bibr">[16]</ref>. A key innovation of <ref type="bibr">[6]</ref> is that the 1D-ERS is achieved via a 1D-DST instead of a magic hash function. However, their DST-based 1D Gaussian-ERS solution still relies on a magic hash function, called Nisan's PRG (Pseudorandom Generator) <ref type="bibr">[19]</ref>, to provide 4-wise independence among the underlying Gaussian RVs. The use of Nisan's PRG <ref type="bibr">[19]</ref>, however, restricts the applicability and the extensibility of the 1D-DST approach, since Nisan's PRG provides independence guarantees only for memory-constrained applications such as data streaming <ref type="bibr">[10]</ref>. It is also not clear whether the 1D-DST approach powered by Nisan's PRG can be extended to dD. In comparison, in this work, the dD-DST (which exists for Gaussian and Poisson) data structure is engineered to provide k-wise independence among the underlying RVs. Each underlying hash function, on the other hand, only needs to output k-wise independent hash values, which as mentioned earlier is easy to achieve. Note that k-wise independence among the underlying RVs is fundamentally different from, and much harder to attain than, k-wise independence among hash values (produced by a hash function): the underlying RVs need to be efficiently range-summable whereas the hash values do not, and each underlying RV can be a weighted sum of O(log^d &#8710;) such hash values, as we will elaborate in Section 4.</p><p>Finally, we state a key difference between this work and <ref type="bibr">[7,</ref><ref type="bibr">6,</ref><ref type="bibr">17,</ref><ref type="bibr">4]</ref> with respect to wavelets. In this work, ERS is the end and wavelets are the means, whereas in <ref type="bibr">[7,</ref><ref type="bibr">6,</ref><ref type="bibr">17]</ref> it is the other way around. 
In <ref type="bibr">[4]</ref>, wavelets are the end, but <ref type="bibr">[4]</ref> cleverly avoids using ERS as the means, as just explained.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Our Solution to dD Gaussian-ERS</head><p>In this section, we describe our dD Gaussian-ERS solution, which answers a range-sum query in O(log^d &#8710;) time.</p><p>To explain this solution with the best clarity, for now we require it to provide the aforementioned ideal guarantee that the &#8710;^d underlying RVs are mutually independent, with the understanding that this requirement affects only the space complexity of our solution. In the next section, this requirement will be relaxed to these RVs being k-wise independent, and as a result, the space complexity of our solution is reduced to O(log^d &#8710;).</p><p>Our solution can be summarized as follows. Let &#8407; x denote the &#8710;^d underlying standard Gaussian RVs, namely X &#8407; i for &#8407; i &#8712; [0, &#8710;)^d, arranged (in the dictionary order of &#8407; i) into a &#8710;^d-dimensional vector. Then, after the dD Haar wavelet transform (HWT) is performed on &#8407; x, we obtain another &#8710;^d-dimensional vector &#8407; w whose scalars are the HWT coefficients of &#8407; x. Our solution builds on the following two observations. The first observation is that the scalars in &#8407; x are i.i.d. standard Gaussian RVs if and only if the scalars in &#8407; w are (see Theorem 2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure 1 (a) A general DST.</p><p>The second observation is that the answer to any dD range-sum query can be expressed as a weighted sum of O(log^d &#8710;) scalars (HWT coefficients) in &#8407; w (see <ref type="bibr">Lemma 10</ref>). Our algorithm is simply to generate and remember only these O(log^d &#8710;) HWT coefficients (that participate in this range-sum query). Our solution satisfies the correct distribution requirement (with mutual independence guarantee) by the first observation. Since the first observation is true only when the target distribution is Gaussian, this HWT-based solution does not work for any other target distribution.</p><p>In the following, we first introduce the concept of the dyadic simulation tree (DST) in 1D in Subsection 3.1. Then, we show that the 1D DST is equivalent to the 1D HWT in the Gaussian case and present our HWT-based Gaussian-ERS algorithm for 1D, in Subsection 3.2. Finally, we describe our HWT-based Gaussian-ERS algorithms for 2D and dD in Subsection 3.3 and Subsection 3.4, respectively.</p></div>
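The second observation can be checked numerically in 1D: because the Haar matrix H is orthonormal, a range-sum equals the inner product of H&#967; (the transformed range indicator &#967;) with &#8407; w, and H&#967; has only O(log &#8710;) nonzero entries. The sketch below is our own illustration; the recursive matrix construction is the standard orthonormal Haar matrix, not necessarily the exact indexing used in the paper.

```python
import math

def haar_matrix(n):
    # Orthonormal Haar matrix (n a power of two): the top half "smooths"
    # H_{n/2}, and the bottom half holds pairwise-difference rows.
    if n == 1:
        return [[1.0]]
    h = haar_matrix(n // 2)
    s = 1 / math.sqrt(2)
    top = [[s * v for v in row for _ in range(2)] for row in h]
    bot = []
    for i in range(n // 2):
        r = [0.0] * n
        r[2 * i], r[2 * i + 1] = s, -s
        bot.append(r)
    return top + bot

def range_sum_from_hwt(w, H, l, u):
    # <chi, x> = <H chi, H x> = <H chi, w> since H is orthonormal; only
    # O(log n) entries of H chi are nonzero for any range [l, u).
    n = len(w)
    chi = [1.0 if l <= i < u else 0.0 for i in range(n)]
    hchi = [sum(H[k][i] * chi[i] for i in range(n)) for k in range(n)]
    nnz = sum(1 for v in hchi if abs(v) > 1e-12)
    return sum(hchi[k] * w[k] for k in range(n)), nnz

n = 16
H = haar_matrix(n)
x = [float((7 * i) % 5 - 2) for i in range(n)]                 # test vector
w = [sum(H[k][i] * x[i] for i in range(n)) for k in range(n)]  # HWT of x
total, nnz = range_sum_from_hwt(w, H, 3, 13)
assert abs(total - sum(x[3:13])) < 1e-9
assert nnz <= 2 * int(math.log2(n)) + 1  # only O(log n) coefficients matter
```

In the actual solution, only these few coefficients are ever generated and remembered, which is what brings the per-query cost down to O(log^d &#8710;) in dD.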
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">A Brief Introduction to DST</head><p>In this section, we briefly introduce the concept of the DST, which as mentioned earlier was proposed in <ref type="bibr">[16]</ref> as a general solution approach to the one-dimensional (1D) ERS problems for arbitrary target distributions.</p><p>We say that [l, u) is a 1D dyadic range if there exist integers j &#8805; 0 and m &#8805; 0 such that l = j &#8226; 2^m and u = (j + 1) &#8226; 2^m. We call the sum on a dyadic range a dyadic range-sum. Note that any underlying RV X i is a dyadic range-sum (on the dyadic range [i, i + 1)). Let each underlying RV X i have standard Gaussian distribution N (0, 1). In the following, we focus on how to compute a dyadic range-sum, since any (general) 1D range can be "pieced together" using at most 2 log_2 &#8710; dyadic ranges <ref type="bibr">[22]</ref>. We illustrate the process of computing dyadic range-sums using a "small universe" example (with &#8710; = 4) shown in Figure <ref type="figure">1a</ref>. To begin with, the total sum of the universe S[0, 4), sitting at the root of the tree, is generated directly from its distribution N (0, 4). Then, S[0, 4) is split into two children, the half-range-sums S[0, 2) and S[2, 4), such that RVs S[0, 2) and S[2, 4) sum up to S[0, 4), are (mutually) independent, and each has distribution N (0, 2). This is done by generating the RVs S[0, 2) and S[2, 4) from a conditional (upon S[0, 4)) distribution that will be specified shortly. Afterwards, S[0, 2) is split in a similar way into two i.i.d. underlying RVs X 0 and X 1 , and so is S[2, 4) (into X 2 and X 3 ). As shown in Figure <ref type="figure">1a</ref>, the four underlying RVs are the leaves of the DST. We now specify the aforementioned conditional distribution used for each split. 
Suppose the range-sum to split consists of 2n underlying RVs, and that its value is equal to z. The lower half-range-sum S_l (the left child in Figure <ref type="figure">1a</ref>) is generated from the following conditional pdf (or pmf):</p><p>f(x | z) ≜ ϕ_n(x) · ϕ_n(z − x) / ϕ_2n(z), (1)</p><p>where ϕ_n(·) is the pdf (or pmf) of X^{*n}, the n-th convolution power of the target distribution,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure <ref type="figure">2</ref> An illustration of the HWT formula ⃗y = H_4 ⃗x.</p><p>and ϕ_2n(·) is the pdf (or pmf) of X^{*2n}. Then, the upper half-range-sum (the right child) is defined as S_u ≜ z − S_l. It was shown in <ref type="bibr">[16]</ref> that splitting a (parent) RV using this conditional distribution guarantees that the two resulting RVs S_l and S_u are i.i.d. This guarantee holds regardless of the target distribution. However, computationally efficient procedures for generating an RV S_l with distribution f(x | z) are known only for a few "nice" target distributions: Gaussian, Cauchy, and Rademacher as shown in <ref type="bibr">[16]</ref>, and Poisson as shown in Appendix C. Among them, the Gaussian distribution has the nice property that an RV S_l with distribution f(x | z) can be generated as a linear combination of z and a "fresh" standard Gaussian RV Y:</p><p>S_l ≜ z/2 + √(n/2) · Y.</p><p>Here, Y being "fresh" means it is independent of all other RVs. This linear decomposability has the pleasant consequence that every dyadic range-sum generated by this 1D Gaussian-DST can be recursively decomposed into a linear combination of some i.i.d. standard Gaussian RVs, as illustrated in Figure <ref type="figure">1b</ref>. In this example, let Y_0, Y_1, Y_2 and Y_3 be four i.i.d. standard Gaussian RVs. The total sum of the universe S[0, 4) is written as 2Y_0, because the two have the same distribution N(0, 4). Then, it is split into the two half-range-sums S[0, 2) ≜ Y_0 + Y_1 and S[2, 4) ≜ Y_0 − Y_1 using the linear decomposition above with z = 2Y_0 and a fresh RV Y_1. Finally, S[0, 2) and S[2, 4) are similarly split into the four underlying RVs using fresh RVs Y_2 and Y_3, respectively.</p></div>
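To make the splitting process concrete, here is a minimal Python sketch (function and variable names are ours, not the paper's) that lazily generates and remembers dyadic range-sums via the Gaussian split S_l = z/2 + √(n/2)·Y with a fresh standard Gaussian Y (the split that reproduces the ∆ = 4 example above), and answers a general range-sum query by piecing it together from dyadic range-sums:

```python
import math
import random

# A sketch of the 1D Gaussian DST, assuming a universe of size DELTA = 2^M.
# Each dyadic range-sum S[j*2^m, (j+1)*2^m) is generated lazily and remembered:
# the root directly from N(0, DELTA), every other one by the Gaussian split.
DELTA = 8                  # universe size, a power of 2
rng = random.Random(42)
memo = {}                  # (m, j) -> realized value of S[j*2^m, (j+1)*2^m)

def dyadic_sum(m, j):
    if (m, j) not in memo:
        if 2 ** m == DELTA:                      # root: total sum ~ N(0, DELTA)
            memo[(m, j)] = math.sqrt(DELTA) * rng.gauss(0, 1)
        else:
            z = dyadic_sum(m + 1, j // 2)        # parent covers 2n underlying RVs
            n = 2 ** m                           # each child covers n of them
            s_l = z / 2 + math.sqrt(n / 2) * rng.gauss(0, 1)
            base = j - (j % 2)                   # index of the left sibling
            memo[(m, base)] = s_l                # remember both children together,
            memo[(m, base + 1)] = z - s_l        # so the split stays consistent
    return memo[(m, j)]

def range_sum(l, u):
    """Piece S[l, u) together from dyadic range-sums, left to right."""
    total = 0.0
    while l < u:
        m = 0          # largest dyadic block that starts at l and fits in [l, u)
        while l % 2 ** (m + 1) == 0 and l + 2 ** (m + 1) <= u:
            m += 1
        total += dyadic_sum(m, l // 2 ** m)
        l += 2 ** m
    return total
```

Because both children of every split are remembered together, repeated or overlapping queries always receive mutually consistent answers.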
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">HWT Representation of 1D Gaussian-DST</head><p>In this section, we show that when the target distribution is Gaussian, a DST is mathematically equivalent to a Haar wavelet transform (HWT) in the 1D case. We will also show that this equivalence carries over to higher dimensions. Note that this equivalence does not apply to any target distribution other than Gaussian, and hence the HWT representation cannot replace the role of DST in general. In the following, we describe in Subsubsection 3.2.2 our HWT-based 1D Gaussian-ERS solution that has O(log &#8710;) time complexity, after making some mathematical preparations in Subsubsection 3.2.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Mathematical Preliminaries</head><p>It is not hard to verify that, if we apply HWT (to be specified soon) to the four underlying RVs shown in Figure <ref type="figure">1b</ref>, namely ⃗x ≜ (X_0, X_1, X_2, X_3)^T, we obtain precisely the four i.i.d. standard Gaussian RVs used to generate them: ⃗y ≜ H_4 ⃗x = (Y_0, Y_1, Y_2, Y_3)^T, where H_4 is the 4 × 4 Haar matrix. This example is illustrated as a matrix-vector multiplication in Figure <ref type="figure">2</ref>.</p><p>The above example in which ∆ = 4 can be generalized to an arbitrary universe size ∆ (that is a power of 2) as follows. In general, HWT is defined as ⃗w = H_∆ ⃗x, where ⃗w and ⃗x are both ∆-dimensional vectors, and H_∆ is a ∆ × ∆ Haar matrix. To simplify notations,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>we drop the subscript ∆ in the sequel. In wavelet terms, ⃗x is called a discrete signal vector and ⃗w is called the HWT coefficient vector. Clearly, the i-th HWT coefficient is the inner product between ⃗x and the i-th row of H, for i = 0, 1, · · · , ∆ − 1. In the wavelet theory, we index each HWT coefficient as W_j^m, for m = −1, 0, 1, · · · , log₂ ∆ − 1 and j = 0, 1, · · · , 2^{m+} − 1 (where m+ ≜ max{0, m}), in the dictionary order of (m, j), and refer to the corresponding row (transposed into a column vector) in H that computes W_j^m as the HWT vector ⃗ψ_j^m. Hence we have W_j^m ≜ ⟨⃗x, ⃗ψ_j^m⟩ by definition. In wavelet terms, the parameter m is called scale and the parameter j is called location. In Figure <ref type="figure">2</ref>, the 4 HWT coefficients and 4 HWT vectors from top to bottom are on 3 different scales (−1, 0, and 1) and are "assigned" 3 different colors accordingly.</p><p>We define the indicator vector of a 1D range R, denoted as 1_R, as a ∆-dimensional 0-1 vector, the i-th scalar of which takes value 1 if i ∈ R and value 0 otherwise.</p><p>Throughout this paper, the indicator vectors are the only vectors that are not written in boldface with a rightward arrow on the top. We now specify the HWT vectors. The HWT vector ⃗ψ_0^{−1}, every scalar of which takes value ∆^{−1/2}, is special: its corresponding coefficient W_0^{−1} reflects the scaled (by ∆^{−1/2}) range-sum of the entire universe, whereas every other HWT coefficient is the (scaled) difference of two range-sums. 
Every other HWT vector ⃗ψ_j^m, for m = 0, 1, · · · , log₂ ∆ − 1 and j = 0, 1, · · · , 2^m − 1, corresponds to the dyadic range I_j^m ≜ [j∆/2^m, (j + 1)∆/2^m) in the sense that the latter serves as the support of the former: ⃗ψ_j^m is defined by setting the first half of I_j^m to the value √(2^m/∆), the second half of I_j^m to the value −√(2^m/∆), and the rest of the universe [0, ∆) \ I_j^m to the value 0. Note that ⃗ψ_j^m has the same number of scalars with value √(2^m/∆) as those with value −√(2^m/∆), so ⟨⃗ψ_j^m, 1_{I_j^m}⟩ = 0. From the definition above, H is known to be orthonormal <ref type="bibr">[18]</ref>, so the following theorem can be applied to it.</p><p>▶ Theorem 1 (<ref type="bibr">[14]</ref>). Let M be an n × n matrix. If M is orthonormal, then it has the following two properties: 1. M^T = M^{−1}, and M^T is also orthonormal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.</head><p>Given any two n-dimensional vectors &#8407; x, &#8407; y, we have &#10216;&#8407; x, &#8407; y&#10217; = &#10216;M&#8407; x, M&#8407; y&#10217;.</p></div>
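As a quick sanity check of the definitions above, the following sketch (helper names are ours) builds the ∆ × ∆ Haar matrix row by row from the HWT vectors and verifies both properties of Theorem 1 on a 4 × 4 instance:

```python
import math

# Build the DELTA x DELTA Haar matrix from the definition of the HWT vectors:
# row psi_0^{-1} is all DELTA^{-1/2}; row psi_j^m is +sqrt(2^m/DELTA) on the
# first half of I_j^m, -sqrt(2^m/DELTA) on the second half, and 0 elsewhere.
def haar_matrix(delta):
    rows = [[1 / math.sqrt(delta)] * delta]          # psi_0^{-1}
    m = 0
    while 2 ** m < delta:
        width = delta // 2 ** m                      # |I_j^m|
        val = math.sqrt(2 ** m / delta)
        for j in range(2 ** m):
            row = [0.0] * delta
            for i in range(j * width, (j + 1) * width):
                row[i] = val if i < j * width + width // 2 else -val
            rows.append(row)
        m += 1
    return rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

Orthonormality (rows pairwise orthogonal, each of unit norm) and the inner-product preservation of property 2 then follow directly.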
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Let ⃗w be a ∆-dimensional vector of i.i.d. standard Gaussian RVs. We mathematically define the vector of underlying RVs ⃗x = (X_0, X_1, · · · , X_{∆−1})^T as ⃗x ≜ H^T ⃗w. Hence, we have ⃗w = H⃗x by the first property in Theorem 1. The underlying RVs defined this way are i.i.d. standard Gaussian, by the following theorem.</p><p>▶ Theorem 2 (Proposition 3.3.2 in <ref type="bibr">[26]</ref>). Let ⃗x = M⃗w where M is an orthonormal matrix. Then ⃗x is a vector of i.i.d. standard Gaussian RVs if and only if ⃗w is.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Our HWT-based Algorithm for 1D-ERS</head><p>Given any range [l, u), we compute its range-sum S[l, u) as ⟨⃗w, H1_[l,u)⟩, which is the sum of the HWT coefficients in ⃗w weighted by the scalars in H1_[l,u). This weighted sum can be computed in O(log ∆) time, because, by Theorem 3, the ∆-dimensional vector H1_[l,u) contains only O(log ∆) nonzero scalars (weights), and by Remark 5, for each such scalar, its index can be located and its value computed in O(1) time. We refer to the O(log ∆) corresponding scalars in ⃗w whose weights are nonzero as participating HWT coefficients in the sequel.</p><p>To provide the aforementioned ideal guarantee of mutual independence (among the ∆ underlying RVs), for each such participating HWT coefficient (which is a standard Gaussian RV), we generate the RV and remember its realization (in memory) if it has never been generated before (say for answering an earlier range-sum query), or retrieve its realization from memory otherwise. The space complexity of this algorithm is O(min{T log ∆, ∆}), since each of the T range-sum queries involves O(log ∆) participating HWT coefficients. This algorithm satisfies the aforementioned consistency requirement, because ⟨⃗w, H1_[l,u)⟩ = ⟨H^T ⃗w, 1_[l,u)⟩ = ⟨⃗x, 1_[l,u)⟩ by Theorem 1; that is, the answer returned for [l, u) is always exactly the sum of the underlying RVs in [l, u).</p><p>▶ Theorem 3. Given any range [l, u), the vector H1_[l,u) contains only O(log ∆) nonzero scalars.</p><p>▶ Lemma 4. Given any range [l, u), H1_[l,u) contains at most two nonzero scalars on each scale.</p><p>Proof. On scale m = −1, there is only one HWT coefficient anyway, so the claim trivially holds. We next prove the claim for any fixed m ≥ 0. For each HWT vector ⃗ψ_j^m on scale m, denote as r_j^m ≜ ⟨⃗ψ_j^m, 1_[l,u)⟩ the corresponding scalar in H1_[l,u). There are three cases concerning how the support I_j^m relates to [l, u): 1. I_j^m is contained in [l, u). In this case, r_j^m = ⟨⃗ψ_j^m, 1_{I_j^m}⟩ = 0, as shown earlier. 2. I_j^m is disjoint from [l, u). In this case, r_j^m = 0, since every nonzero scalar of ⃗ψ_j^m lies in I_j^m.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>3. Otherwise, I_j^m partially intersects [l, u). This case may happen only for at most two (I_j^m)'s: the one that covers l and the one that covers u − 1. In this case, r_j^m can be nonzero. ◀</p><p>▶ Remark 5. Each scalar r_j^m (in H1_[l,u)) that may be nonzero can be identified and computed in O(1) time as follows. Note that r_j^m may be nonzero only in case (3) above, in which j is equal to either ⌊l·2^m/∆⌋ or ⌊(u − 1)·2^m/∆⌋. As a result, if r_j^m ≠ 0, its value can be computed in two steps <ref type="bibr">[23]</ref>. First, intersect [l, u) with the first half and the second half of I_j^m, respectively. Second, take the size of the first intersection minus the size of the second, scaled by √(2^m/∆), as explained in the definition of ⃗ψ_j^m above Theorem 1.</p><p>The following lemma is a special case of Lemma 4 where l = u − 1. This lemma holds because in case (3) above, for m = −1, 0, 1, · · · , log₂ ∆ − 1, there exists a unique dyadic interval I_j^m that covers l = u − 1 (namely, the one with j = ⌊l·2^m/∆⌋).</p><p>▶ Lemma 6. Given any l ∈ [0, ∆), H1_{l} has exactly one nonzero scalar on each scale.</p></div>
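Remark 5 and Lemma 4 translate directly into code. The sketch below (function names are ours) enumerates every possibly-nonzero scalar of H1_[l,u) by its scale and location; since H is orthonormal, the participating (m, j) pairs must reconstruct the indicator vector exactly, which the brute-force helper psi lets us confirm:

```python
import math

# Sketch of Remark 5: the nonzero scalars of H 1_[l,u) sit at scale -1 and, on
# each scale m >= 0, only at locations j = floor(l*2^m/delta) and
# j = floor((u-1)*2^m/delta); each value takes O(1) time to compute.
def participating_weights(l, u, delta):
    weights = {(-1, 0): (u - l) / math.sqrt(delta)}
    m = 0
    while 2 ** m < delta:
        width = delta // 2 ** m                        # |I_j^m|
        for j in {l // width, (u - 1) // width}:       # the only candidates
            lo, mid, hi = j * width, j * width + width // 2, (j + 1) * width
            first = max(0, min(u, mid) - max(l, lo))   # |[l,u) ∩ first half|
            second = max(0, min(u, hi) - max(l, mid))  # |[l,u) ∩ second half|
            r = (first - second) * math.sqrt(2 ** m / delta)
            if r != 0:
                weights[(m, j)] = r
        m += 1
    return weights

def psi(m, j, delta):
    """The HWT vector psi_j^m, used only for the brute-force check."""
    if m == -1:
        return [1 / math.sqrt(delta)] * delta
    width = delta // 2 ** m
    v = [0.0] * delta
    for i in range(j * width, (j + 1) * width):
        v[i] = math.sqrt(2 ** m / delta) * (1 if i < j * width + width // 2 else -1)
    return v
```

By orthonormal expansion, 1_[l,u) = Σ r_j^m · ⃗ψ_j^m over the participating (m, j), with at most two terms per scale m ≥ 0.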
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Range-Summable Gaussian RVs in 2D</head><p>In the following, we describe in Subsubsection 3.3.2 our 2D Gaussian-ERS solution that has O(log² ∆) time complexity, after making some mathematical preparations in Subsubsection 3.3.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1">Mathematical Preliminaries</head><p>Like in the 1D case, our 2D Gaussian-ERS solution builds on the 2D-HWT ⃗w = H^{⊗2} ⃗x. Here the vector ⃗x is comprised of the ∆² underlying RVs X_⃗i for ⃗i ∈ [0, ∆)², listed in the dictionary order; and the vector ⃗w is comprised of the resulting ∆² 2D-HWT coefficients. The ∆² × ∆² 2D-HWT matrix H^{⊗2} is the self-Kronecker product (defined next) of the ∆ × ∆ 1D-HWT matrix H.</p><p>▶ Definition 7. Let A be a p × q matrix and B be a t × v matrix. Then their Kronecker product A ⊗ B is the pt × qv block matrix whose (i, j)-th block, for i ∈ {1, · · · , p} and j ∈ {1, · · · , q}, is the t × v matrix a_ij · B.</p><p>We now state two theorems concerning the Kronecker product.</p><p>▶ Theorem 8 (Theorem 13.3 in <ref type="bibr">[14]</ref>). Let P, Q, R, T be four matrices such that the matrix products P · R and Q · T are well-defined. Then (P ⊗ Q) · (R ⊗ T) = (P · R) ⊗ (Q · T).</p><p>▶ Theorem 9 (Corollary 13.8 in <ref type="bibr">[14]</ref>). The Kronecker product of two orthonormal matrices is also orthonormal.</p><p>Now we describe the ∆² 2D-HWT coefficients and the order in which they are listed in ⃗w. Recall that in the 1D case, each HWT coefficient takes the form W_j^m, where m is the scale and j is the location. In the 2D case, each dimension has its own pair of scale and location parameters that is independent of the other dimension. For convenience of presentation, we refer to these two dimensions as vertical (the first) and horizontal (the second), respectively. We denote the vertical scale and location pair as m₁ and j₁, and the horizontal pair as m₂ and j₂. Each HWT coefficient takes the form W^{m₁,m₂}_{j₁,j₂}. In the 2D case, there are (log₂ ∆ + 1)² different scales (m₁, m₂), with n_{m₁,m₂} ≜ 2^{m₁+} · 2^{m₂+} HWT coefficients on each scale (m₁, m₂).</p><p>We now give a 2D example in which ∆ = 4. In this 2D example, there are ∆² = 16 HWT coefficients. 
To facilitate the "color coding" of different scales, we arrange the 16 HWT coefficients into a 4 × 4 matrix W shown in Figure <ref type="figure">3</ref>. W is the only matrix that we write in boldface, in order to better distinguish it from its scalars (W^{m₁,m₂}_{j₁,j₂})'s. Figure <ref type="figure">3</ref> contains three differently colored rows of heights 1, 1, and 2, respectively, that correspond to the vertical scales m₁ = −1, 0, 1, respectively, and contains three differently colored columns that correspond to the three horizontal scales. Their "Cartesian product" contains 9 "color cells" that correspond to the 9 different scales (values of (m₁, m₂)). For example, the cell colored in pink corresponds to scale (1, 1) and contains the 4 HWT coefficients W^{1,1}_{0,0}, W^{1,1}_{0,1}, W^{1,1}_{1,0}, W^{1,1}_{1,1}. The vector ⃗w is defined from W by flattening its 16 scalars in the row-major order, as shown at the bottom of Figure <ref type="figure">3</ref>.</p><p>Like in the 1D case, let ⃗w be a vector of ∆² i.i.d. standard Gaussian RVs. As explained earlier, the vector ⃗x of ∆² underlying RVs is mathematically defined as ⃗x ≜ (H^{⊗2})^T ⃗w. The RVs in ⃗x are i.i.d. standard Gaussian by Theorem 2, because H^{⊗2} is an orthonormal matrix by Theorem 9.</p></div>
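Under the row-major flattening of Figure 3, the position of each coefficient in ⃗w is easy to compute. The helpers below are our own names (idx_1d, flat_index), assuming the 1D coefficient ordering W_0^{−1}, W_0^0, W_0^1, W_1^1, · · · described earlier:

```python
# Position of W^{m1,m2}_{j1,j2} in the flattened vector w: its row and column
# in the matrix W follow the 1D dictionary ordering of (m, j) (scale -1 first,
# then scales 0, 1, ...), and W is flattened in row-major order.
def idx_1d(m, j):
    return 0 if m == -1 else 2 ** m + j

def flat_index(m1, j1, m2, j2, delta):
    return idx_1d(m1, j1) * delta + idx_1d(m2, j2)
```

For ∆ = 4 this places W^{−1,−1}_{0,0} first and the pink scale-(1, 1) cell in positions {10, 11, 14, 15}, matching the bottom of Figure 3.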
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2">Our HWT-Based Algorithm for 2D-ERS</head><p>Our 2D-ERS algorithm (that guarantees mutual independence among the underlying RVs) is similar to the 1D-ERS algorithm described earlier. Given any 2D range [⃗l, ⃗u) ≜ [l₁, u₁) × [l₂, u₂), we compute its range-sum S[⃗l, ⃗u) as ⟨⃗w, H^{⊗2} 1_[⃗l,⃗u)⟩. Here the 2D indicator vector 1_[⃗l,⃗u) is defined as the result of flattening the following ∆ × ∆ matrix in row-major order: for ⃗i ∈ [0, ∆)², the ⃗i-th scalar of the matrix takes value 1 if ⃗i ∈ [⃗l, ⃗u) and value 0 otherwise.</p><p>J. Meng and H. Wang and J. Xu and M. Ogihara</p><p>Figure <ref type="figure">3</ref> The 2D-HWT coefficients, arranged both as a matrix W and as a flattened vector ⃗w^T.</p><p>▶ Lemma 10. Given any 2D range [⃗l, ⃗u), the vector H^{⊗2} 1_[⃗l,⃗u) contains only O(log² ∆) nonzero scalars.</p><p>Proof. Since 1_[⃗l,⃗u) = 1_[l₁,u₁) ⊗ 1_[l₂,u₂), we have H^{⊗2} 1_[⃗l,⃗u) = (H ⊗ H)(1_[l₁,u₁) ⊗ 1_[l₂,u₂)) = (H1_[l₁,u₁)) ⊗ (H1_[l₂,u₂)) by Theorem 8. By Theorem 3, both H1_[l₁,u₁) and H1_[l₂,u₂) have O(log ∆) nonzero scalars, so their Kronecker product has O(log² ∆) nonzero scalars. ◀</p></div>
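Definition 7, Theorem 8, and the nonzero-counting step above can all be checked on tiny matrices. A sketch in plain Python lists (helper names are ours):

```python
# kron(A, B): the Kronecker product of Definition 7. For a t x v matrix B,
# scalar (i, j) of A ⊗ B equals A[i // t][j // v] * B[i % t][j % v].
def kron(A, B):
    t, v = len(B), len(B[0])
    return [[A[i // t][j // v] * B[i % t][j % v]
             for j in range(len(A[0]) * v)] for i in range(len(A) * t)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def nnz(M):
    """Number of nonzero scalars in a matrix (or column vector)."""
    return sum(1 for row in M for x in row if x != 0)
```

The mixed-product property of Theorem 8 and the counting argument (a scalar of a ⊗ b is nonzero exactly when both factors are nonzero) then read as one-line assertions.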
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Generalization to Higher Dimensions</head><p>Our HWT-based Gaussian-ERS solution, just like HWT itself, can be naturally generalized to higher dimensions as follows. In dimension d &gt; 2, we continue to have the inverse HWT formula ⃗x ≜ M^T ⃗w, where ⃗x is the vector of ∆^d underlying RVs (arranged in the dictionary order of ⃗i), ⃗w is the vector of their HWT coefficients (that are i.i.d. standard Gaussian RVs), and M is the ∆^d × ∆^d matrix H^{⊗d}, the d-fold self-Kronecker product of the 1D Haar matrix H described above. Since M is orthonormal by Theorem 9, the RVs in ⃗x are i.i.d. standard Gaussian by Theorem 2.</p><p>In our dD-ERS algorithm (that guarantees mutual independence among the underlying RVs), given a dD range [⃗l, ⃗u) ≜ [l₁, u₁) × [l₂, u₂) × · · · × [l_d, u_d), we compute its range-sum S[⃗l, ⃗u) as ⟨⃗w, M1_[⃗l,⃗u)⟩; since 1_[⃗l,⃗u) = 1_[l₁,u₁) ⊗ · · · ⊗ 1_[l_d,u_d), by Theorems 3 and 8 the vector M1_[⃗l,⃗u) contains only O(log^d ∆) nonzero scalars, so each range-sum query can be answered in O(log^d ∆) time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">k-wise Independence Theory</head><p>Throughout this section, we assume d = 2 (2D) for notational simplicity. All our statements and proofs can be readily generalized to higher dimensions. Recall that,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>for guaranteeing mutual independence among the ∆^d underlying RVs, our HWT-based dD Gaussian-ERS solution needs to remember (the realization of) every participating HWT coefficient that was generated for answering a past range-sum query, which can lead to high storage overhead when the number of queries T is large. In this section, we propose a k-wise independence theory and scheme that guarantees that the ∆^d underlying Gaussian RVs are k-wise independent. It does so by using O(log^d ∆) k-wise independent hash functions (described next) instead. This scheme has the same time complexity of O(log^d ∆) as the idealized Gaussian-ERS solution, and a much smaller space complexity of O(log^d ∆), for storing the seeds of the O(log^d ∆) k-wise independent hash functions. This scheme significantly extends its 1D version proposed in <ref type="bibr">[16]</ref>. Finally, we note that this scheme also works for our Poisson-ERS solution. We however will not explain how it works in this paper, since doing so would involve drilling down into the messy and lengthy details of the Cartesian product of d &gt; 1 DSTs (since we cannot use the relatively clean and simple dD HWT in the Poisson case).</p><p>A k-wise independent hash function h(·) has the following property: given an arbitrary set of k distinct keys x₁, x₂, · · · , x_k, the hash values h(x₁), h(x₂), · · · , h(x_k) are independent (uniform random) RVs. Such hash functions are very computationally efficient when k is a small number such as k = 2 (roughly 2 nanoseconds per hash) and k = 4 (several nanoseconds per hash) <ref type="bibr">[3,</ref><ref type="bibr">24,</ref><ref type="bibr">20]</ref>. Typically, the hash values are (uniform random) integers. 
We can map them to Gaussian RVs using a deterministic function g(·) such as the Box-Muller transform <ref type="bibr">[21]</ref>.</p><p>Recall (from Figure <ref type="figure">3</ref>) that the ∆² HWT coefficients in the vector ⃗w are on (log₂ ∆ + 1)² different scales (m₁, m₂). Our scheme uses (log₂ ∆ + 1)² independent k-wise independent hash functions that we denote as h_{m₁,m₂}(·), for (m₁, m₂) ∈ [−1, log₂ ∆)². During the initialization phase, we uniformly randomly seed these (log₂ ∆ + 1)² hash functions; once seeded, they are fixed thereafter as usual. As mentioned earlier, these seeds correspond to the outcome ω that fixes (mathematically defines) the HWT coefficient vector ⃗w. Our scheme can be stated literally in one sentence: each such (seeded and fixed) h_{m₁,m₂}(·) is solely responsible for hash-generating any HWT coefficient on scale (m₁, m₂) that is participating (as defined earlier) in answering a range-sum query. In other words, for any scale (m₁, m₂) and location (j₁, j₂), the HWT coefficient W^{m₁,m₂}_{j₁,j₂} is mathematically defined as g(h_{m₁,m₂}(j₁, j₂)), where g(·) is the aforementioned deterministic function (that maps an integer to a Gaussian RV). Hence our scheme has a much lower space complexity of O(log² ∆), for remembering the seeds of the O(log² ∆) hash functions.</p><p>The following theorem states that our scheme achieves its intended objective of ensuring that the ∆² underlying RVs mathematically defined by it are k-wise independent. In this theorem and its proof, we denote the vector of ∆² HWT coefficients and the vector of ∆² underlying RVs both mathematically defined by our scheme as ⃗v and ⃗z, respectively. We do so to distinguish this vector pair from the original vector pair ⃗w and ⃗x that are mathematically defined by the idealized scheme (that guarantees mutual independence). 
Recall that &#8407; z = M T &#8407; v and &#8407; x = M T &#8407; w, where M = H &#8855;2 is the 2D HWT matrix, and that &#8407; x is comprised of i.i.d. standard Gaussian RVs.</p><p>&#9654; Theorem 11. The vector &#8407; z is comprised of k-wise independent standard Gaussian RVs. Proof. It suffices to prove that any k distinct scalars in &#8407; z -say the (i 1 ) th , (i 2 ) th , &#8226; &#8226; &#8226; , (i k ) th scalars -are i.i.d. standard Gaussian. Let &#8407; z &#8242; be the k-dimensional vector comprised of these k scalars. Let (M T ) &#8242; be the k &#215; &#8710; 2 matrix formed by the (i 1 ) th , (i 2 ) th , &#8226; &#8226; &#8226; , (i k ) th rows in M T . Then, we have &#8407; z &#8242; = (M T ) &#8242; &#8407; v. Now let the random vector &#8407; x &#8242; be defined as (M T ) &#8242; &#8407; w. Then &#8407; x &#8242; is comprised of k i.i.d. standard Gaussian RVs, as its scalars are a subset of those of &#8407; x. Hence, to prove that the scalars in &#8407; z &#8242; are i.i.d. standard Gaussian RVs, it suffices to prove the claim that &#8407; z &#8242; has the same distribution as &#8407; x &#8242; .</p><p>We prove this claim using Proposition 12. To this end, we first write &#8407; z &#8242; and &#8407; x &#8242; each as the sum of N = (log 2 &#8710; + 1) 2 independent random vectors. Recall that in Subsection 3.3, we have classified the HWT coefficients in &#8407; w and &#8407; v, and the columns of M T (called HWT vectors there) into N different (m 1 , m 2 ) scales (colors in Figure <ref type="figure">3</ref>). Recall that n m1,m2 scalars in &#8407; w and &#8407; v, and accordingly n m1,m2 columns of M T , have scale (m 1 , m 2 ). Let &#8407; w m1,m2 and &#8407; v m1,m2 be the n m1,m2 -dimensional vectors comprised of the coefficients classified to scale (m 1 , m 2 ) in &#8407; w and &#8407; v, respectively. Let (M T ) &#8242; m1,m2 be the k &#215; n m1,m2 matrix comprised of the columns of (M T ) &#8242; classified to scale (m 1 , m 2 ). 
Then, we have</p><p>⃗z′ = Σ_{(m₁,m₂)} (M^T)′_{m₁,m₂} ⃗v_{m₁,m₂} and ⃗x′ = Σ_{(m₁,m₂)} (M^T)′_{m₁,m₂} ⃗w_{m₁,m₂},</p><p>where both summations are over all N scales. The N summands in the RHS of the first equation are independent random vectors, because for each scale (m₁, m₂) ∈ [−1, log₂ ∆)², all scalars in ⃗v_{m₁,m₂} are generated by the same per-scale hash function h_{m₁,m₂}(·), which is independent of all N − 1 other per-scale hash functions. The same can be said about the N summands in the RHS of the second equation, since ⃗w is comprised of i.i.d. RVs by design. To prove this claim using Proposition 12, it remains to prove the fact that for each scale (m₁, m₂) ∈ [−1, log₂ ∆)², the pair of random vectors (M^T)′_{m₁,m₂} ⃗v_{m₁,m₂} and (M^T)′_{m₁,m₂} ⃗w_{m₁,m₂} have the same distribution. This fact can be proved as follows. Note that for each scale (m₁, m₂) ∈ [−1, log₂ ∆)², each row in (M^T)′_{m₁,m₂} has exactly one nonzero scalar, since the corresponding row in M^T, or equivalently the corresponding column in M, has exactly one nonzero scalar at each scale (m₁, m₂), due to Lemma 13. Therefore, although the number of columns in (M^T)′_{m₁,m₂} can be as many as O(∆²), at most k of them (one for each row), say the (α₁)-th, (α₂)-th, · · · , (α_k)-th columns, contain nonzero scalars. Hence (M^T)′_{m₁,m₂} ⃗v_{m₁,m₂} is a function of only the (α₁)-th, (α₂)-th, · · · , (α_k)-th scalars in ⃗v_{m₁,m₂}, and these k scalars are i.i.d. Gaussian RVs since they are all generated by the same k-wise independent hash function h_{m₁,m₂}(·). Note that (M^T)′_{m₁,m₂} ⃗w_{m₁,m₂} is the same function of the (α₁)-th, (α₂)-th, · · · , (α_k)-th scalars in ⃗w_{m₁,m₂}, which are i.i.d. Gaussian RVs by design. Hence, (M^T)′_{m₁,m₂} ⃗v_{m₁,m₂} has the same distribution as (M^T)′_{m₁,m₂} ⃗w_{m₁,m₂}. ◀</p><p>▶ Proposition 12. Suppose random vectors ⃗x and ⃗z each is the sum of N independent random vectors as follows: ⃗x = ⃗x₁ + ⃗x₂ + · · · + ⃗x_N, where ⃗x₁, ⃗x₂, · · · , ⃗x_N are independent, and ⃗z = ⃗z₁ + ⃗z₂ + · · · + ⃗z_N, where ⃗z₁, ⃗z₂, · · · , ⃗z_N are independent. If ⃗x_t and ⃗z_t have the same distribution for t = 1, 2, · · · , N, then ⃗x and ⃗z have the same distribution.</p><p>▶ Lemma 13. For any ⃗i = (i₁, i₂) ∈ [0, ∆)², the ⃗i-th column of M has exactly one nonzero scalar on each scale (m₁, m₂). Proof. 
The 2D indicator vector can be decomposed into the Kronecker product of two 1D indicator vectors as 1_⃗i = 1_{i₁} ⊗ 1_{i₂}, so the ⃗i-th column of M equals H^{⊗2} 1_⃗i = (H1_{i₁}) ⊗ (H1_{i₂}) by Theorem 8. The claim above follows from Lemma 6, which implies that H1_{i₁} and H1_{i₂} each has exactly one nonzero scalar on each 1D scale. ◀</p><p>Having generalized the DST-based solution approach (equivalent to the HWT-based approach in the 1D Gaussian case) computationally efficiently to any dimension d ≥ 2 for Gaussian, we wonder whether we can do the same for all target distributions. By "computationally efficiently", we mean that a generalized solution should be able to compute any dD range-sum in O(log^d ∆) time like in the Gaussian case.</p><p>Unfortunately, it appears hard, if not impossible, to generalize the DST approach to dD for arbitrary target distributions. We have identified a sufficient condition on the target distribution for such an efficient generalization to exist. We prove the sufficiency by proposing a DST-based universal algorithmic framework (described in Appendix C in the interest of space) that solves the dD-ERS problem for any target distribution satisfying this condition. Unfortunately, so far only two distributions, namely Gaussian and Poisson, are known to satisfy this condition, as is elaborated in Appendix A. We also describe in Appendix B two example distributions that do not satisfy this sufficient condition, namely Cauchy and Rademacher. In the following, we specify this condition and explain why it is "almost necessary".</p><p>For ease of presentation, in the following, we fix the number of dimensions d at 2. We assume all underlying RVs, X_{i₁,i₂} for (i₁, i₂) in the 2D universe [0, ∆)², are i.i.d. with a certain target distribution X. This assumption is appropriate for our reasoning below about the time complexity of a 2D ERS solution, since as shown earlier this time complexity is not affected by the strength of the independence guarantee provided, in the cases of Gaussian and Poisson. 
In a 2D universe, any 2D range can be considered the Cartesian product of its horizontal and vertical 1D ranges. We say a 2D range is dyadic if and only if its horizontal and vertical 1D ranges are both dyadic. Since any general (not necessarily dyadic) 1D range can be "pieced together" using O(log ∆) 1D dyadic ranges <ref type="bibr">[22]</ref>, it is not hard to show, using the Cartesian product argument, that any general 2D range can be "pieced together" using O(log² ∆) 2D dyadic ranges. Hence in the following, we focus on the generation of only 2D dyadic range-sums.</p><p>We need to introduce some additional notations. We define each horizontal strip-sum S^H_i as the sum of the range [i, i + 1) × [0, ∆), and each vertical strip-sum S^V_i as the sum of the range [0, ∆) × [i, i + 1). We denote as S the total sum of all underlying RVs in the universe, i.e.,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>S ≜ Σ_{⃗i ∈ [0,∆)²} X_⃗i. Now we are ready to state this sufficient condition. For ease of presentation, we break it down into two parts. The first part, stated in the following formula, says that the vector of vertical strip-sums and the vector of horizontal strip-sums in [0, ∆)² are conditionally independent given the total sum S:</p><p>(S^V_0, S^V_1, · · · , S^V_{∆−1}) ⊥ (S^H_0, S^H_1, · · · , S^H_{∆−1}) | S. (2)</p><p>The second part is that this conditional independence relation holds for the two corresponding vectors in any 2D dyadic range (that is not necessarily a square). Intuitively, this condition says that how a 2D dyadic range-sum is split horizontally is conditionally independent (given this 2D range-sum) of how it is split vertically. Roughly speaking, this condition implies that the 1D-DST governing the horizontal splits is conditionally independent of the other 1D-DST governing the vertical splits. Hence, our DST-based universal algorithmic framework for 2D can be viewed as the Cartesian product of the two 1D-DSTs, as will be elaborated in Appendix C.</p><p>In the following, we offer some intuitive evidence why this condition is likely necessary. Without loss of generality, we consider the generation of an arbitrary 2D dyadic range-sum, which (except in the degenerated cases discussed below) is generated according to a conditional distribution parameterized by the sums of its three DGAs, namely its horizontal parent, its vertical parent, and its lattice grandparent. For example, the sum of the 2D range [6, 7) × [6, 8) is to be generated according to a conditional distribution parameterized by the sums of its three DGAs. To do so, however, the sum of each such DGA is to be generated according to the sums of the DGA's own three DGAs, and so on. Given any 2D dyadic range, since it has O(log² ∆) distinct ancestors (DGAs, DGAs' DGAs, and so on), we can generate its sum in O(log² ∆) time, by arranging the computations of the sums of these O(log² ∆) ancestors in the dynamic programming order. One can think of these O(log² ∆) 4-way-split procedures as the Cartesian product of the O(log ∆) binary splits along the horizontal 1D-DST and those along the vertical 1D-DST. 
As explained in Section 5, the sufficient condition makes taking this Cartesian product possible.</p><p>In a 4-way-split procedure, degenerated cases arise when the 2D dyadic range whose sum is to be generated spans the entire 1D universe on either dimension or on both dimensions. In the former case, this 2D dyadic range has only one parent and no lattice grandparent. For example, the range [4, 6) × [0, 8) has only a vertical parent [4, 8) × [0, 8). In this case, the 4-way split degenerates to the 2-way split in 1D, as already explained in (1). In the latter case, which is the only boundary condition of the aforementioned dynamic programming, the 2D dyadic range is the entire 2D universe. In this case, its sum is directly generated from the distribution X^{*∆²}. In the cases of both Gaussian and Poisson target distributions, X^{*∆²} takes a simple form and can be generated efficiently <ref type="bibr">[21,</ref><ref type="bibr">15]</ref>.</p><p>So far we have only explained why a 2D dyadic range-sum can be generated in O(log² ∆) time. In fact, any general 2D range-sum can also be generated in O(log² ∆) time, due to the following two "2D facts". First, any general 2D range can be "pieced together" using O(log² ∆) 2D dyadic ranges, as explained in Section 5. Second, these O(log² ∆) 2D dyadic ranges together have only O(log² ∆) distinct (2D DST) ancestors, for the following reason. The following "1D fact" was proved in Corollary 1 in <ref type="bibr">[16]</ref>: every 1D general range can be "pieced together" using O(log ∆) 1D dyadic ranges, and these O(log ∆) 1D dyadic ranges share O(log ∆) common ancestors on the 1D DST. 
Since a 2D general range is by definition the Cartesian product of two 1D ranges (namely vertical and horizontal), "multiplying" this 1D fact for the vertical 1D range by the 1D fact for the horizontal 1D range proves the second 2D fact.</p><p>We can generalize the universal algorithmic framework above to dD. In the generalized framework, the 2-way conditional independence relation in 2D, namely formula <ref type="formula">(2)</ref>, becomes a d-way conditional independence relation in dD, and the 4-way-split procedure in 2D becomes a 2^d-way-split procedure in dD. We omit the details of this generalization in the interest of space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4-Way-Split Procedure for Poisson:</head><p>While the algorithmic framework of performing 4-way splits recursively is the same for any target distribution that satisfies the sufficient condition, the exact 4-way-split procedure is different for each such target distribution. In the following, we specify only the 4-way-split procedure for Poisson, since that for Gaussian has already been seamlessly "embedded" in the HWT-based solution. Suppose given a 2D dyadic range [⃗l, ⃗u), we need to compute S[⃗l, ⃗u) by performing a 4-way split. We denote as S_h, S_v, and S_g the sums of the horizontal parent, the vertical parent, and the lattice grandparent of [⃗l, ⃗u), respectively. Then S[⃗l, ⃗u) is generated using the following conditional distribution:</p><p>Pr[S[⃗l, ⃗u) = x | S_h = s_h, S_v = s_v, S_g = s_g] = f(x | s_h, s_v, s_g) / c,</p><p>where f(x | s_h, s_v, s_g) = 1 / (x! (s_h − x)! (s_v − x)! (s_g − s_h − s_v + x)!) and c ≜ Σ_{y=0}^{∞} f(y | s_h, s_v, s_g) is a normalizing constant. As explained, in a degenerated case, this split is the same as a 2-way split in the 1D case. In the case of Poisson, by plugging the Poisson pmfs ϕ_n(x) = e^{−n} n^x / (x!) and ϕ_2n(x) = e^{−2n} (2n)^x / (x!) into (1), we obtain f(x | z) = 2^{−z} · z! / (x! (z − x)!), which is the pmf of Binomial(z, 1/2).</p></div>
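Both Poisson splits are straightforward to implement. A sketch (function names are ours): the 1D 2-way split draws Binomial(z, 1/2), and the 4-way-split pmf is normalized over its finite support max(0, s_h + s_v − s_g) ≤ x ≤ min(s_h, s_v), outside of which the factorials are undefined:

```python
import math
import random

# 2-way split in the 1D Poisson case: given a parent dyadic range-sum z, the
# lower half-range-sum is Binomial(z, 1/2) and the upper half is the remainder.
def split_poisson_1d(z, rng):
    s_l = sum(rng.randrange(2) for _ in range(z))    # Binomial(z, 1/2)
    return s_l, z - s_l

# 4-way split in the 2D Poisson case: pmf of S[l, u) given the sums s_h, s_v
# of its two parents and s_g of its lattice grandparent.
def four_way_split_pmf(s_h, s_v, s_g):
    lo, hi = max(0, s_h + s_v - s_g), min(s_h, s_v)  # support of f
    f = {x: 1 / (math.factorial(x) * math.factorial(s_h - x) *
                 math.factorial(s_v - x) * math.factorial(s_g - s_h - s_v + x))
         for x in range(lo, hi + 1)}
    c = sum(f.values())                              # normalizing constant
    return {x: v / c for x, v in f.items()}
```

For s_h = 3, s_v = 2, s_g = 6, for instance, the resulting pmf on x ∈ {0, 1, 2} is proportional to 1 : 3 : 1, matching the hypergeometric form C(s_h, x)·C(s_g − s_h, s_v − x).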
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Proof of Theorem 14</head><p>Proof. Note that for any (i₁, i₂) in [0, ∆)², S^H_{i₁} intersects S^V_{i₂} on exactly one underlying RV X_{i₁,i₂}, so S^H_{i₁} − X_{i₁,i₂}, S^V_{i₂} − X_{i₁,i₂}, and X_{i₁,i₂} are independent. By Lemma 16, S^H_{i₁} and S^V_{i₂}</p><p>We prove by contradiction. Suppose Z is not a constant with probability 1. Then for any set I such that Pr(Z ∈ I) = 1, I must contain at least two numbers. Let z₁ &lt; z₂ be two such numbers such that the pdf (or pmf) of Z is nonzero at z₁ and z₂. Since both F_X(a − Z) and F_Y(b − Z) are non-increasing functions of Z, and Cov(F_X(a − Z), F_Y(b − Z)) = 0, either F_X(a − Z) or F_Y(b − Z) must be a constant with probability 1 by Theorem 17. Without loss of generality, suppose F_X(a − Z) is a constant with probability 1. Then, F_X(a − z₁) = F_X(a − z₂) holds for arbitrary values of a. Define a sequence x₀ ≜ z₁, x₁ ≜ z₁ + (z₂ − z₁), x₂ ≜ z₁ + 2(z₂ − z₁), · · · , so that lim_{i→∞} x_i = ∞. Then, we have F_X(x₀) = F_X(x₁) = F_X(x₂) = · · · , where the k-th equation F_X(x_{k−1}) = F_X(x_k) follows by applying F_X(a − z₂) = F_X(a − z₁) with a = 2z₁ + k·(z₂ − z₁). Since lim_{i→∞} x_i = ∞, we have F_X(x₀) = lim inf_{i→∞} F_X(x_i) ≥ lim inf_{x→∞} F_X(x). Similarly, we can prove F_X(x₀) ≤ lim sup_{x→−∞} F_X(x). By the properties of the cdf, we have lim inf_{x→∞} F_X(x) = 1 and lim sup_{x→−∞} F_X(x) = 0. As a result, we have proved 0 ≥ 1, which is a contradiction. ◀</p><p>▶ Theorem 17 (<ref type="bibr">[12]</ref>). Let Z be an RV, and f(·) and g(·) be two non-increasing functions. Then the covariance Cov(f(Z), g(Z)) ≥ 0 if it exists. Furthermore, Cov(f(Z), g(Z)) = 0 if and only if either f(Z) or g(Z) is a constant with probability 1.</p></div></body>
		</text>
</TEI>
