<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Super-polynomial accuracy of one dimensional randomized nets using the median of means</title></titleStmt>
			<publicationStmt>
				<publisher>American Mathematical Society</publisher>
				<date>03/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10513780</idno>
					<idno type="doi">10.1090/mcom/3791</idno>
					<title level='j'>Mathematics of Computation</title>
<idno>0025-5718</idno>
<biblScope unit="volume">92</biblScope>
<biblScope unit="issue">340</biblScope>					

					<author>Zexin Pan</author><author>Art Owen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>Let<inline-formula content-type='math/mathml'><math alttext='f'><semantics><mi>f</mi><annotation encoding='application/x-tex'>f</annotation></semantics></math></inline-formula>be analytic on<inline-formula content-type='math/mathml'><math alttext='left-bracket 0 comma 1 right-bracket'><semantics><mrow><mo stretchy='false'>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo stretchy='false'>]</mo></mrow><annotation encoding='application/x-tex'>[0,1]</annotation></semantics></math></inline-formula>with<inline-formula content-type='math/mathml'><math alttext='StartAbsoluteValue f Superscript left-parenthesis k right-parenthesis Baseline left-parenthesis 1 slash 2 right-parenthesis EndAbsoluteValue less-than-or-slanted-equals upper A alpha Superscript k Baseline k factorial'><semantics><mrow><mrow class='MJX-TeXAtom-ORD'><mo stretchy='false'>|</mo></mrow><msup><mi>f</mi><mrow class='MJX-TeXAtom-ORD'><mo stretchy='false'>(</mo><mi>k</mi><mo stretchy='false'>)</mo></mrow></msup><mo stretchy='false'>(</mo><mn>1</mn><mrow class='MJX-TeXAtom-ORD'><mo>/</mo></mrow><mn>2</mn><mo stretchy='false'>)</mo><mrow class='MJX-TeXAtom-ORD'><mo stretchy='false'>|</mo></mrow><mo>⩽<#comment/></mo><mi>A</mi><msup><mi>α<#comment/></mi><mi>k</mi></msup><mi>k</mi><mo>!</mo></mrow><annotation encoding='application/x-tex'>|f^{(k)}(1/2)|\leqslant A\alpha ^kk!</annotation></semantics></math></inline-formula>for some constants<inline-formula content-type='math/mathml'><math alttext='upper A'><semantics><mi>A</mi><annotation encoding='application/x-tex'>A</annotation></semantics></math></inline-formula>and<inline-formula content-type='math/mathml'><math alttext='alpha greater-than 2'><semantics><mrow><mi>α<#comment/></mi><mo>></mo><mn>2</mn></mrow><annotation encoding='application/x-tex'>\alpha >2</annotation></semantics></math></inline-formula>and all<inline-formula content-type='math/mathml'><math alttext='k greater-than-or-slanted-equals 1'><semantics><mrow><mi>k</mi><mo>⩾<#comment/></mo><mn>1</mn></mrow><annotation encoding='application/x-tex'>k\geqslant 1</annotation></semantics></math></inline-formula>. We show that the median estimate of<inline-formula content-type='math/mathml'><math alttext='mu equals integral Subscript 0 Superscript 1 Baseline f left-parenthesis x right-parenthesis normal d x'><semantics><mrow><mi>μ<#comment/></mi><mo>=</mo><msubsup><mo>∫<#comment/></mo><mn>0</mn><mn>1</mn></msubsup><mi>f</mi><mo stretchy='false'>(</mo><mi>x</mi><mo stretchy='false'>)</mo><mspace width='thinmathspace'/><mrow class='MJX-TeXAtom-ORD'><mi mathvariant='normal'>d</mi></mrow><mi>x</mi></mrow><annotation encoding='application/x-tex'>\mu =\int _0^1f(x)\,\mathrm {d} x</annotation></semantics></math></inline-formula>under random linear scrambling with<inline-formula content-type='math/mathml'><math alttext='n equals 2 Superscript m'><semantics><mrow><mi>n</mi><mo>=</mo><msup><mn>2</mn><mi>m</mi></msup></mrow><annotation encoding='application/x-tex'>n=2^m</annotation></semantics></math></inline-formula>points converges at the rate<inline-formula content-type='math/mathml'><math alttext='upper O left-parenthesis n Superscript minus c log left-parenthesis n right-parenthesis Baseline right-parenthesis'><semantics><mrow><mi>O</mi><mo stretchy='false'>(</mo><msup><mi>n</mi><mrow class='MJX-TeXAtom-ORD'><mo>−<#comment/></mo><mi>c</mi><mi>log</mi><mo><#comment/></mo><mo stretchy='false'>(</mo><mi>n</mi><mo stretchy='false'>)</mo></mrow></msup><mo stretchy='false'>)</mo></mrow><annotation encoding='application/x-tex'>O(n^{-c\log (n)})</annotation></semantics></math></inline-formula>for any<inline-formula content-type='math/mathml'><math alttext='c greater-than 3 log left-parenthesis 2 right-parenthesis slash pi squared almost-equals 0.21'><semantics><mrow><mi>c</mi><mo>></mo><mn>3</mn><mi>log</mi><mo><#comment/></mo><mo stretchy='false'>(</mo><mn>2</mn><mo stretchy='false'>)</mo><mrow class='MJX-TeXAtom-ORD'><mo>/</mo></mrow><msup><mi>π<#comment/></mi><mn>2</mn></msup><mo>≈<#comment/></mo><mn>0.21</mn></mrow><annotation encoding='application/x-tex'>c> 3\log (2)/\pi ^2\approx 0.21</annotation></semantics></math></inline-formula>. We also get a super-polynomial convergence rate for the sample median of<inline-formula content-type='math/mathml'><math alttext='2 k minus 1'><semantics><mrow><mn>2</mn><mi>k</mi><mo>−<#comment/></mo><mn>1</mn></mrow><annotation encoding='application/x-tex'>2k-1</annotation></semantics></math></inline-formula>random linearly scrambled estimates, when<inline-formula content-type='math/mathml'><math alttext='k slash m'><semantics><mrow><mi>k</mi><mrow class='MJX-TeXAtom-ORD'><mo>/</mo></mrow><mi>m</mi></mrow><annotation encoding='application/x-tex'>k/m</annotation></semantics></math></inline-formula>is bounded away from zero. When<inline-formula content-type='math/mathml'><math alttext='f'><semantics><mi>f</mi><annotation encoding='application/x-tex'>f</annotation></semantics></math></inline-formula>has a<inline-formula content-type='math/mathml'><math alttext='p'><semantics><mi>p</mi><annotation encoding='application/x-tex'>p</annotation></semantics></math></inline-formula>’th derivative that satisfies a<inline-formula content-type='math/mathml'><math alttext='lamda'><semantics><mi>λ<#comment/></mi><annotation encoding='application/x-tex'>\lambda</annotation></semantics></math></inline-formula>-Hölder condition then the median of means has error<inline-formula content-type='math/mathml'><math alttext='upper O left-parenthesis n Superscript minus left-parenthesis p plus lamda right-parenthesis plus epsilon Baseline right-parenthesis'><semantics><mrow><mi>O</mi><mo stretchy='false'>(</mo><msup><mi>n</mi><mrow class='MJX-TeXAtom-ORD'><mo>−<#comment/></mo><mo stretchy='false'>(</mo><mi>p</mi><mo>+</mo><mi>λ<#comment/></mi><mo stretchy='false'>)</mo><mo>+</mo><mi>ϵ<#comment/></mi></mrow></msup><mo stretchy='false'>)</mo></mrow><annotation encoding='application/x-tex'>O( n^{-(p+\lambda )+\epsilon })</annotation></semantics></math></inline-formula>for any<inline-formula content-type='math/mathml'><math alttext='epsilon greater-than 0'><semantics><mrow><mi>ϵ<#comment/></mi><mo>></mo><mn>0</mn></mrow><annotation encoding='application/x-tex'>\epsilon >0</annotation></semantics></math></inline-formula>, if<inline-formula content-type='math/mathml'><math alttext='k right-arrow normal infinity'><semantics><mrow><mi>k</mi><mo stretchy='false'>→<#comment/></mo><mi mathvariant='normal'>∞<#comment/></mi></mrow><annotation encoding='application/x-tex'>k\to \infty</annotation></semantics></math></inline-formula>as<inline-formula content-type='math/mathml'><math alttext='m right-arrow normal infinity'><semantics><mrow><mi>m</mi><mo stretchy='false'>→<#comment/></mo><mi mathvariant='normal'>∞<#comment/></mi></mrow><annotation encoding='application/x-tex'>m\to \infty</annotation></semantics></math></inline-formula>. The proof techniques use methods from analytic combinatorics that have not previously been applied to quasi-Monte Carlo methods, most notably an asymptotic expression from Hardy and Ramanujan on the number of partitions of a natural number.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In this paper we introduce and study a median-of-means approach to randomized quasi-Monte Carlo (RQMC) sampling. Specifically, for f :[ 0 , 1] &#8594; R we let &#956;r for r =1 ,. . . ,2 k -1 be independent estimates of &#956; = 1 0 f (x)dx computed using the random linear scrambling of <ref type="bibr">[22]</ref> applied to a (0,m,1)-net in base 2 and our estimate of &#956; is &#956;(k) =m e d ( &#956;1 ,...,&#956; 2k-1 ). We find for some infinitely differentiable integrands that this median-of-means approach converges faster than any polynomial rate in n =2 m . By this we mean that for some c&gt;0 the probability of an error larger than n -c log(n) approaches zero as the number of sampled points n =2 m &#8594;&#8734;.</p><p>A key ingredient in the proofs is the formula by Hardy and Ramanujan <ref type="bibr">[13]</ref> for the number p(n) of ways to partition the natural number n into a sum of natural numbers. Their formula for this is</p><p>.</p><p>We believe that this use of analytic combinatorics in RQMC is new and we expect further connections to develop.</p><p>There have been several recent results on super-polynomial convergence for quasi-Monte Carlo (QMC). Suzuki <ref type="bibr">[27]</ref>, working in a weighted space of infinitely differentiable functions on [0, 1] d , proved the existence of digital nets with worst case error C(d)exp(-c(d)log(n) 2 ). Under further conditions on the weights defining the space, a dimension free worst case error C exp(-c log(n) p ) holds for some 1 &lt;p&lt;2.</p><p>Dick et al. <ref type="bibr">[6]</ref> give a construction of a super-polynomially convergent method. At ac o s to fO(nd log(n) 2 ) they use a component-by-component construction to get dimension-independent super-polynomial convergence using interlaced polynomial lattice rules. These are higher order digital nets. A higher order digital net can attain an error of &#213;(n -&#945; ) when the integrand's mixed partial derivatives of total order up to the integer &#945; 1a r ea l li nL 2 [0, 1] d <ref type="bibr">[5]</ref>. Here &#213; means that logarithmic factors are not shown. Under scrambling, <ref type="bibr">[5]</ref> shows that the root mean squared error (RMSE) is &#213;(n -&#945;-1/2 ). To obtain super-polynomial convergence <ref type="bibr">[6]</ref> must let the order of their higher-order digital nets increase with n. The medianof-means formulation allows one to use ordinary scrambled Sobol' points though in some uses we must take a median of a (slowly) growing number k of replicates. The LatNet builder tool of <ref type="bibr">[20]</ref> constructs QMC and RQMC point sets using some random searches. Those searches seek to optimize a figure of merit (FOM) that quantifies worst case error over a class of integrands. For the precise definitions of each FOM, see that paper. Figure <ref type="figure">1 there</ref> shows some examples where the median FOM shows curvature on a log-log scale for dimension d = 6. This is consistent with super-polynomial accuracy, though they present a median FOM instead of the FOM of a median estimate.</p><p>The algorithm we study here provides another approach. We will see that when f is smooth, most of the randomized net estimates are very close to the true value, for large m. The variance is dominated by a relatively small number of bad outcomes. By taking the median of a number of independent estimates we can reduce the impact of the few bad outcomes. Each RQMC estimate is a mean of function evaluations. Then our combined estimates are a median of means.</p><p>Median-of-means algorithms have many uses in theoretical computer science, though the means used there have not usually been based on RQMC. See for example, <ref type="bibr">[16]</ref> and <ref type="bibr">[19]</ref>. Kunsch et al. <ref type="bibr">[18]</ref> present several uses of the median of means in two stage numerical integration algorithms and they give further references to the literature. Since our preprint appeared, there has been further work on median methods for QMC by <ref type="bibr">[11]</ref>. They choose rank one lattice generating vectors completely at random for integration problems in Korobov spaces and they choose polynomial lattice rules randomly for some weighted Sobolov spaces. By taking the median of a number of such randomly generated estimates they attain the best convergence rates possible for the smoothness levels they study and they are able to avoid complicated parameter searches. Hofstadler and Rudolf <ref type="bibr">[15]</ref> use median of means to get some strong laws of large numbers for integration methods. Gobet et al. <ref type="bibr">[9]</ref> use median of means to get robust RQMC estimates.</p><p>An outline of this paper is as follows. Section 2 provides some notation and definitions of the scrambling we use and the resulting estimates. One key quantity is a scrambling matrix M with m columns and entries in {0, 1}. The accuracy of RQMC is limited by phenomena where for some nonempty L &#8834; N,t h er o w so fM for &#8712; L sum to 0 in F m 2 . Section 3 explains this bottleneck to convergence and Theorem 3.1 writes the RQMC error as a sum of random variables, one for each problematic subset L. This section is less technical than the later ones with our proofs. Section 4 has a one dimensional numerical example. We see super-linear convergence for the median of 11 RQMC replicates, up to a point. The RMSE reaches an asymptote for a Sobol' sequence computed to 32 bits. Switching to a 64 bit computation, the super-linearity continues to some higher sample sizes. There is also a six dimensional example, where the standard deviation of median estimates drops faster than that of mean estimates. Section 5 studies the population median of scrambled nets for dimension d = 1. This is the median of the distribution of the RQMC estimate. Theorem 5.6 establishes a super-polynomial rate for that quantity for certain infinitely differentiable functions. A critical step there is to bound the number of nonempty subsets L &#8834; N that have a small value of &#8712;L .W ed ot h i s using combinatorial results including the one by <ref type="bibr">[13]</ref>. The comprehensive reference is <ref type="bibr">[8]</ref>. Theorem 5.10 provides super-polynomial convergence for the median of 2k-1 independently generated RQMC estimates. Section 6 considers the case where the p'th derivative of f satisfies a &#955;-H&#246;lder condition for 0 &lt;&#955; 1. Theorem 6.1 bounds the probability that the error is much more than n -p-&#955; with corollaries showing super-polynomial convergence for the population median and the median of 2k -1 independent estimates.</p><p>We close this section with a few contextual remarks. In the one dimensional setting, there are already very accurate integration rules for extremely smooth integrands <ref type="bibr">[4]</ref>. The RQMC method here has an advantage in being an equally weighted average of the n function values, instead of having large weights of both positive and negative signs. The one dimensional case will take on greater interest if the findings and proofs in this article can be generalized to d 1. The multidimensional example in Section 4 is therefore encouraging as are the empirical results in <ref type="bibr">[20]</ref>.</p><p>One of the original motivations for RQMC was to get error estimates. It later emerged that randomizing QMC can also increase accuracy <ref type="bibr">[23]</ref>. Error estimation for a median of independently sampled means is more complicated than for a mean of such means. We can readily get a nonparametric confidence interval for the population median of the &#956;r , using the binomial distribution because the true median &#952; satisfies Pr(&#956; r &lt;&#952;)=1/2. However, the quantity of most direct interest is E(&#956; r ), not med(&#956; r ).</p><p>We had initially considered the case where instead of a random linear scramble we had taken a completely random generator matrix with all entries independent and identically (IID) U{0, 1} random variables. A similar result holds: the median estimate converges with super-polynomial accuracy for certain infinitely differentiable f , though of course the bad outcomes can be even worse. For instance, there is a 2 -m 2 probability that the upper m &#215; m submatrix of the generator matrix is all zeros. Then all n =2 m RQMC points would lie in the interval [0, 1/n]a n dt h e resulting error would generally fail to vanish as n &#8594;&#8734;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Notation and background</head><p>We study the random linear scrambling of <ref type="bibr">[22]</ref> including a digital shift, in one dimension. Our focus is on base 2 apart from a few remarks later. For an integrand f :[ 0 , 1] &#8594; R we will estimate &#956; = 1 0 f (x)dx assumed to exist by &#956; =(1/n) n-1 i=0 f (x i ) for carefully chosen points x i &#8712; [0, 1]. We use N for the set of positive integers. For m &#8712; N,w el e t[ m]d e n o t et h es e t {1,...,m} and for n &#8712; N we let Z n denote the set {0, 1,...,n-1}. We investigate a scrambled digital net of n =2 m points x i &#8712; [0, 1) for i =0,1,...,n -1.</p><p>We will make frequent use of sets L &#8834; N of finite cardinality. We write |L| for their cardinality as well as L 1 = &#8712;L and some additional notation about these sets L will be introduced as needed. For a matrix M we use M (L, :) to denote the Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> submatrix whose row indices are in L and to extract a single row we write M ( , :) instead of M ({ }, :).</p><p>The indicator function of the event E is sometimes written 1{E}. This quantity takes the value 1 when E holds and 0 otherwise.</p><p>We assume throughout that C &#8712;{ 0, 1} m&#215;m is a nonrandom matrix that is of full rank m over F 2 , that is, it has full rank in arithmetic modulo 2. This matrix C defines the 'unscrambled' version of our QMC points which will be a (0,m,1)-net in base 2. For instance C could be the m &#215; m identity matrix as it would be for the van der Corput points.</p><p>For i &#8712; Z 2 m we let i =(i 1 ,i 2 ,...,i m ) T where i <ref type="bibr">)</ref> we let a =( a 1 ,a 2 ,...,a m ) T . These two definitions intersect only for {0} where they both yield 0. The representation for a &#8712; [0, 1) can be taken to any finite number E m of bits that we denote by a[E]w h e nw e need to specify the precision. When a has two base 2 representations, we work with the one that has finitely many nonzero bits. The points of the unscrambled net are given by a i &#8712;{k/2 m | k &#8712; Z 2 m }&#8834;[0, 1) that satisfy</p><p>In our presentation below we will omit noting that bitwise arithmetic is done modulo two, when that is clear from context.</p><p>To scramble the points, we introduce a random matrix M &#8712;{ 0, 1} E&#215;m for E m. The upper triangular elements of M are all 0, the diagonal elements of M are all 1, and the elements below the diagonal are IID U{0, 1}.</p><p>We also introduce a random digital shift D = &#8734; k=1 2 -k D k with bits D k that are independent U{0, 1} variables independent of M .N o t et h a tP r ( 0 D&lt;1) = 1. The random digital shift serves to make the estimates &#956; unbiased estimates of &#956; and our proofs require that property.</p><p>For i &#8712; Z 2 m , linearly scrambled points x i to precision E without a digital shift are defined by (2.1)</p><p>for bits</p><p>When we add the random digital shift, we randomize infinitely many bits. We define scrambling of precision E to mean that x i has bits (2.2)</p><p>We will use the term 'random linear scrambling' to refer to linear scrambling that includes the digital shift. This usage is common. Another usage calls that affine scrambling with linear scrambling excluding the digital shift.</p><p>Let f :[ 0 , 1] &#8594; R. We need to specify the precision of our estimates and to do this, we define</p><p>where the bits of x i are given by (2.2). When f is continuous on [0, 1], define &#956;&#8734; = lim E&#8594;&#8734; &#956;E . Later when we replicate these quantities, the replicates will be denoted by &#956;E,r and &#956;&#8734;,r .W el e t&#969; f (t) denote the modulus of continuity of f over [0, 1]. Later, &#969; f (1) will be a convenient shorthand for sup 0 x 1 f (x) -inf 0 x 1 f (x).</p><p>Lemma 2.1. For any M &#8712;{0, 1} &#8734;&#215;m and D &#8712; [0, 1)</p><p>, where &#956;E is constructed using the first E m rows of M .</p><p>Proof. Let x i [E]b ex i under scrambling with precision E and x i [&#8734;]b ex i under scrambling in the infinite precision limit. For any given M and D in random linear scrambling, x i [E] has the same first E bits as x i [&#8734;], so</p><p>where k indexes the bits of</p><p>The main object of our study is the median of 2k -1 independently sampled replicates of a randomized QMC algorithm on m points. We may take k to be a function of m.W e w r i t ek =&#937; ( m) to mean that lim inf m&#8594;&#8734; k(m)/m &gt; 0a n d similarly k =&#937;(m 2 ) means that lim inf m&#8594;&#8734; k(m)/m 2 &gt; 0. In practice k would be nondecreasing in m though our results do not require this.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">A bottleneck in convergence</head><p>It is well known that the variance of &#956; under nested uniform scrambling attains O(n -3 ) convergence when d =1a n df &#8712; C[0, 1], a great improvement upon the O(n -1 ) rate of naive Monte Carlo. Corollary 3.8 of <ref type="bibr">[28]</ref> shows that random linear scrambling with a digital shift has the same variance as nested uniform scrambling for (0,m,1)-nets. Increased smoothness does not improve this rate outside of trivial settings with zero variance. Here we give a simple argument to illustrate that limitation. Understanding such bounds leads us to an expression for the integration error below, on which we base our study of medians.</p><p>If n =2 m and M (m +1, :) happens to be 0, then by the relationship x i,m+1 = m+1 j=1 M m+1,j a ij + D m+1 , we immediately see that x i,m+1 = D m+1 for all i.G e ometrically, this means for each interval [i/n, (i +1)/n), the samples are either all in the left half interval (if D m+1 = 0) or all in the right half interval (if D m+1 =1 ) . If we assume for simplicity that D m+1 = 1 and the scrambling has precision m +1, then each sample is actually uniform on the right half of the interval it lands in and Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> we can approximate the error by its expectation given that M (m +1, :) = 0 and D m+1 =1as:</p><p>If instead D m+1 = 0, then all the x i fall in the left half interval and the expected error is like that above, but with the opposite sign. Hence the conditional expectation of |&#956; RQMC -&#956;| cannot be of lower order than n -1 when M (m +1, :) = 0. Because each entry of M (m +1, :) is independently 0 or 1 with equal probability, Pr(M (m +1, :) = 0)=2 -m and those rare outcomes alone make Var(&#956; RQMC )a t least of order 2 -m (n -1 ) 2 = n -3 . Theorem 3.1 makes the above reasoning rigorous.</p><p>The main takeaway is that the rare event M (m +1 , :) = 0 makes a major contribution to the variance. Curious readers may ask what happens if we explicitly avoid the event M (m +1, :) = 0. This is indeed what is done in the affine striped matrix (ASM) scrambling from <ref type="bibr">[24]</ref>. For base 2, the matrix M of ASM scrambling is nonrandom and described by M kj =1fork j and M kj =0fork&lt;j. Therefore M (m +1 , :) = 1 and ASM scrambling is able to attain the Var(&#956;)=O(n -4 ) convergence rate when f is bounded on [0, 1) <ref type="bibr">[24,</ref><ref type="bibr">Proposition 3.7]</ref>.</p><p>A similar question arises: can ASM scrambling converge faster than O(n -4 ) under stronger smoothness assumptions? The answer is again no. Assume for simplicity that D m+1 = D m+2 = 0 and that the scrambling has precision E = m+2. Because M (m +1, :) = M (m +2, :),</p><p>No wwithineac hin terv al[i/n, (i +1)/n), the sampling is either uniform in the leftmost quarter [i/n, (i+0.25)/n) or uniform in the rightmost quarter [(i+0.75)/n, (i+ 1)/n). Suppose without loss of generality that the sampling for interval i is in the rightmost quarter. Then as in the analysis of random linear scrambling, we can approximate the integration error over</p><p>When sampling for observation i is in the leftmost quarter the approximate error as above is</p><p>The f terms each contribute an error of O(n -2 ). The sign of the f terms depends on the nonrandom a i . Carefully chosen a i could possibly bring cancellation among the f terms, leaving a total error o(n -2 )f r o mt h ef terms. However, no such cancellation is possible for the f terms. Therefore, if we did manage to cancel the f terms we would still have an error</p><p>This implies that |&#956; RQMC -&#956;| is at least of order n -2 ,a n ds oV a r (&#956; RQMC ) cannot converge faster than O(n -4 ). Notice that in this case M is nonrandom, so we do not need to multiply that squared error by an event probability like 2 -m as we did in the previous example. One may summarize from the above heuristic reasoning that whenever a set L of rows of M satisfies &#8712;L M ( , :) = 0, there is an associated error of order 2 -&#8712;L . This is indeed true by Theorem 3.1. Before stating Theorem 3.1 we introduce some notation. Let</p><p>Each L &#8712;Lidentifies a set of row indices for M . Each of these finite nonempty subsets of natural numbers potentially contributes an error that scales like 2 -L 1 where L 1 = &#8712;L .W ea l s ou s e D(L) 1 = &#8712;L D . This quantity will appear as the exponent of -1 where only its value modulo two matters.</p><p>where</p><p>Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> and scalars B L from Appendix A satisfy</p><p>A&#945; k k! for all k &#8712; N is not really a more stringent assumption than f being analytic on [0, 1]. To be analytic on a closed interval requires f to be analytic on some open interval containing it. Then for the Taylor expansion of f centered at 1/2 to have a radius of convergence larger than 1/2, it is necessary that |f</p><p>We see that for each L &#8712;L , the corresponding term in (3.1) contains a factor depending on M times a factor depending on D.I t h e l p s t h a t M and D are independent random quantities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Numerical examples</head><p>The function f (x)=x exp(x) has integral &#956; =1o v e r[ 0 , 1]. We selected this f because it is infinitely differentiable as our theory requires, and it is not a polynomial and is not symmetric or antisymmetric. Those are factors that might make a function artificially easy to integrate by a specially tuned numerical method. There is also no special feature in the function at values like 1/2 or more generally integers divided by a power of 2 that might confer an advantage for Sobol' points which are generated in base 2.</p><p>We sampled this function with random linear scrambling for 0 m 15. For this we used the Sobol function in the QMCPy software of <ref type="bibr">[3]</ref>. We took the median of k = 11 RQMC integral estimates R = 250 times.</p><p>Figure <ref type="figure">1</ref> shows how the RMSE of the median of 11 RQMC estimates decreases with n as open circles connected by dashed lines. It appears to decrease at a superpolynomial rate until it reaches a limit of about 10 -9 . The Sobol' points in QMCPy default to 32 bits for the linear scramble with the digital shift carried out more bits. Our theory is for infinitely many bits. We redid the computations using 64 bits for the linear scramble, resulting in the solid points connected by solid lines. With 64 bits the apparent super-polynomial convergence holds through the entire range of sample sizes in Figure <ref type="figure">1</ref>.</p><p>Figure <ref type="figure">1</ref> also shows the RMSE of a single RQMC estimate of which there were 250&#215;11 = 2750. There is a reference curve at the n -3/2 rate interpolating the value for n = 1. A dashed line below that by a factor of &#8730; 11 corresponds to accuracy using an average of 11 RQMC estimates that could have been done at the same cost as the median of 11 RQMC estimates. In the next sections we prove that the median RQMC estimate converges at a super-polynomial rate. We also show that the sample median of k RQMC estimates attains such a rate when k grows slowly with m.</p><p>At tiny sample sizes like 1, 2 and 4 we see that a mean of 11 RQMC estimates was more accurate than a median of 11 RQMC estimates. By n 16, we see the median doing better than the dotted reference line applicable to the mean of 11 RQMC estimates.</p><p>We also investigated a six dimensional function that computes a midpoint voltage for an output transformerless (OTL) push-pull circuit. The function is given by <ref type="bibr">[26]</ref>   points. The solid line with solid points repeats that calculation using 64 bits instead of 32. The dashed line with solid points connects RMSEs of 2750 RQMC estimates without taking a median. The solid reference line is proportional to n -3/2 , running through the plain RQMC value for n = 1. The dashed line is lower by a factor of &#8730; 11 to estimate the RMSE that a mean of 11 estimates would have. which includes a link to describe the electronics background as well as some code. The results are shown in Figure <ref type="figure">2</ref>. We used scrambled Sobol' points from QMCPy. Because the true mean is not known, we plot the standard deviation instead of the RMSE. While the curve shows an apparent better rate in this multivariate problem it does not account for the bias induced by taking a median instead of a mean. That issue is outside the scope of the present article.</p><p>We note in passing that graphical rendering applications of QMC while not having much smoothness can also benefit from using a large number E of bits. See <ref type="bibr">[17]</ref> for a discussion of QMC for rendering. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Convergence rate of the median</head><p>As we see in Theorem 3.1, sets L with &#8712;L M ( , :) = 0 contribute to the RQMC error and the upper bound on that contribution contains the factor 2 -L 1 , so that sets L with small L 1 are of great concern. In the examples in Section 3 we saw that this can be the major source of error in both scrambled nets and ASM sampling. Is there a way to avoid such bad events? One approach is to redesign the scrambling to avoid &#8712;L M ( , :) = 0 for certain L. See for instance the higher order digital nets of <ref type="bibr">[5]</ref> and polynomial lattice rules in <ref type="bibr">[10]</ref>. Another approach, which is the main focus of this paper, is to take the median instead of the mean of several QMC simulations. Below we show that the median of random linear scrambling with infinite precision converges to &#956; at a super-polynomial rate when f satisfies the condition in Theorem 3.1.</p><p>In random linear scrambling, because M ([m], :) is nonsingular and M ( , :) for &gt;mhas each entry independently U{0, 1},</p><p>As a result, the event &#8712;L M ( , :) = 0 is unlikely to happen for a set L with small L 1 . We use Lemma 5.1 to control the number of L &#8712;Lwith small L 1 .</p><p>The limit in (5.2) holds with m &#8594;&#8734;through real values. Our primary use of it is for integers m 1 but we will also use it for nonintegers.</p><p>Remark 5.2. The sequence in equation (5.2) is in fact monotonically decreasing for 20 m 512, so one can reasonably guess that the bound in equation ( <ref type="formula">5</ref>.3) applies to m&gt;512 as well, although we do not have a proof for this. Lemma 5.3. In random linear scrambling, there exists a constant C such that for all m 1 and any 0 &lt;1,</p><p>When m 512,w ec a nc h o o s eC to be 0.4.</p><p>Proof. Equation (5.2) implies that there exists a constant</p><p>C for all m 1. We apply this inequality with m replaced by m &#8730; 1 -and apply the union bound to all events &#8712;L M ( , :</p><p>where We are going to apply Chebyshev's inequality to bound the probability that &#956;&#8734; is far from &#956;. For that, we first prove that the random sign terms S L (D)= (-1) D(L) 1 in Theorem 3.1 are pairwise independent Rademacher (i.e., U{-1, 1}) random variables. Then for any &#951;&gt;0 and m 3, the random linear scrambling estimate &#956;&#8734; satisfies</p><p>where C &#952; is a positive number depending only on &#952;, defined in equation (5.12) and C is the constant from Lemma 5.3.I fm 512, we can replace C by 0.4.I fa l s o m max(( &#8730; 2&#955;&#952;) -1 , 3log(&#952;m)+3), then we can replace C &#952; by 3770 max(1,&#952; -1 ).</p><p>Proof. First we condition on M and apply Chebyshev's inequality. For c&gt;0</p><p>By Theorem 3.1 and Lemma 5.5,</p><p>Let H be the event min{</p><p>Conditioning on H,w es e et h a t</p><p>Pr(H) .</p><p>Furthermore,</p><p>be the largest integer no larger than &#8730; 2N . Then according to (3.3)</p><p>where the last inequality uses the fact that factorial (or rather the Gamma function) is logarithmically convex, which implies the maximum is attained at either |L| =0 or |L| = &#8730; 2N . By Stirling's approximation</p><p>e 1/12 .</p><p>Applying the above to the bound for B L in equation (3.3) we get</p><p>Next, by Corollary 2 of Bidar <ref type="bibr">[2]</ref> (5.10)</p><p>This problem that Bidar studies is different from that of Hardy and Ramanujan, because the elements of L must be distinct while Hardy and Ramanujan's formula</p><p>Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> involves sums of not necessarily distinct numbers. Combining equations (5.7), (5.9), and (5.10)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Now define</head><p>(5.12)</p><p>To see that C &#952; is indeed finite, notice that (&#952;</p><p>)), so p(N ) grows at a sub-exponential rate in N . More explicitly, for some &#961;&gt;1w e want to find conditions on m so that p(N +1)/p(N ) &lt;&#961;for N &#955;m 2 . It is enough to have d log(p(N ))/dN&lt;log(&#961;)f o rN &#955;m 2 , where we let p take positive real valued arguments. To further simplify the calculation, we assume m ( &#8730; 2&#955;&#952;) -1 so that &#952; &#8730; 2N 1f o rN &#955;m 2 .T h e nd l o g ( p(N ))/dN is decreasing in N and we only need to verify that d log(p(N ))/dN&lt;log(&#961;)a tN = &#955;m 2 . A lengthy but straightforward calculation shows that this holds when log(&#961;)m&gt;log(&#952;m)</p><p>To present the above inequality in a simpler form, we choose &#961; =4/1.1 and approximate the inequality numerically with a sufficient condition that m 3log(&#952;m)+3. In summary, when m max((</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3770.">(5.13)</head><p>We see that C &#952; is finite and then</p><p>where the last equality follows from &#955; = 3(log(2)) 2 /&#960; 2 . This bounds the first term in (5.11) when summed over N as in (5.6).</p><p>For the second term, we use the inequality</p><p>Then using (5.14) and 2 m =exp(&#960; &#955;m 2 /3) and the assumption that m 3,</p><p>Using the bounds for both terms</p><p>The bound (5.5) follows by choosing ) under the given conditions follows by <ref type="bibr">(5.13)</ref>.</p><p>We can interpret Theorem 5.6 as placing some control on the probability that the error |&#956; &#8734; -&#956;| is appreciably larger than 2 -&#955;m 2 = n -&#955; log 2 (n) . That probability cannot be larger than &#951; + O(1/ &#8730; m) for any &#951;&gt;0. Corollaries 5.7 and 5.8 show that this provides some control on the distribution of |&#956; &#8734; -&#956;|. The median of that distribution must converge rapidly to zero. Then further below we translate this property into a property of the sample median.</p><p>Corollary 5.7. Under the conditions of Theorem 5.6 let med(&#956; &#8734; ) be the median of the distribution of &#956;&#8734; . Then for 3 m 512</p><p>Proof. Choose &#951; =1/2 -0.4/ &#8730; 3 and apply Theorem 5.6, we see that</p><p>where we have used &#951; =1 /2 -0.4/ &#8730; 3 &gt; 1/4 to replace 1/ &#8730; &#951; by 2, m 512 to replace C by 0.4a n dm 3 to bound 0.4/ &#8730; m by 0.4/ &#8730; 3. This conclusion follows once we notice the above probability must exceed 1/2ifmed(&#956; &#8734; ) falls outside that interval.</p><p>Corollary 5.8. For f analytic on [0, 1],</p><p>for any &gt;0.</p><p>Proof. Remark 3.2 shows such f satisfies the assumption of Theorem 5.6 for some A and &#945;, so equation <ref type="bibr">(5.15)</ref> shows E Var( &#956;&#8734;</p><p>w h e r ew eh a v eu s e dL e m m a5 . 3t ob o u n dP r ( H c ). The same argument in Corollary 5.7 shows | med(&#956; &#8734; ) -&#956;| &lt;cfor large enough m.</p><p>Remark 5.9. Because</p><p>Corollary 5.8 shows that the median of &#956;&#8734; converges to &#956; faster than any polynomial rate.</p><p>In practice, one can only use finite-precision scrambling and estimate the population median of &#956; by the median of a finite number of replicated samples. Below we present results for the sample median. </p><p>Proof. Let &#956;E,r for r = 1,..., 2k -1 be independently sampled estimates of &#956; using linear scrambling of precision E with n =2 m . For each of these, let &#956;&#8734;,r be the corresponding infinite precision sample value obtained from the same scrambling matrix M and digital shift D that &#956;E,r uses. The median of the &#956;&#8734;,r is denoted by &#956;(k)</p><p>therem ustbeatleastk of the &#956;&#8734;,r with |&#956; &#8734;,r -&#956;| &gt;&#961;. By applying Theorem 5.6 to each &#956;&#8734;,r , we find that</p><p>The result follows using the union bound on all 2k-1 k possible sets of k estimates &#956;&#8734;,r with errors above &#961; along with |&#956;</p><p>Corollary 5.11. Under the conditions of Theorem 5.10 suppose that 8 m 512 and E &#955;m 2 .T h e n</p><p>Pr |&#956;</p><p>Proof. We begin by noting that 2k-1 k &lt; 4 k holds for integers k 1. It holds for k = 1 and to complete an induction argument we find for k 1t h a t</p><p>We cho ose &#951; =1/25 in Theorem 5.10. Then for 512 m 8</p><p>where m 8w a su s e dt om a k e&#951; +0.4/ &#8730; m 3/16. Now the conclusion follows because 2 -E 2 -&#955;m 2 and &#969; f (t) f &#8734; t.</p><p>Corollary 5.12. Assume that E &#955;m 2 and that k is nondecreasing in m.T h e n</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E(|&#956;</head><p>Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="822">ZEXIN PAN AND ART B. OWEN</head><p>In particular, when k =&#937;(m),theMSEof&#956;</p><p>E converges to &#956; at a super-polynomial rate. If further k =&#937;(m 2 ),t h e n E(|&#956;</p><p>Proof. We first introduce a parameter 0 &lt;1 and change the event H we used in the proof of Theorem 5.6 to be</p><p>Then as in equation (5.16), we can choose</p><p>and conclude that Pr(|&#956; &#8734; -&#956;| &gt;c) &lt;&#951;+Pr(H c ). By Lemma 5.3,</p><p>With this choice, c =2 -&#955;(1-)m 2 +O(m log m) and a similar reasoning as in Theorem 5.10 shows</p><p>w h e r ew eh a v eu s e d&#969; f</p><p>So the second term in equation (5.17 </p><p>where &#950; L,y L (D) is a complex number of modulus 1 (actually an integer power of exp(2&#960; &#8730; -1/b)) and B L,y L is a constant obeying a bound similar to that in Theorem 3.1. After applying the union bound as in Lemma 5.3, one can prove that with high probability all b -L 1 are small and the convergence of the median is super-polynomial. However, the constant c in | med(&#956; &#8734; ) -&#956;| = O(n -c log(n) )m u s t be smaller. To see this, notice that each L is associated with (b -1) |L| distinct y L , so</p><p>which is obviously smallest and simplest for b =2. Letq(N )= L&#8712;L 1 L 1 =N be the number of L with L 1 = N . We know that</p><p>for some constant C from VIII.24 of <ref type="bibr">[8]</ref>. Applying the arithmetic versus geometric mean inequality (2)/&#960; 2 is the rate constant that Remark 5.9 shows is attained for the base 2 case up to an arbitrarily small .</p><p>To be clear, the above heuristic reasoning does not prove the asymptotic convergence rate of the median of random linear scrambling with base b 3i ss l o w e r . I t only suggests the proof strategy used in our paper cannot produce a better upper bound than the base 2 case. Results may change if one can reason more cleverly and replace the union bound with a tighter bound.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Convergence rate under finite differentiability</head><p>Although our method is designed for smooth target functions, in applications one may not know the exact smoothness but still want to apply this method. In this section, we show that the median converges at almost the optimal rate for the class of p times differentiable functions C p ([0, 1]) whose p'th derivatives are &#955;-H&#246;lder continuous. The &#955; in H&#246;lder continuity should not be confused with the &#955; from Lemma 5.1.</p><p>First we state the counterpart to Theorem 5.6 in this setting. By a partition of [0, 1] we mean a sequence of increasing numbers 0 = x 0 &lt;x 1 &lt; &#8226;&#8226;&#8226; &lt;x N =1where N &#8712; N.F o r0&lt;&#955; 1 and a function f over [0, 1], we use the &#955;-variation measure <ref type="bibr">Chapter 14.4]</ref>, in which P is the set of all partitions of [0, 1]. Notice that when &#955; =1 ,V 1 (f ) is the total variation of f .I ff is &#955;-H&#246;lder continuous, namely |f (x i ) -f (x i-1 )| C|x i -x i-1 | &#955; for some constant C,t h e nV &#955; (f )i sfi n i t e . </p><p>holds for any &#951;&gt;0 and 0 &lt;1,w h e r e</p><p>Proof. The proof resembles that of Theorem 5.6 and is given in Appendix C.</p><p>We have the following immediate consequences.</p><p>Corollary 6.2. Let med(&#956; &#8734; ) be the median of the random variable &#956;&#8734; .T h e n under the conditions of Theorem 6.1</p><p>for any &gt; 0.</p><p>Proof. Apply Theorem 6.1 with &#951; =1 /m and 0 &lt; &lt; /(p + &#955;). The probability on the right hand side of (6.1) is below 1/2 and the given bound for |&#956; &#8734; -&#956;| is o(2 -(p+&#955;)m+ m ).</p><p>Turning to the sample median, the smoother f is, the smaller are the probable errors. If we want to control the expected squared error of the sample median, then we will need k to be large enough and smoother f will demand larger k as shown next. We do not need k to grow without bound. Corollary 6.3. Suppose that the scrambling has precision E and &#956;(k) E is the sample median of 2k -1 independent copies of &#956;E . Under the conditions of Theorem 6.1, (i) for any &gt;0,w h e nm is large enough so that </p><p>Proof. The following proof uses the same proof strategy as that of Lemma 14.10 of <ref type="bibr">[7]</ref>.</p><p>By equation (14.5) of <ref type="bibr">[7]</ref> for b =2</p><p>Applying the above bound, we get</p><p>This proves the base case. Next we prove the bound for u 2 by induction on u.</p><p>If</p><p>Hence we can apply the induction hypothesis with u -1 to equation (A.1) to get (1) .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Now notice that</head><p>Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.</p><p>License or copyright restrictions may apply to redistribution; see <ref type="url">https://www.ams.org/journal-terms-of-use</ref> </p><p>We also have</p><p>This completes the proof.</p><p>Proof. First we split f (k)i n t ot w op a r t s :</p><p>w h e r ew eh a v eu s e d</p><p>, where (i) uses the Taylor expansion of (1-x) -n around x =0ev aluatedatx = &#945;/2 and n</p><p>We can bound | 1/2 0 f (x)wal k (x)dx| in a similar way. Now the conclusion follows from equation (A.2).</p><p>Proof of Theorem 3.1. By the Walsh function decomposition,</p><p>where B L is a coefficient depending on L that satisfies </p><p>For |L k | = p and |L k | &gt;p, the second and third parts, respectively, of <ref type="bibr">[7,</ref><ref type="bibr">Theorem 14.15]</ref> apply. We will use the larger upper bound from the third part. Because we assume that sup</p><p>because 3(e -1) + 2 &lt; 8. We add 2 instead of 1 here to handle the case |L k | = p, where |f (p-1) (x)| &lt;Abut |f (p) (x)| might not be smaller than A.</p><p>The conclusion follows once we use this estimate for f (k) in equation (A.3) and define</p><p>Next we prove the counterpart of Lemma 5.3. To shorten some lengthy expressions we use the notation  Proof. We will make frequent use of this quantity:</p><p>.</p><p>This provides a bijection between strictly decreasing positive integers L(j)t h a t sum to N and nonincreasing positive integers K(j) that sum to N -d(d -1)/2. It follows that q(N, d) equals the number of ways to partition N -d(d -1)/2i n t od positive integers. Hence by <ref type="bibr">[1]</ref> [1, 4. where we have used the 'upper summation' identity for binomial coefficients <ref type="bibr">[12]</ref> to simplify the sum over n. To bound the second sum, we consider separate cases for each value of L(p +1) and L 1,p . The argument is similar to the one used in Lemma C.2, but not so similar that we could just cite that lemma. First </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Licensed to Stanford Univ. Prepared on Thu Aug 3 16:13:41 EDT 2023 for download from IP 171.66.13.39.License or copyright restrictions may apply to redistribution; see https://www.ams.org/journal-terms-of-useMEDIAN OF MEANS FOR RANDOMIZED NETS</p></note>
		</body>
		</text>
</TEI>
