<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The Limiting Poisson Law of Massive MIMO Detection With Box Relaxation</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>11/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10302844</idno>
					<idno type="doi">10.1109/JSAIT.2020.3039964</idno>
					<title level='j'>IEEE Journal on Selected Areas in Information Theory</title>
<idno>2641-8770</idno>
<biblScope unit="volume">1</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>Hong Hu</author><author>Yue M. Lu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Estimating a binary vector from noisy linear measurements is a prototypical problem for MIMO systems. A popular algorithm, called the box-relaxation decoder, estimates the target signal by solving a least squares problem with convex constraints. This paper shows that the performance of the algorithm, measured by the number of incorrectly-decoded bits, has a limiting Poisson law. This occurs when the sampling ratio and noise variance, two key parameters of the problem, follow certain scalings as the system dimension grows. Moreover, at a well-defined threshold, the probability of perfect recovery is shown to undergo a phase transition that can be characterized by the Gumbel distribution. Numerical simulations corroborate these theoretical predictions, showing that they match the actual performance of the algorithm even in moderate system dimensions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Motivations</head><p>Consider the problem of estimating a binary vector &#946; &#8712; {-1, 1} p from noisy linear measurements in the form of y = A&#946; + w.</p><p>(1)</p><p>Here, A &#8712; R n&#215;p is a known sensing matrix and w &#8764; N (0, &#963; 2 p I n ) denotes an unknown noise vector. This is a prototypical model for multi-user detections in MIMO communication systems <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>. It also arises in other applications such as compressed sensing <ref type="bibr">[3]</ref>, source separation <ref type="bibr">[4]</ref>, and image processing <ref type="bibr">[5]</ref>.</p><p>H. Hu and Y. M. Lu are with the John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA (e-mails: honghu@g.harvard.edu and yuelu@seas.harvard.edu). This work was supported by the Harvard FAS Dean's Fund for Promising Scholarship, and by the US National Science Foundation under grants CCF-1718698 and CCF-1910410.</p><p>Various algorithms have been proposed to solve <ref type="bibr">(1)</ref>. Examples include sphere decoding <ref type="bibr">[6]</ref>, zero-forcing <ref type="bibr">[7]</ref>, approximate message passing <ref type="bibr">[8]</ref>, Markov chain Monte Carlo methods <ref type="bibr">[9]</ref>, and semidefinite programming <ref type="bibr">[10]</ref>. Among them, a convex-optimization based method, known as the box-relaxation decoder <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref>, is popular in practice due to its simplicity and efficiency.</p><p>The method consists of merely two steps: (1) solve a box-constrained least squares problem</p><p>and ( <ref type="formula">2</ref>) obtain an estimate of &#946; by taking the sign of x * , i.e., &#946; = sign(x * ).</p><p>The performance of this algorithm can be measured by the bit error rate (BER):</p><p>where 1 {&#8226;} denotes the indicator function. The achievable BER depends on two key parameters:</p><p>the noise variance &#963; 2 p , and the sampling ratio &#948; p def = n/p.</p><p>Under the assumption that the sensing matrix A has i.i.d. normal entries, the authors of <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref> analyzed the asymptotic BER achieved by the box-relaxation decoder. They show that, as n, p &#8594; &#8734; with &#948; p &#8594; &#948; &#8712; ( 1 2 , &#8734;) and &#963; 2 p &#8801; &#963; 2 &gt; 0, the BER converges in probability to a deterministic limit, i.e., BER P -&#8594; E(&#948;, &#963; 2 ) &#8712; 0, 1  2 .</p><p>(</p><p>This means that for any &#963; 2 &gt; 0 and &#948; &gt; 1 2 , the algorithm can asymptotically achieve a weak recovery of &#946;: it is better than random guess, but &#946; always contains a nonzero fraction of errors.</p><p>Moreover, one can show that</p><p>The expressions in <ref type="bibr">(5)</ref>, together with (4), suggest that the asymptotic BER can be made arbitrarily small if we increase the number of measurements or reduce the noise variance. This then raises a tantalizing question: is there a regime of (&#948; p , &#963; 2 p ) such that the box-relaxation decoder can perfectly recover the target signal? Existing results in <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref> cannot answer this question, for two reasons. First, BER p&#8594;&#8734; -&#8594; 0 only guarantees that the number of error bits</p><p>is sublinear in p, but it contains no information about the actual distribution of N e , including</p><p>whether N e = 0. The second issue is subtle but important. It has to do with the specific order with which the limits are taken in (4) and <ref type="bibr">(5)</ref>. There, we first send the dimension p &#8594; &#8734; before letting &#948; p &#8594; &#8734; or &#963; 2 p &#8594; 0. In practice, p is large but always finite, and thus the speed with which &#948; p &#8594; &#8734; and &#963; 2 p &#8594; 0 [e.g., &#963; 2 p = O(1/p) vs. &#963; 2 p = O(1/ log p)] makes all the difference. The goal of this paper is to present a precise asymptotic characterization of the probability distribution of N e . We show that, in certain scaling regimes of (&#948; p , &#963; 2 p ), the distribution of N e converges to a Poisson law. Moreover, we derive conditions under which the exact recovery of &#946; is possible and provide an asymptotic formula for P(N e = 0) in the form of a Gumbel distribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Main Results</head><p>We make the following assumptions throughout the paper.</p><p>(A.1) The elements of A are drawn from the i.i.d. Gaussian distribution:</p><p>&#8764; N (0, 1 p ). (A.2) &#946; = -1 p , where 1 p denotes the all-ones vector.</p><p>(A.3) The noise is Gaussian: w &#8764; N (0, &#963; 2 p I n ). (A.4) lim inf p&#8594;&#8734; &#948; p &gt; 1/2 and lim sup p&#8594;&#8734; &#948; p / log p &lt; &#8734;.</p><p>(A.5) lim inf p&#8594;&#8734; &#963; 2 p log 2 p &gt; 0 and lim sup p&#8594;&#8734; &#963; 2 p &lt; &#8734;. In (A.2), we assume that each coordinate of true signal is -1 to simplify our derivations.</p><p>All the results still hold for arbitrary &#946;, due to the rotational symmetry of A. In (A.4), the requirement that lim inf p&#8594;&#8734; &#948; p &gt; 1/2 is related to the fundamental limits of convex relaxation for structural signal reconstruction. In <ref type="bibr">[14]</ref>, it is shown that, if lim sup p&#8594;&#8734; &#948; p &#8804; 1  2 , the box-relaxation decoder cannot successfully recover &#946; even in the noiseless case. In (A.5), we essentially require &#963; 2 p &gt; c/ log 2 p for some c &gt; 0. This restriction is due to the limitations of our current proof techniques. We expect that many of our results still hold without this restriction.</p><p>To state our main results, we first need to introduce the following potential function:</p><p>where &#934; is the CDF of the standard normal distribution. One can verify that F p is a strictly convex function of &#964; &#8712; (0, &#8734;). (See Appendix B for details.) Thus, one can uniquely define</p><p>Another quantity that will be crucial in our analysis is </p><p>where d TV is the total variation (TV) distance and P(&#955;) denotes a Poisson distribution with parameter &#955;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Remark 1:</head><p>The theorem, whose proof can be found in Section II-D, characterizes the asymptotic distribution of N e under certain scaling regimes of (&#948; p , &#963; 2 p ). It shows that the law of N e converges to that of a Poisson random variable with parameter &#955; p , if &#955; p grows no faster than &#8730; log p. This requirement on &#955; p is not satisfied in the setting studied in <ref type="bibr">[12]</ref> where both &#948; p and &#963; 2 p are kept as fixed constants and consequently &#955; p = O(p). In that case, one can expect that &#8730; p[ Ne p -&#934;(-1 &#964;p )] converges to a Gaussian distribution. The fact that N e can have a limiting Poisson law is not surprising. Recall from its definition in <ref type="bibr">(6)</ref> that N e is a sum of p Bernoulli random variables {1 { &#946; i =&#946; i } }. Moreover, one can show that</p><p>&#964;p ) and that these Bernoulli random variables are close to being independent. Consequently, the law of N e is approximately a Binomial distribution B(p, &#934;(-1 &#964;p )) with an expected value equal to &#955; p . As p &#8594; &#8734; with &#955; p = O( &#8730; log p), it is well-known that the Binomial distribution converges to a Poisson distribution (i.e., the "law of small numbers"). The technical contribution of this paper is to make the above arguments precise and rigorous. The main tool we use is the leave-one-out approach (see, e.g., <ref type="bibr">[15]</ref>), also known as the cavity method in statistical physics <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>. It allows us to carry out a detailed probabilistic analysis of the random optimization problem in <ref type="bibr">(2)</ref>.</p><p>In our proof of Theorem 1, we did not attempt to optimize the rate of convergence shown on the right-hand side of <ref type="bibr">(10)</ref>. The actual rate is likely to be faster. In Figure <ref type="figure">1</ref>, we compare the empirical distribution of N e , obtained after averaging over 10 4 independent trials, against the limiting Poisson distribution for three different problem dimensions. We can see that, even at a moderate dimension of p = 200, the Poisson approximation is already accurate.</p><p>The characterization given in Theorem 1 allows us to study the conditions under which the box-relaxation decoder can perfectly recover the target signal. Let P correct def = P(N e = 0) denotes the probability of perfect recovery. We can show that a phase transition of P correct emerges when the following quantity</p><p>is near 1. </p><p>If &#945; * = 1, a more refined characterization is available. Specifically, assume that</p><p>for some constant x &#8712; R (and thus &#945; p (x) p&#8594;&#8734; -&#8594; 1), then</p><p>where the right-hand side is the CDF of the Gumbel distribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Remark 2:</head><p>The above proposition, proved in Section II-E, characterizes the scaling regimes of (&#948; p , &#963; 2 p ) over which perfect recovery is achievable. The possible scalings are also flexible. For example, if we keep the sampling ratio &#948; p at a fixed value &#948; &gt; 1/2, it then follows from <ref type="bibr">(11)</ref> and ( <ref type="formula">12</ref>) that &#963; 2 p = &#948;-1/2 2 log p is the critical noise variance threshold for perfect recovery to happen. Alternatively, if we fix the noise variance &#963; 2 p &#8801; &#963; 2 , then the critical threshold for the sampling ratio is &#948; p = 1/2 + 2&#963; 2 log p.</p><p>To illustrate Proposition 1, we show some results from numerical experiments. In Figure <ref type="figure">2a</ref>, we plot the phase diagram of the empirical values of P correct under different choices of (&#948; p , &#963; 2 p ), as well as the theoretical phase transition boundary separating the regimes of perfect/nonperfect recovery. In Figure <ref type="figure">2b</ref>, we plot P correct as a function of &#945; p (by fixing &#948; p = 1 and varying &#963; 2 p ). A transition indeed takes place near &#945; p = 1, and the transition becomes sharper as we increase the problem dimension p. When p is not very large, a more accurate approximation of P correct is given by the Gumbel distribution. This is illustrated in Figure <ref type="figure">2c</ref>, where we zoom in the region near the phase transition and compare the empirical success probability against the theoretical prediction given in <ref type="bibr">(14)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Related Work</head><p>The precise analysis of high-dimensional signal estimation has already been the subject of a vast literature. Underpinning these rich results are several powerful techniques developed over the years, including the nonrigorous replica method from statistical physics <ref type="bibr">[18]</ref>- <ref type="bibr">[20]</ref>, approximate message passing (AMP) <ref type="bibr">[21]</ref>- <ref type="bibr">[23]</ref>, the cavity method <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref> and leave-one-out analysis <ref type="bibr">[15]</ref>,</p><p>Gaussian min-max theorem (GMT) <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref>, as well as the geometric framework based on</p><p>Gaussian width <ref type="bibr">[14]</ref> and statistical dimensions <ref type="bibr">[26]</ref>.</p><p>The box-constrained least square problem in (2) has been previously analyzed in <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref> using GMT techniques. Analysis of similar problems can also be carried out by AMP <ref type="bibr">[8]</ref>.</p><p>However, these existing studies consider the setting where both the sampling ratio &#948; p and the noise variance &#963; 2 p are kept as constants as p &#8594; &#8734;. Under such scalings, one can establish that the empirical measure of x * , defined as &#181;(x * ) def = 1 p p i=1 &#948; x * i , converges to some deterministic limiting measure. However, the convergence of the empirical measure is insufficient for our purpose: flipping the signs of o(p) entries of x * will completely change the number of error bits N e , but it has no effect on the limiting empirical measure. In view of this, we choose to use the leave-one-out approach, which allows us to construct a surrogate of x * , denoted by x, in our analysis. We show that x *x &#8734; &#8594; 0 but the statistical properties of x are much easier to obtain. We will elaborate on this point in Sec. II.</p><p>Our work considers settings where (&#948; p , &#963; 2 p ) can scale with the problem dimension p. Similar settings with flexible scalings have been explored in other contexts, including, e.g., sparse linear regression <ref type="bibr">[27]</ref>- <ref type="bibr">[29]</ref>, spiked matrix estimation <ref type="bibr">[30]</ref>, and low-rank matrix recovery <ref type="bibr">[31]</ref>. These studies established the precise conditions under which perfect recovery in these problems is achievable. In our work, we go one step further by establishing the asymptotic distribution of the number of error bits N e .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. ROADMAP OF ANALYSIS</head><p>This section provides a general roadmap to our proof of Theorem 1, which is given in Section II-D. To emphasize readability, we only highlight the main ideas and key intermediate results here, leaving heavier technical details to the subsequent sections and to the appendix.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. An Equivalent Scalar Problem</head><p>To analyze N e , we need to understand the statistical properties of x * , i.e., the optimal solution of (2). A basic challenge lies in the fact x * is a high-dimensional vector with no closed-form expressions. The key idea behind the cavity approach <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref> or the leave-one-out analysis <ref type="bibr">[15]</ref> is to circumvent this issue by focusing instead on a single coordinate of x * . Specifically, to study the ith coordinate x i , we can first rewrite the original problem (2) as arg min</p><p>=arg min</p><p>where x \i is the vector formed by removing x i (and &#946; \i is defined in the same way), a i is the ith column of A, A \i denotes the matrix formed by removing a i from A, y \i = A \i &#946; \i + w, and</p><p>In reaching <ref type="bibr">(16)</ref>, we have also used Sion's minimax theorem <ref type="bibr">[32]</ref> to swap the inner minimization and maximization in <ref type="bibr">(15)</ref>.</p><p>Let u * \i = arg min u L i (u) and define a function</p><p>We can then check that the optimization problem ( <ref type="formula">16</ref>) has the same solution as arg min</p><p>Thus, starting from the original problem (2) and after optimizing over all the "nuisance" variables</p><p>x \i , we have reached in <ref type="bibr">(19)</ref>, an equivalent scalar optimization problem over x i .</p><p>To nonspecialists, the reformulations leading to ( <ref type="formula">19</ref>) might look slightly mysterious, but there are several good reasons for doing so. First, note that ( <ref type="formula">19</ref>) is obtained by subtracting -L i (u * \i ) from ( <ref type="formula">16</ref>). This manipulation does not change the minimizer of ( <ref type="formula">16</ref>), but it sets the magnitude of <ref type="bibr">(19)</ref> to be O(1), which facilitates our later analysis. Second, we explicitly pull out a &#8890; i u * \i in <ref type="bibr">(19)</ref>, since its distribution is much easier to characterize than a &#8890; i u * in ( <ref type="formula">16</ref>), due to the independence between a i and u * \i . This is in fact a major benefit of the leave-one-out analysis. Third, as we will show next g p,i (x i&#946; i ), which is a random one-dimensional function g p,i (v) evaluated at v = x i&#946; i , has a particularly simple limiting form as p &#8594; &#8734;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. A Limiting Quadratic Function</head><p>The following proposition, whose proof is given in Section III-A, shows that g i (v) uniformly converges to a simple quadratic function.</p><p>Proposition 2: Under (A.1)-(A.5), there exists c &gt; 0 such that for any i &#8712; [p] and &#949; &gt; 0,</p><p>where</p><p>Moreover, for &#947; &gt; 2 and all large enough p, |A p -A * p | &lt; cp -1/(2&#947;) , where</p><p>and f p and &#964; p are the quantities defined in <ref type="bibr">(8)</ref>.</p><p>There is a simple intuitive explanation for why g p,i (v) is approximately a quadratic function.</p><p>Recall that u * \i is the minimizer of L i (u). Thus, in a local neighborhood near u * \i , we can approximate L i (u) by a second-order Taylor expansion:</p><p>, where &#948; = uu * \i and H i corresponds to the Hessian of L i (u) at u * \i . Substituting this approximation into <ref type="bibr">(18)</ref>, we can immediately obtain that g p,i (v) &#8776;</p><p>and it is independent of H i due to the leave-one-out construction, we can expect a &#8890; i H -1 i a i to concentrate near a constant as p &#8594; &#8734;. Of course, the above explanation is not rigorous in that L i (u) is not smooth and H i may not exist. This is one technical challenge we address in the proof. Since 1 2 A * p v 2 is a good approximation of g p,i (v), we can now approximate the optimization problem in <ref type="bibr">(19)</ref> by</p><p>where Prox [-1,1] denotes the proximal operator of the indicator function on [-1, 1]. Its solution, denoted by x i , provides a good surrogate of x * i , as shown in the following proposition. Proposition 3: Under (A.1)-(A.5), for any &#947; &gt; 2, there exists c &gt; 0, such that, for any i &#8712; [p] and &#949; &#8712; (0, 1),</p><p>We prove this result in Section III-B. Here, we demonstrate the accuracy of the approximations stated in ( <ref type="formula">20</ref>) and ( <ref type="formula">24</ref>) via numerical results shown in Figure <ref type="figure">3</ref>.</p><p>Thanks to the independence between a i and u * \i , the surrogate solution x i is much easier to analyze than x * i . Accordingly, we can consider the following approximations of &#946; and N e :</p><p>Applying a union bound to <ref type="bibr">(24)</ref> gives us max i |x * ix i | P -&#8594; 0, i.e., the surrogate vector x is close to x * in &#8467; &#8734; distance. This then allows us to show that P( &#946; = &#946;) &#8594; 0, which also implies</p><p>Theory Empirical </p><p>and accordingly,</p><p>The proof of Proposition 4 can be found in Section III-C. It shows that the distribution of N e is well captured by that of N e . Therefore, to obtain the limiting distribution of N e , we just need to analyze N e , which is what we are going to do next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Approximate independence of { &#946; i } i&#8712;[p]</head><p>To derive the distribution of N e , we need to know the joint distribution of</p><p>are correlated, but the correlations are weak. In fact, we can prove something stronger.</p><p>The following result, proved in Section IV-A, shows that any size-k subset of {a &#8890; i u * \i } i&#8712;[p] are approximately independent, provided that k is not too large. </p><p>where &#934;(&#8226;) is the CDF of the standard Gaussian and &#8710; p,k</p><p>It follows from ( <ref type="formula">23</ref>) and ( <ref type="formula">25</ref>) that</p><p>(Recall that we have assumed that &#946; i = -1 for all i.) By taking b i = -A * p in <ref type="bibr">(28)</ref>, we can conclude that the k events</p><p>) are also approximately independent. This is made precise by the following proposition, whose proof can be found in Appendix E.</p><p>, there exists c &gt; 0, such that</p><p>Moreover, if &#963; 2 p &#8805; c &#8242; log 2 p for some c &#8242; &gt; 0, then for all large enough p,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Proof of the Main Theorem</head><p>We are now ready to prove Theorem 1 by showing that the limiting distribution of N e converges to Poisson. Recall that N e = p i=1 1 &#946; i =&#946; i . The approximate independence of {1 &#946; i =&#946; i } makes the analysis tractable. Classical results on Poisson approximation of rare events deal with the sum of p i.i.d. Bernoulli random variables with success probability &#955;/p. As p &#8594; &#8734;, the sum converges in distribution to a Poisson random variable with rate &#955;. Things are slightly different in our case, since N e is a summation of p weakly correlated Bernoulli random variables. The following proposition, proved in Section IV-B, shows that the Poisson convergence still holds under the weaker condition of approximate independence.</p><p>where P(&#955;) denotes a Poisson distribution with parameter &#955;.</p><p>Finally, since the TV distance is a metric, the statement of Theorem 1 immediately follows from ( <ref type="formula">27</ref>), <ref type="bibr">(31)</ref> and the triangle inequality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Proof of Proposition 1</head><p>Using the Gaussian tail bounds (133) and (134) given in Appendix F, we can get</p><p>Therefore, if &#945; * &gt; 1, it directly follows from Theorem 1 that P(N e = 0) = 1.</p><p>The case that &#945; * &lt; 1 is more complicated. One can show that &#955; p &#8805; p c(&#945; * ) , where c(&#945; * ) is some constant, so it is possible lim p&#8594;&#8734; d TV ( N e , N e ) &#8594; 0. Instead, we can look at a subset</p><p>Define N e,K as the number of error bits in K. and &#955; p,K def = |K|&#934;(-&#964; -1 p ). We can find K satisfying &#955; p,K &#8781; &#8730; log p. Then following same steps of proving Proposition 4 and Proposition 10 in Appendix G, we can show lim p&#8594;&#8734; P(N e,K = 0) = 0, which indicates that</p><p>Finally, we prove <ref type="bibr">(14)</ref>. If &#945; p satisfies (13), then for large p, &#963;  <ref type="formula">9</ref>) and <ref type="bibr">(10)</ref> gives us</p><p>where step (a) follows from m(-&#964; -1 p ) &#964;p &#8594; 1, step (b) follows from 2&#945; p &#964; 2 p log p &#8594; 1 and we use (13) in step (c).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. THE LIMITING QUADRATIC FUNCTION</head><p>The goal of this technical section is to make the approximations shown in Figure <ref type="figure">3</ref> rigorous.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Proof of Proposition 2</head><p>To lighten notation, we will sometimes omit the leave-one-out subscript as used in Sec. II-A.</p><p>For example, A \i will be replaced by A, and a i by a, as long as doing so causes no confusion.</p><p>Let us first introduce the following function:</p><p>where</p><p>Using G p (s) and omitting subscript i, scalar function g p,i (v) defined in <ref type="bibr">(18)</ref> can be also expressed as:</p><p>and correspondingly, we re-write <ref type="bibr">(19)</ref> as:</p><p>It can be seen that G p (s) is related with the conjugate function of L(u), which is a strongly convex function. Therefore, G p (s) and g p (v) possess some nice properties that will be useful in our proof. We gather them together in Appendix A.</p><p>We first show that g p (v) concentrates around its expectation, which is the following proposition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Its proof will be given in Appendix C.</head><p>Proposition 8: There exists c &gt; 0, s.t. for any &#949; &gt; 0,</p><p>The next result shows that Eg p (v) is essentially a quadratic function in the large p limit.</p><p>where A p is defined in <ref type="bibr">(21)</ref>.</p><p>Proof: First we introduce the following auxiliary functions:</p><p>where a &#8764; N (0, I n ), independent of A, w. Clearly, the original problem (2) is the special case when &#952; = 0. For notational convenience, we also define the expectation of Q p (&#952;) as:</p><p>where L(u) is given in <ref type="bibr">(33)</ref>. Note that the connection between Q p (&#952;) and Eg p (v) is:</p><p>i.e., Eg p (v) can be approximated by the derivative of Q p (&#952;) at &#952; = 0. To make this intuition rigorous, we need to study the analytical properties of Q p (&#952;).</p><p>First, we show that Q p (&#952;) is differentiable on [0, &#8734;) and Q &#8242; p (&#952;) is Lipschitz continuous. Indeed, from <ref type="bibr">(33)</ref> and <ref type="bibr">(38)</ref>,</p><p>where w &#8764; N (0, I n ) and &#251;&#952; corresponds to the optimal solution of (40). In step (a), we use dominated convergence theorem (DCT) to interchange derivative and expectation. By the same argument of (77) in Appendix A, we have for any b, c &#8805; 0,</p><p>On the other hand, for any &#952; &#8805; 0,</p><p>Combining ( <ref type="formula">41</ref>), ( <ref type="formula">42</ref>) and ( <ref type="formula">43</ref>), for any b &gt; c &#8805; 0, we can get</p><p>Therefore,</p><p>Now we are ready to analyze Eg p (v). By the mean value theorem, we get from (39) that</p><p>where &#954; p &#8712; [0, 1]. From (44) and (45), we deduce that</p><p>On the other hand, from (41),</p><p>It can be checked from (15) that u * = Ax *y. Combining (46) and (47), we get <ref type="bibr">(36)</ref>.</p><p>Remark 3: It will be shown later [c.f. (59)] that A p &#8805; C&#948; p , for some constant C &gt; 0.</p><p>Therefore, we know from (36) that the quadratic approximation of Eg p (v) is accurate for large p, if &#963; p &#8811; p -1/2 . We will prove that, when &#963; p &lt; c &#8730; log p for some constant c, perfect recovery is achieved with high probability. This means that &#963; p &#8811; p -1/2 already covers the regime where we are most interested in. In the following, we will take &#963; p &#8805; 1 log p . Proposition 8 and 9 immediately implies the first part of Proposition 2, i.e., <ref type="bibr">(20)</ref>. Next we show A p converges to A * p in the high-dimensional limit. From (47),</p><p>Hence, it boils down to analyzing Q &#8242; p (&#952;) and its limit, which can be done as follows. 1) Convergence of Q p (&#952;): The CGMT framework in <ref type="bibr">[13]</ref>, <ref type="bibr">[33]</ref> can be readily applied to computing the limit of Q p (&#952;) in high dimensions.</p><p>Lemma 1: There exists c &gt; 0, s.t., for any &#949; &gt; 0 and &#952; &#8712; [0, 1],</p><p>where</p><p>with F p defined in <ref type="bibr">(7)</ref>. Also for any &#947; &gt; 2, there exists c &gt; 0 such that</p><p>Remark 4: The proof of Lemma 1 will be given in Appendix D. We can find</p><p>, where f p is defined in <ref type="bibr">(8)</ref>. This can be understood from <ref type="bibr">(37)</ref> and (49), since Q * p (&#952;) is the limiting value of the squared fitting error when the noise variance is &#952; + &#963;</p><p>and</p><p>Then together with bound (84) shown in Appendix B, we know there exists C &gt; 0, s.t., Q * p &#8242;&#8242; (&#952;) &#8804;</p><p>C, for all &#952; &#8805; 0.</p><p>3) Convergence of A p to A * p : Now we can show the convergence of the curvature A p , which also implies the simple limiting form of g p (v).</p><p>Lemma 3: There exists c &gt; 0 such that</p><p>Proof: For &#947; &gt; 2, there exists C &gt; 0, s.t. for &#952; &#8712; (0, 1],</p><p>where in step (a), we use ( <ref type="formula">22</ref>), ( <ref type="formula">48</ref>) and ( <ref type="formula">52</ref>) and in step (b), we use (44), (51) and Lemma 2.</p><p>Therefore, taking &#952; = p -1 2&#947; and using Assumptions (A.4) and (A.5), we can get (54).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Proof of Proposition 3</head><p>Proposition 2 indicates that the original scalar problem (34) can be well approximated by</p><p>which has an explicit optimal solution:</p><p>Note that the difference between x and x should be small, as implied by ( <ref type="formula">23</ref>), ( <ref type="formula">57</ref>) and (54). In fact, we can directly prove x &#8594; x * without considering x. The reason for us to introduce this intermediate variable is to achieve a better convergence rate in our proof.</p><p>The first lemma below shows that the objective function of (56), i.e.,</p><p>is strongly convex.</p><p>Lemma 4: There exists K &gt; 0, s.t., A p &#8805; K&#948; p for all p large enough. Therefore, &#8467; p (x) is K&#948; p -strongly convex.</p><p>Proof: By (8) and the definition of A * p , we have</p><p>Then from assumption (A.5) and (54), we know there exists K &gt; 0 s.t. A p &#8805; K&#948; p &gt; 0 and &#8467; p (x) is K&#948; p -strongly convex.</p><p>Then together with uniform convergence proved in Proposition 2, we can show x * &#8594; x.</p><p>Lemma 5: There exists c &gt; 0 s.t., for &#949; &#8712; (0, 1),</p><p>Proof: Since &#8467; p (x) is K&#948; p -strongly convex,</p><p>Let &#8467; p (x) be the objective function in <ref type="bibr">(19)</ref>. From <ref type="bibr">(20)</ref> we know there exists c &gt; 0, s.t., for</p><p>From ( <ref type="formula">61</ref>) and (62), we can get there exists c &gt; 0 s.t. for all &#949; &#8712; (0, 1),</p><p>. Then changing &#8730; &#949; to &#949; in the above, we get (60).</p><p>Furthermore, using (54) we can also show x &#8594; x.</p><p>Lemma 6: For &#947; &gt; 2, there exists c &gt; 0, s.t., for &#949; &#8712; (0, 1),</p><p>Proof: By the non-expansiveness of proximal operator Prox [-1,1] (&#8226;), from ( <ref type="formula">23</ref>) and (57) we know there exists C &gt; 0, s.t.,</p><p>where we have used (54) and (59). Recall that u * = Ax *y, so similar to (104) and (105), we obtain that there exists c &gt; 0, s.t., for all &#949; &gt; 0, P</p><p>. Since a and u * are independent, then from (63) it is not hard to show there exists c &gt; 0, s.t., for all &#949; &#8712; (0, 1), P (|x -x| &gt; &#949;) &#8804; c &#949; e -p 1/&#947; &#949; 2 /c . Lemma 5 and 6 imply Proposition 3, based on which we can now prove Proposition 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Proof of Proposition 4</head><p>Our strategy is to show that P( &#946; = &#946;) is small, which implies P( N e = N e ) is small and so is d TV ( N e , N e ). Recall that N e and N e are in the same probability space, and we have assumed</p><p>Then the following simple relation holds: 5 ) and A * p (-1p -1 5 ), k = 1 and &#949; = p -1 5 in (28), we can show </p><p>By union bound,</p><p>Since d TV ( N e , N e ) &#8804; P( N e = N e ) &#8804; P( &#946; = &#946;), we obtain <ref type="bibr">(27)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. ASYMPTOTIC DISTRIBUTIONS</head><p>This is another technical section. Our main goal here is to derive the asymptotic distribution of { x i } and that of N e .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Proof of Proposition 5</head><p>By the exchangeability of a &#8890; i u * \i i&#8712;[p] , we just need to consider the joint distribution of a &#8890; i u * \i i&#8712;[k] , i.e., the first k coordinates. A key result we are going to establish is that</p><p>are approximately independent, provided that k is not too large.</p><p>Let u * \[k] be the optimal solution of</p><p>where A \[k] is the matrix formed by removing the first k columns of A and &#946; \[k] is defined in the same way. In other words, u * \[k] is the leave-k-out solution of min u L(u). Also define</p><p>Our proof of approximate independence of</p><p>. This is proved in Lemma 9.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) Show the joint distribution of a</head><p>, which are mutually independent. This is proved in Lemma 12.</p><p>Details of the proof can be found in Appendix E.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. The Limiting Poisson Law of N e</head><p>Before presenting the actual proof, it would help to first show some heuristic derivations.</p><p>We employ the following general inclusion-exclusion principle <ref type="bibr">[34, p.106</ref>]: for any k &#8712; [p], the probability P k that exactly k among p events A 1 , . . . , A p occur is</p><p>where </p><p>where &#955; p is defined in <ref type="bibr">(9)</ref>. Then combining (69) and (71), we have</p><p>which implies that the PMF of N e is approximately Poisson with rate &#955; p .</p><p>We now quantitatively analyze the error of approximation in (72). First, we approximate the right-hand side of (69) by a truncated sum: L m=k m k (-1) m-k S m , with L &#8804; p. The reason for this operation is that S [m] &#8776; &#934; m -1 &#964;p may not be accurate for large m, since we only have approximate finite event independence. We then need to control the error caused by the truncation. Accordingly, we can apply Bonferroni's inequality <ref type="bibr">[34, p.110]</ref>, stated as follows.</p><p>Under the same setting as (69), for k + 1 &#8804; L &#8804; p, we have</p><p>Therefore, we need to choose a reasonably large L to attain a good trade-off between the approximation error of (71) and the truncation error of ( <ref type="formula">73</ref>) and ( <ref type="formula">74</ref>), such that they are both properly bounded. Our proof of Proposition 7 follows this idea. The details can be found in Appendix H.</p><p>1) G p (s) is convex and differentiable in R n , with</p><p>where</p><p>or equivalently,</p><p>3) g p (v) is convex and differentiable with</p><p>we get (75) and (76).</p><p>Since g p (v) = G p (av), g p (v) is also convex and differentiable with g &#8242; p (v) = a &#8890; &#8711;G p (av). From (75) and (76), we know &#8711;G p (av) &#8804; a v. Therefore, (78) follows from Cauchy-Schwartz inequality and the fact that |v| &#8804; 2. <ref type="bibr">(8)</ref> In this section, we collect some useful properties of the one-dimensional optimization <ref type="bibr">(8)</ref>, which was first studied in <ref type="bibr">[13]</ref>. For our purpose, we consider a slightly more general setting:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Properties of the Optimization Problem</head><p>where t &gt; 0 is a parameter. Note that (8) and the inline optimization of (50) are the cases where t = &#963; 2 p and t = (1 + &#952;) 2 &#963; 2 p , respectively. Also we define the squared loss function: R p (t)</p><p>and evidently,</p><p>1) Uniqueness of Optimal Solution: Let &#964; (t) be the minimizer of (79), which is the solution of stationary equation:</p><p>By direct differentiation of h(&#964; ) above, we can show h</p><p>so it is a strictly increasing function. Also lim &#964; &#8594;0 h(&#964; ) = -&#8734; and lim &#964; &#8594;&#8734; h(&#964; ) = &#948; p &gt; 0. This also establishes that the strict convexity of f p (t). Therefore, &#964; (t) is unique for any t &gt; 0. Besides, we can directly check that &#964; (t) is differentiable with</p><p>so &#964; (t) is strictly increasing.</p><p>2) Upper and Lower Bounds of &#964; (t): Since h</p><p>and evidently, v p &lt; 1/2. Therefore, &#964; (t) can be bounded as:</p><p>3) Properties of f p (t): From (79) we get f p (t) &#8805; 0, f &#8242; p (t) = 1 2&#964; (t) &gt; 0 and f &#8242;&#8242; p (t) = -&#964; &#8242; (t) 2&#964; 2 (t) &lt; 0, so f p (t) is nonnegative, strictly increasing and concave. On the other hand, letting &#964; = t &#948;p in (79) we can get f p (t) &#8804; C &#8730; t&#948;p 2</p><p>+ 1 , where C is some constant.</p><p>Therefore, R p (t) is strictly increasing and convex. From (80), we can show R &#8242; p (t) is bounded:</p><p>On the other hand, R &#8242;&#8242; p (t) satisfies: R &#8242;&#8242; p (t) &#8804; &#981;(-2/&#964; (t)) 2&#964; (t)t , where &#981;(x) is the PDF of standard Gaussian. Then using (82) and Assumption (A.4), we know there exists C &gt; 0, s.t., for t &gt; 0,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Proof of Proposition 8</head><p>We first prove the pointwise convergence of g p (v) to Eg p (v): there exists c &gt; 0, s.t., for any v &#8712; [0, 2] and &#949; &gt; 0,</p><p>Recall that g p (v) = G p (av), so it is equivalent to prove |G p (av) -EG p (av)| &#8594; 0. We first control the moment generating function of G p (av) -EG p (av). Let b be an i.i.d. copy of a. For all |&#955;| &#8804; p 2 &#8730; 2&#960; , we can apply Theorem 2.2 of <ref type="bibr">[36, p.176]</ref> to get</p><p>In step (a), we take expectation over b and use |v| &#8804; 2 and &#8711;G p (av) &#8804; 2 a , as implied by (76); In step (b), we use the inequality log(1 + x) &#8805; x 1+x , for x &gt; -1 and the condition that |&#955;| &#8804; p 2 &#8730; 2&#960; . As a result, for any &#949; &#8805; 0 and &#955; &#8712; 0, p</p><p>After minimizing the exponent on the RHS of (86</p><p>2&#960; . The other direction also holds by the same reasoning. Thus,</p><p>To show uniform convergence <ref type="bibr">(35)</ref>, it suffices to prove the Lipschitz continuity of g p (v) and Eg p (v). From Lemma 1 of <ref type="bibr">[37]</ref>, we have for all x &gt; 0, P a</p><p>Let x = n( &#8730; y + 1 -1) 2 , we have for y &#8805; 2, P a 2 &#948;p -1 &#8805; y &#8804; exp -ny 4 . Therefore, by taking y = K/&#948; p , we get for any K &#8805; 2&#948; p ,</p><p>Combining it with (78), we know for K &#8805; 2&#948; p , g p (v) is 2K-Lipschitz with probability greater than 1exp(-pK 4 ). From (78), we can also get</p><p>Combining the Lipschitz continuity of g p (v) and Eg p (v) with (85),</p><p>we can obtain (35) by a standard epsilon-net argument as follows. We need to consider different values of &#949;:</p><p>1) If &#949; &#8805; &#948; p , we construct an epsilon-net of [0, 2] formed by the following points:</p><p>where we have used the Lipschitz continuity of g p (v) and Eg p (v), as well as &#949; &#8805; &#948; p . Then </p><p>2) If &#949; &lt; &#948; p , we construct an epsilon-net of [0, 2] formed by the following points: </p><p>Combining ( <ref type="formula">90</ref>) and (91), together with symmetry and the union bound, we directly get <ref type="bibr">(35)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Proof of Lemma 1</head><p>The proof follows the CGMT framework <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref>. The optimization in <ref type="bibr">(37)</ref> is equivalent to</p><p>where w &#8764; N (0, I n ). The corresponding auxiliary problem (AO) of ( <ref type="formula">92</ref>) is</p><p>where (x) + def = max{x, 0}, g &#8764; N (0, I n ), h &#8764; N (0, I p ), h 0 &#8764; N (0, 1) and they are mutually independent.</p><p>Now we analyze the inline optimization problem of (93), which can be simplified as:</p><p>where in (95) we make a change of variable: &#964; &#8730; &#948;p &#8594; &#964; and the parametric function v (h; &#964;, g) is defined as:</p><p>Denote &#964; * AO (&#952;) as the optimal solution in (95). From (94) and the fact that we did a change of variable in (95), it can be seen &#964; * AO (&#952;) =</p><p>. Note that this is consistent with (82).</p><p>We now show objective function F (&#964; ; &#952;, g, h) in (95) converges to F (&#964; ; &#952;) def = F p (&#964; ; &#952; + &#963; 2 p , &#948; p ) with high probability over &#964; &#8712; &#8486;(&#963; p , &#948; p ). The first and third term in RHS of (95) is relatively easy to deal with. By the concentration of g &#8730; n (e.g. <ref type="bibr">[38, p.44]</ref>) and h 0 &#8730; p , there exists c &gt; 0, s.t., for any &#949; &gt; 0 and &#964; &#8712; &#8486;(&#963; p , &#948; p ),</p><p>and</p><p>Here in (97), we have used the fact that for &#964; &#8712; &#8486;(&#963; p , &#948; p ), &#964; &#948;p 2 + &#952;+&#963; 2 p 2&#964; &#8804; C &#948; p , where C is some constant. For the second term, define the following function: V (h; &#964;, g)</p><p>where v (h; &#964;, g) is given in (96). We now show there exists c &gt; 0, s.t., for any &#949; &#8805; 0,</p><p>where</p><p>First, note that for any fixed g, v (h; &#964;, g) is 2-Lipschitz continuous, so V (h; &#964;, g) is 2 &#8730; p -Lipschitz continuous w.r.t. h. Also we can verify that E h v(h; &#964;, g) = f (&#964; g ), with &#964; g def = &#964; &#8730; n g . Then using Theorem 2.1 in <ref type="bibr">[36, p.</ref>176], we have for any g and &#949; &gt; 0,</p><p>It can be checked that f (t) in (100) satisfies f (t) &#8712; [-1, 0] for any t &gt; 0. Combining this with (100) and (101), we know (99) holds for &#949; &gt; 1 2 . On the other hand, by a direct differentiation, we have f</p><p>, for all t &gt; 0. Therefore, for any &#949; &#8712; (0, 1/2), on the event E &#949; = g &#8730; n -1 &lt; &#949; , which happens with probability P(E &#949; ) &#8805; 1ce -n&#949; 2 /c , there exists c &gt; 0, s.t., |&#964; g&#964; | &#8804; c&#949;. As a result, there exists c &gt; 0, s.t., for &#949; &#8712; (0, 1/2), P(|f (&#964; g )f (&#964; )| &gt; &#949;) &#8804; ce -n&#949; 2 /c . This together with (101) implies there exists c &gt; 0, s.t., for &#949; &#8712; (0, 1/2), inequality (99) still holds.</p><p>Combining (97) and (99), we get that there exists c &gt; 0, s.t., for any &#949; &gt; 0, &#964; &#8712; &#8486;(&#963; p , &#948; p ) and &#952; &#8712; [0, 1],</p><p>On the other hand, it can be verified from definition that there exists C &gt; 0, s.t., F (&#964; ; &#952;, g, h)</p><p>and F (&#964; ; &#952;) are both C&#948; p -Lipschitz over &#964; &#8712; &#8486;(&#963; p , &#948; p ). Then by a similar epsilon-net argument as in the proof of Proposition 8, we can get:</p><p>Since &#966;(&#952;, g, h) = min &#964; &#8712;&#8486;(&#963;p,&#945;p) F (&#964; ; &#952;, g, h) and 2Q * p (&#952;) = min &#964; &#8712;&#8486;(&#963;p,&#945;p) F (&#964; ; &#952;), from (103) we know there exists c &gt; 0, s.t., for any &#949; &gt; 0,</p><p>Since 2Q AO,p (&#952;) = max{&#966;(&#952;, g, h), 0}, from (104) we have</p><p>Taking into account the fact Q * p (&#952;) &#8804; C&#948; p (as shown in Appendix B), we can further obtain the following Bernstein's type inequality: there exists c &gt; 0, s.t., for any &#949; &gt; 0 and &#952; &#8712; [0, 1],</p><p>Then by CGMT (e.g., <ref type="bibr">[33,</ref><ref type="bibr">Corollary 5.1]</ref>), (106) implies that there exists c &gt; 0, s.t.,</p><p>Finally, from (107) we know there exists c &gt; 0, s.t., for any &#951; &gt; 0 and &#952; &#8712; [0, 1],</p><p>Then for &#947; &gt; 2, letting &#951; = p -1/&#947; in (109) and taking into account Assumption (A.4), we can get E|Q p (&#952;) -Q * p (&#952;)| &#8804; cp -1/&#947; for some c &gt; 0 and all the sufficiently large p. As a result,</p><p>Since the constant c above does not depend on &#952;, we get (51).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Approximate k-wise</head><p>To prove this, we can show a</p><p>, for any i &#8712; [k] and use the fact that a &#8890; i u * \i and a &#8890; i u * \[k] are in the same probability space. Lemma 8: There exists c &gt; 0, s.t., for any &#949; &gt; 0 and i = 1, 2, . . . , p -1,</p><p>Proof: To lighten notation, define</p><p>. Denote the objective function in (66) as L \[i] (u), (with k replaced by i here). By strong convexity of L \[i] (u), we have</p><p>and</p><p>where we use the fact |&#946; i | = 1 and Cauchy-Schwartz inequality in the last step. From (111) and</p><p>(112), we can get &#8710; [i] &#8804; 2 a i+1 . Therefore, there exists c &gt; 0, s.t., for any &#949;, D &gt; 0,</p><p>where (x) + def = max{x, 0}. Then by choosing D &#8781; &#948; p for small &#949; and D &#8781; &#948; p &#949; for large &#949;, we can get (110).</p><p>Lemma 9: There exists c &gt; 0, s.t., for any b i &#8712; R, i = 1, 2, . . . , k and &#949; &gt; 0,</p><p>and</p><p>Proof: From Lemma 8, for any k &#8712; [p], there exists c &gt; 0, s.t., for any &#949; &gt; 0,</p><p>By the exchangeability of a</p><p>, we have for any i &#8712; [k], it holds that</p><p>Therefore, we have for any &#949; &gt; 0,</p><p>which is (114). The other direction (115) can be obtained in the same way.</p><p>2)</p><p>Lemma 10: When k &#8804; p 2 , there exist C, c &gt; 0, s.t. for any &#949; &gt; 0,</p><p>.</p><p>(116)</p><p>Proof: By the definition of u * \[k] , we can get </p><p>where F p is defined in <ref type="bibr">(7)</ref>. Similar to (104), we can get for k &#8804; p 2 , &#8707;c &gt; 0, s.t., &#8704;&#949; &gt; 0, Proof: Using (116) and following the similar steps as (113), we can get:</p><p>where in step (a), we use (127), in step (b), we use (82) and step (c) follows from inequality (131) and conditions k &#8804; p 1 8 and &#963; 2 p &#8805; c &#8242; log 2 p . The other directions of ( <ref type="formula">129</ref>) and (130) can be derived similarly, which lead to <ref type="bibr">(29)</ref> and <ref type="bibr">(30)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Gaussian Tail Bounds</head><p>Here we gather some properties of the Gaussian tail bounds that will be used in our proof.</p><p>Let &#934;(x) and &#981;(x) be the CDF and PDF of the standard Gaussian distribution, respectively. It is well known that (see <ref type="bibr">[38, p.14]</ref> for a proof), for any x &gt; 0,</p><p>where m(x) def = &#934;(-x) &#981;(-x) is known as the Mills ratio. Correspondingly, the inverse Mills ratio is defined as h(x) def = 1/m(x). This provides us a way to approximate the tail probability &#934;(-x) by &#981;(x), which has an explicit form. In view of (82) and ( <ref type="formula">131</ref> </p><p>On the other hand, if &#945; p &lt; 2, then for large enough p, it holds that &#963; p &#8805; 1 log p . Choose L in (73) to be L = &#8970;5 log p&#8971; . Without loss of generality, assume Lk is odd (otherwise we add L by </p></div></body>
		</text>
</TEI>
