<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>02/05/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10181769</idno>
					<idno type="doi"></idno>
					<title level='j'>International Conference on Machine Learning</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Aounon Kumar</author><author>Alexander Levine</author><author>Tom Goldstein</author><author>Soheil Feizi</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against ℓ2-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest ℓp-radius that can be certified decreases as O(1/d^(1/2−1/p)) with dimension d for p > 2. Notably, for p ≥ 2, this dependence on d is no better than that of the ℓp-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when p ≥ 2. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an ℓ1 or an ℓ∞-norm ball, we show upper bounds of the form O(1/d) and O(1/d^(1−1/p)) respectively, which have an even worse dependence on d.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Deep neural networks, especially in image classification tasks, have been shown to be vulnerable to adversarial perturbations of the input that are unnoticeable to a human observer but can alter the prediction of the model <ref type="bibr">(Szegedy et al., 2014)</ref>. These examples are generated by optimizing a loss function for a trained network over the input features within a small neighborhood of an example input. Gradient based methods such as FGSM <ref type="bibr">(Goodfellow et al., 2015)</ref> and projected gradient descent <ref type="bibr">(Madry et al., 2018)</ref> have been shown to be very effective for this purpose. In the last couple of years, several heuristic methods have been proposed to detect and/or defend against attacks from specific types of adversaries <ref type="bibr">(Buckman et al., 2018;</ref><ref type="bibr">Guo et al., 2018;</ref><ref type="bibr">Dhillon et al., 2018;</ref><ref type="bibr">Li &amp; Li, 2017;</ref><ref type="bibr">Grosse et al., 2017;</ref><ref type="bibr">Gong et al., 2017)</ref>. Such defenses, however, have been shown to break down against more powerful attacks <ref type="bibr">(Carlini &amp; Wagner, 2017;</ref><ref type="bibr">Athalye et al., 2018;</ref><ref type="bibr">Uesato et al., 2018)</ref>. This necessitates developing classifiers with robustness guarantees. Several convex relaxation-based techniques have been proposed to design certifiably robust classifiers <ref type="bibr">(Wong &amp; Kolter, 2018;</ref><ref type="bibr">Raghunathan et al., 2018;</ref><ref type="bibr">Singla &amp; Feizi, 2019;</ref><ref type="bibr">Chiang et al., 2020)</ref> whose predictions are guaranteed to remain constant within a certified neighborhood around the input point, thereby eliminating the presence of any adversarial example in that region. 
However, the ever-increasing complexity of deep neural networks has made it difficult to scale these methods meaningfully to high-dimensional datasets like ImageNet.</p><p>To deal with the scalability issue in certifiable robustness, a line of work has been introduced based on randomized smoothing <ref type="bibr">(Lécuyer et al., 2019;</ref><ref type="bibr">Li et al., 2019;</ref><ref type="bibr">Cohen et al., 2019;</ref><ref type="bibr">Salman et al., 2019)</ref> wherein an arbitrary base classifier is made more robust by averaging its prediction over random perturbations of the input point within its neighborhood. <ref type="bibr">Cohen et al. (2019)</ref> proved the first tight robustness guarantee for Gaussian smoothing for an ℓ2-norm bounded adversary.</p><p>In this work, however, we show that extending the smoothing technique to defend against higher-norm attacks, especially in the high-dimensional regime, can be challenging. In particular, for a general class of i.i.d. smoothing distributions, we show that, for p > 2, the largest ℓp-radius that can be certified (denoted by r*_p) decreases with the number of dimensions d as O(1/d^(1/2−1/p)).<ref type="foot">foot_0</ref> Note that the special case of p = 2 does not suffer from such dependency on d. This makes smoothing-based robustness bounds weak against ℓp adversarial attacks for large p, especially for p = ∞, because as p → ∞ the dependence on d becomes O(1/√d). Moreover, we show that the dependence of the robustness certificate on d using a general i.i.d. smoothing distribution is similar to that of the standard Gaussian smoothing, even for p > 2. 
This implies that Gaussian smoothing essentially provides the best possible robustness certificate result in terms of the dependence on d even for p > 2.</p><p>To be more precise, suppose we smooth a classifier by randomly sampling points surrounding an image x, and observing the labels assigned to these points. Let p1(x) and p2(x) be the probabilities of the first and second most probable labels under the smoothing distribution. We prove the following bounds on the robustness certificate:</p><p>1. When points are sampled by adding i.i.d. noise to each dimension of x with variance σ² and continuous support, we prove the certified ℓp-radius bound</p><p>r*_p ≤ (σ/d^(1/2−1/p)) (1/√(2(1−p1(x))) + 1/√(2 p2(x))).</p><p>See Theorem 1.</p><p>2. When smoothing with a generalized Gaussian distribution with variance σ² (which includes the Laplacian, Gaussian, and uniform distributions), we prove that</p><p>r*_p ≤ ((2 + c²/2) σ/d^(1/2−1/p)) (√(log(1/(1−p1(x)))) + √(log(1/p2(x)))),</p><p>where c is a constant less than two for a sufficiently large d and e^(−d/4) < p2(x) ≤ p1(x) < 1 − e^(−d/4). When d is large, these bounds do not impact the range of values that p1(x) and p2(x) can take in a significant way. See Theorem 2.</p><p>3. We also study smoothing techniques where the distribution is uniform over a region around the input point.</p><p>When smoothed over an ℓ∞ ball of radius b, i.e. uniform i.i.d. between −b and b in each dimension, we show that</p><p>r*_p ≤ O(σ/d^(1−1/p)),</p><p>where σ² = b²/3 is the variance in each dimension. See Theorem 4. Note that this bound is independent of p1(x) and p2(x).</p><p>4. For smoothing uniformly over an ℓ1 ball of the same radius b, we achieve an even stronger bound:</p><p>r*_p ≤ O(b/d).</p><p>See Theorem 5 for details. Along with being independent of p1(x) and p2(x), it is also independent of p. Thus, it holds for any ℓp-norm bounded adversary. Note that, unlike the other smoothing distributions we have considered, the uniform ℓ1 smoothing is not i.i.d. 
in every dimension.</p><p>These bounds hold for any p > 0, but are too weak to offer meaningful insights when p < 2 in the first two cases and for p < 1 in the third one. Moreover, it is straightforward to show that, for p ≥ 2, the following ℓp-radius can be certified using <ref type="bibr">Cohen et al.'s (2019)</ref> Gaussian smoothing:</p><p>r_p = (σ/(2 d^(1/2−1/p))) (Φ⁻¹(p1(x)) − Φ⁻¹(p2(x))), (1)</p><p>which has the same dependence on d as the upper bound obtained using i.i.d. smoothing. This radius is asymptotically only a constant factor away from the upper bound for the generalized Gaussian distribution, showing that this family of distributions fails to outperform standard Gaussian smoothing in high dimensions. To the best of our knowledge, these bounds form the first results on the limitations of randomized smoothing in the high-dimensional regime that cover an extensive range of natural and commonly used smoothing distributions. We provide empirical evidence to support our claims on the CIFAR-10 dataset.</p></div>
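The Gaussian-smoothing certificate above is easy to evaluate. As a concrete illustration of its scaling with dimension, the following Python sketch (the function name and constants are our own, not from the paper's code) computes the Cohen et al. (2019) ℓ2 radius and converts it to an ℓp radius for p ≥ 2 via the norm inequality ‖v‖2 ≤ d^(1/2−1/p) ‖v‖p:

```python
import math
from statistics import NormalDist

def gaussian_lp_certificate(p1: float, p2: float, sigma: float, d: int, p: float) -> float:
    """Cohen et al. (2019) certified l2 radius, (sigma/2)(Phi^-1(p1) - Phi^-1(p2)),
    converted to an lp radius for p >= 2 via ||v||_2 <= d^(1/2 - 1/p) * ||v||_p."""
    inv = NormalDist().inv_cdf
    r2 = (sigma / 2.0) * (inv(p1) - inv(p2))
    return r2 / d ** (0.5 - 1.0 / p)

# The certified l_inf radius shrinks like 1/sqrt(d) as dimension grows.
radii = [gaussian_lp_certificate(0.9, 0.1, 1.0, d, math.inf) for d in (100, 400, 1600)]
```

Quadrupling d halves the certified ℓ∞ radius here, matching the 1/√d dependence discussed above.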
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries and Notation</head><p>Let h be a classifier that maps inputs from R^d to classes in C. Let P be a (smoothing) probability distribution in R^d. We define a smoothed classifier h̄ as below:</p><p>h̄(x) = argmax_{c ∈ C} P_{δ∼P}(h(x + δ) = c).</p><p>We refer to the process of smoothing using distribution P as P-smoothing. Let p_c(x) be the output probability of the base classifier for the class c. That is,</p><p>p_c(x) = P_{δ∼P}(h(x + δ) = c).</p><p>Without loss of generality, we assume that p1(x) and p2(x) are the probabilities of the first and second most likely classes, respectively.</p><p>For p > 0, we say a smoothing distribution P achieves a certified ℓp-norm radius of r_p if, for a base classifier h and an input x,</p><p>h̄(x′) = h̄(x) for all x′ with ‖x′ − x‖_p ≤ r_p.</p><p>For instance, as derived in <ref type="bibr">(Cohen et al., 2019)</ref>, the Gaussian smoothing distribution N(0, σ²I) achieves a certified ℓ2-norm radius of (σ/2)(Φ⁻¹(p1(x)) − Φ⁻¹(p2(x))), where Φ⁻¹ is the inverse of the standard Gaussian CDF.</p><p>For p1, p2 ∈ (0, 1) such that p1 ≥ p2, let r*_p denote the largest r_p that can be certified using P-smoothing for all classifiers satisfying p1(x) = p1 and p2(x) = p2. If we can exhibit a classifier h in this class and two points x, x′ ∈ R^d such that h̄(x) ≠ h̄(x′), then r*_p ≤ ‖x − x′‖_p. We use this fact to show upper bounds on the largest ℓp-norm radius that can be certified using a given class of distributions.</p></div>
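In practice, the smoothed classifier and the probabilities p_c(x) are estimated by Monte Carlo sampling. A minimal Python sketch (the toy base classifier, sample count, and seed are our own illustrative choices, not the paper's setup):

```python
import random
from collections import Counter

def smoothed_predict(h, x, sigma, n=2000, rng=None):
    """Estimate the smoothed classifier at x: draw n Gaussian perturbations,
    vote with the base classifier h, and return (top class, p1_hat, p2_hat)."""
    rng = rng or random.Random(0)
    votes = Counter(h([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n))
    top_two = votes.most_common(2) + [(None, 0)]
    (c1, n1), (_, n2) = top_two[0], top_two[1]
    return c1, n1 / n, n2 / n

# Toy base classifier on R^2: class 1 iff the first coordinate is positive.
h = lambda z: int(z[0] > 0)
c, p1_hat, p2_hat = smoothed_predict(h, [1.0, 0.0], sigma=0.5)
```

With this toy classifier, p1_hat approximates Φ(x1/σ) = Φ(2) ≈ 0.977, and the pair (p1_hat, p2_hat) is exactly what the certificates in the following sections consume.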
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">General i.i.d. Smoothing</head><p>We set P to be a smoothing distribution I under which each coordinate of Δ is sampled independently and identically from a symmetric distribution with zero mean, variance σ², and continuous support. We prove the following theorem:</p><p>Theorem 1. For distribution I and for p1, p2 ∈ (0, 1) such that p1 ≥ 1/2 and p1 + p2 ≤ 1, the largest ℓp-radius r*_p that can be certified for all classifiers satisfying p1(x) = p1 and p2(x) = p2 under I-smoothing at input point x is bounded as:</p><p>r*_p ≤ (σ/d^(1/2−1/p)) (1/√(2(1−p1)) + 1/√(2 p2)).</p><p>Proof. Let Z_i be the random variable modelling the i-th coordinate of Δ. Define a random variable S = Σ_{i=1}^d Z_i. It is straightforward to show that this random variable is distributed symmetrically with zero mean, variance dσ², and continuous support. The key intuition behind this proof is that the random variable S, which is the sum of d identical and independent random variables, will tend towards a Gaussian distribution for large values of d, making the distribution I suffer from some of the same limitations as the Gaussian distribution.</p><p>To simplify our analysis, we move our frame of reference so that x is at the origin. Therefore, r*_p ≤ ‖x′‖_p for any point x′ at which the smoothed prediction can be made to differ from that at x.</p><p>We pick s1, s2 ∈ R+ such that P(S ≤ s1) = p1(x) (this requires p1(x) ≥ 1/2) and P(S ≥ s2) = p2(x). Let x′ be the point with every coordinate equal to ε = (s1 + s2)/d, and so,</p><p>r*_p ≤ ‖x′‖_p = ε d^(1/p) = (s1 + s2) d^(1/p−1). (3)</p><p>Figure <ref type="figure">1</ref> illustrates how the probabilities of the top two classes change as we move from x to x′.</p><p>Applying Chebyshev's inequality on S, we have:</p><p>P(S ≥ s) ≤ dσ²/(2s²).</p><p>The value of s for which dσ²/(2s²) = p2(x) must be an upper bound on s2, i.e., s2 ≤ σ√(d/(2 p2(x))). Similarly, since P(S ≥ s1) = 1 − p1(x), we get s1 ≤ σ√(d/(2(1 − p1(x)))). Substituting the above bounds for s1 and s2 in (3) proves Theorem 1:</p><p>r*_p ≤ (σ/d^(1/2−1/p)) (1/√(2(1−p1(x))) + 1/√(2 p2(x))).</p></div>
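The Chebyshev-based bound of this section is easy to evaluate numerically. The sketch below (our own code; the constants follow the derivation as reconstructed above and are not taken from the paper's implementation) computes the tail thresholds s1 and s2 and the resulting ℓp bound, and checks the O(1/d^(1/2−1/p)) decay:

```python
import math

def iid_radius_upper_bound(p1, p2, sigma, d, p):
    """Sketch of the Chebyshev-based Theorem 1 bound: s2 <= sigma*sqrt(d/(2*p2))
    and s1 <= sigma*sqrt(d/(2*(1-p1))); the certifiable lp radius is then at
    most (s1 + s2) * d^(1/p - 1)."""
    s1 = sigma * math.sqrt(d / (2.0 * (1.0 - p1)))
    s2 = sigma * math.sqrt(d / (2.0 * p2))
    return (s1 + s2) * d ** (1.0 / p - 1.0)

# For p = infinity the bound decays exactly like 1/sqrt(d):
b1 = iid_radius_upper_bound(0.9, 0.1, 1.0, 1000, math.inf)
b2 = iid_radius_upper_bound(0.9, 0.1, 1.0, 4000, math.inf)
```

Quadrupling d halves the ℓ∞ bound, independently of the shape of the i.i.d. noise, which is the crux of the curse of dimensionality discussed here.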
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Generalized Gaussian Smoothing</head><p>We now restrict ourselves to the class of generalized Gaussian distributions, which subsumes some commonly used and natural smoothing distributions such as the Gaussian, Laplacian and uniform distributions. Using a similar approach as in the previous section, we obtain tighter upper bounds on r*_p by restricting the smoothing distribution to generalized Gaussian. In this class of distributions, each coordinate is sampled independently from the following distribution:</p><p>p(z) = C e^(−|z|^q / b^q),</p><p>where b > 0 is the scale parameter, q > 0 is the shape parameter and C is the normalizing constant</p><p>C = q/(2bΓ(1/q)), (4)</p><p>where Γ(·) is the gamma function. The mean of this distribution is at zero and the variance σ² can be calculated as</p><p>σ² = 2C ∫_0^∞ z² e^(−z^q/b^q) dz.</p><p>Substituting C from (4) leads to</p><p>σ² = b² Γ(3/q)/Γ(1/q).</p><p>Note that the class of generalised Gaussian distributions is a subset of the class of i.i.d. smoothing distributions considered in the previous section. The joint probability distribution over all the d dimensions can be expressed as:</p><p>p(z) ∝ e^(−‖z‖_q^q / b^q),</p><p>which for q = 1, 2 represents the Laplace and Gaussian distributions, respectively. As q → ∞, this distribution approximates the uniform distribution over [−b, b]^d. For a finite q, the level sets of the above p.d.f. define sets with constant ℓq-norm. Let G be a generalised Gaussian distribution with q ≥ 1. The following theorem holds:</p><p>Theorem 2. For distribution G and for e^(−d/4) < p2 ≤ p1 < 1 − e^(−d/4) and p1 + p2 ≤ 1, the largest ℓp-radius r*_p that can be certified for all classifiers satisfying p1(x) = p1 and p2(x) = p2 under G-smoothing at input point x, is bounded as:</p><p>r*_p ≤ ((2 + c²/2) σ/d^(1/2−1/p)) (√(log(1/(1−p1))) + √(log(1/p2))),</p><p>for some constant c < 1.85. We provide a brief proof sketch for this theorem here. As before, define random variables Z_i and S and assume x to be at the origin. 
Since the above distribution satisfies all the assumptions made in the previous section, we can directly conclude that the bound in (3) holds:</p><p>r*_p ≤ (s1 + s2) d^(1/p−1).</p><p>From here, we strengthen our analysis by replacing Chebyshev's inequality with the Chernoff bound:</p><p>P(S ≥ s) ≤ E[e^(tS)] e^(−ts),</p><p>for any t > 0. Since S is a sum of independent random variables Z1, Z2, …, Zd sampled from identical distributions,</p><p>E[e^(tS)] = (E[e^(tZ)])^d,</p><p>where Z is sampled from p(z).</p><p>Lemma 3. For some constant c < 1.85,</p><p>E[e^(tZ)] ≤ e^(c²σ²t²).</p><p>Proof is presented in the appendix.</p><p>Setting t = 1/(τσ√d) for some τ > 0 satisfying c²/(τ²d) < 1, we have:</p><p>P(S ≥ s) ≤ e^(c²/τ² − s/(τσ√d)),</p><p>for τ²d ≥ 16. The value of s for which this expression is equal to p2(x) gives us the following upper bound on s2:</p><p>s2 ≤ τσ√d (c²/τ² + log(1/p2(x))),</p><p>which for τ = 2/√(log(1/p2(x))) gives:</p><p>s2 ≤ (2 + c²/2) σ √(d log(1/p2(x))),</p><p>and similarly, repeating the above analysis and setting τ = 2/√(log(1/(1 − p1(x)))), we get:</p><p>s1 ≤ (2 + c²/2) σ √(d log(1/(1 − p1(x)))).</p><p>Both the above values for τ satisfy τ²d ≥ 16 due to the restrictions on p1 and p2. Substituting the above bounds for s1 and s2 in inequality (3) proves Theorem 2:</p><p>r*_p ≤ ((2 + c²/2) σ/d^(1/2−1/p)) (√(log(1/(1−p1(x)))) + √(log(1/p2(x)))).</p><p>When p1(x) is close to one and p2(x) is close to zero, this bound is within a constant factor of the Gaussian certificate in equation (1), because Φ⁻¹(p) can be lower bounded by α√(log(1/(1−p))) + β for some constants α and β. Figure <ref type="figure">2</ref> compares the behaviour of the two upper bounds, the one from i.i.d. smoothing u_I and the one from generalized Gaussian smoothing u_G, with respect to the Gaussian certificate r_p obtained in equation (<ref type="formula">1</ref>). Assuming the binary classification case, for which p2(x) = 1 − p1(x), we plot the ratios u_I/r_p and u_G/r_p, which only depend on p1(x), and show that the generalized Gaussian bound is much tighter than the i.i.d. bound as p1(x) → 1. </p></div>
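The variance of the generalized Gaussian, σ² = b² Γ(3/q)/Γ(1/q), can be sanity-checked numerically: it recovers the Gaussian value b²/2 at q = 2 and approaches the uniform distribution's b²/3 as q → ∞ (a quick check in Python, our own code):

```python
import math

def gen_gaussian_variance(b: float, q: float) -> float:
    """Variance of the generalized Gaussian p(z) = C * exp(-|z|^q / b^q),
    namely sigma^2 = b^2 * Gamma(3/q) / Gamma(1/q)."""
    return b * b * math.gamma(3.0 / q) / math.gamma(1.0 / q)

# q = 2 is the Gaussian exp(-z^2/b^2), whose variance is b^2 / 2.
var_gauss = gen_gaussian_variance(1.0, 2.0)
# Large q approaches the uniform distribution on [-b, b], variance b^2 / 3.
var_unif = gen_gaussian_variance(1.0, 500.0)
```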
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Uniform Smoothing</head><p>In this section, we analyse smoothing distributions that are uniform within a finite region around the input point x. We show stronger upper bounds for r*_p when smoothing uniformly over ℓ1 and ℓ∞-norm balls. We first consider the ℓ∞ smoothing distribution, which is the limiting case of the generalized Gaussian distribution as q → ∞. We set P to be U([−b, +b]^d), which denotes a uniform distribution over the points in [−b, +b]^d.</p><p>Theorem 4. For distribution U([−b, +b]^d), the largest ℓp-radius r*_p that can be certified for all classifiers is bounded as</p><p>r*_p ≤ O(σ/d^(1−1/p)),</p><p>where σ² = b²/3 is the variance in each dimension.</p><p>Proof. Assume x is at the origin and let x′ be a point with every coordinate equal to ε. Let V1 and V2 denote the supports of the smoothing distribution centered at x and x′, respectively, and let g be a classifier that maps V1 \ V2 to class one and V2 \ V1 to class two. For ḡ to classify x′ into class one, enough of the mass of the distribution centered at x′ must still fall in V1 \ V2, which forces ε = O(b/d). Since ‖x′‖_p = ε d^(1/p), the optimal radius is r*_p = ε d^(1/p) = O(b/d^(1−1/p)) = O(σ/d^(1−1/p)).</p><p>This shows that for p > 1, σ (or b) needs to grow with the number of dimensions d to certify a meaningfully large ℓp-norm radius. For instance, p = 2 and p = ∞ require σ to be Θ(√d) and Θ(d), respectively. However, since inputs can be assumed to come from [0, 1]^d (possibly after some scaling and shifting of images), smoothing over distributions with such large variance may significantly lower the performance of the smoothed classifier.</p><p>We now consider the uniform ℓ1 smoothing distribution (denoted by L1(b)), where points are sampled uniformly from an ℓ1-norm ball of radius b. Note that the noise in each dimension is no longer independent.</p><p>Theorem 5. For distribution L1(b), the largest ℓp-radius r*_p that can be certified for all classifiers is bounded as</p><p>r*_p ≤ O(b/d).</p><p>The following is a proof sketch of the above theorem. Let x be at the origin and x′ be the point (ε, 0, 0, …, 0), that is, ε in the first coordinate and zero everywhere else. 
Similar to before, let V1 and V2 be the sets defined by the ℓ1 balls centered at x and x′, respectively.</p><p>Lemma 6. The set V1 ∩ V2 is a subset of an ℓ1 ball of radius b − ε/2.</p><p>The proof is presented in the appendix.</p><p>As before, let g be a classifier that maps every point in V1 \ V2 to class one and every point in V2 \ V1 to class two (Figure <ref type="figure">4</ref>). Let ρ denote the probability that ḡ(x) samples from V1 \ V2, which is equal to the probability that ḡ(x′) samples from V2 \ V1. We use the formula 2^d R^d/d! for the volume of a d-dimensional ℓ1 ball of radius R. The rest of the analysis is the same as that for the ℓ∞ case, and since ‖x′‖_p = ε, we have</p><p>r*_p = ε = O(b/d),</p><p>which proves Theorem 5.</p></div>
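Lemma 6 together with the ℓ1-ball volume formula 2^d R^d/d! gives a closed form for how quickly the overlap mass vanishes: since volume scales as R^d, the fraction of the ball's mass that can lie in V1 ∩ V2 is at most ((b − ε/2)/b)^d. A short numerical check (our own code):

```python
import math

def l1_overlap_fraction(b: float, eps: float, d: int) -> float:
    """Upper bound on the probability mass of V1 ∩ V2 under uniform l1
    smoothing: vol(l1 ball of radius R) = 2^d R^d / d! is proportional to
    R^d, so the mass fraction is at most ((b - eps/2) / b)^d."""
    return ((b - eps / 2.0) / b) ** d

# Unless eps shrinks like O(b/d), the overlap mass (and hence any possible
# certificate) collapses exponentially fast with dimension.
frac = l1_overlap_fraction(1.0, 0.01, 3072)  # d = 3072 is CIFAR-10's dimensionality
```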
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Experiments</head><p>In order to understand how our results apply to smoothing in practice, we tested the smoothed classification algorithm proposed by <ref type="bibr">(Cohen et al., 2019)</ref> on CIFAR-10 images, using Generalized Gaussian noise in each dimension, rather than Gaussian noise.</p><p>Figure <ref type="figure">5</ref>. Median certified robustness for each classifier using Generalized Gaussian smoothing for different q. For a fixed standard deviation σ, the shape of the distribution, controlled by q, has almost no effect on the likelihood that the base classifier returns the correct class.</p><p>We specifically tested on CIFAR-10 (32 × 32 pixels), as well as scaled-down versions of this dataset (16 × 16 and 8 × 8 pixels), in order to study how our bounds behave as the dimension of the input changes. Although we do not have explicit certificates for these Generalized Gaussian distributions, we are able to compare the upper bounds derived in this work for any possible certificates to the actual certificates for Gaussian smoothing on the same images. Note that we re-trained the classifier on noisy images for each noise distribution and standard deviation σ.</p><p>Note also that our main results apply specifically to smoothing-based certificates which are functions of only p1(x) and p2(x) (in theory, larger certificates could be derived if more information is available to the certification algorithm). In reporting the upper bounds on possible empirical certificates, we provide the same inputs to the upper bound as we would provide to the certificate: namely, an empirical lower bound on p1(x), estimated from samples, and an empirical upper bound on p2(x). We are not making claims about the "optimal possible" empirical estimation procedures required to derive the largest possible certificates. 
We instead regard these bounds, p1(x) and p2(x), as inputs to the empirical certificate: we are only claiming that, given estimates p1(x) and p2(x), no certificate will exceed the computed bound. In practice, we use the estimation procedure proposed by <ref type="bibr">(Cohen et al., 2019)</ref>, which first selects a candidate top class label using a small number of samples, then uses a large number of samples (100,000 in our experiments) to compute a lower confidence bound on p1(x) based on a binomial distribution. p2(x) is then taken as 1 − p1(x). Then, for the sake of our experiments, the only empirical input to our bound is the estimate of p1(x).</p><p>Figure <ref type="figure">6</ref>. Upper bounds for certifying with Generalized Gaussian noise (σ = 0.12) on unaltered (32 × 32) CIFAR-10 images, with q = p, compared with certificates using Gaussian noise directly. At this noise level, p1(x) is high enough for the Generalized Gaussian bound to be tighter than the i.i.d. distribution bound. Panel (a) shows the certificates and the bounds directly, while (b) shows the ratio between the tighter Generalized Gaussian bound and the certificate.</p><p>One interesting result is that the distribution of noise added in each dimension seems to be largely irrelevant to determining p1(x) (Figure <ref type="figure">5</ref>). It is the variance of the noise added, not the specific choice of noise distribution, that determines p1(x). This paints an even bleaker picture for the possibility of smoothing for high ℓp-norm robustness than our theoretical results alone can: Theorems 1 and 2 still depend on p1(x) and p2(x) for the particular noise distribution. This leaves open the possibility that certain choices of noise distributions could yield values of p1(x) large enough to counteract the scaling with p. 
However, empirically, we find that this is not the case: for a fixed σ, p1(x) does not depend on the shape of the smoothing distribution.</p><p>For example, one might attempt to use smoothing with q = p in order to certify for the ℓp norm, so that the level sets of the smoothing distribution correspond to ℓp balls around x. This is the technique used for ℓ1 certification by <ref type="bibr">(Lécuyer et al., 2019)</ref>, and for ℓ2 certification by <ref type="bibr">(Cohen et al., 2019)</ref>. However, we find (Figures <ref type="figure">6,</ref><ref type="figure">7</ref>) that, as anticipated by Figure <ref type="figure">2</ref>, for p > 2, this can achieve at best a constant factor improvement in certified robustness compared to simply using Gaussian smoothing with the certificate from <ref type="bibr">(Cohen et al., 2019)</ref> and applying equivalence of norms (Equation <ref type="formula">1</ref>). Note that, as shown in Figure <ref type="figure">5</ref>, it was only for the lowest level of noise tested (σ = 0.12) and the highest resolution images tested (32 × 32) that p1(x) was sufficiently close to 1 for the Generalized Gaussian bound to be tighter than the i.i.d. distribution bound (Figure <ref type="figure">6</ref>). For all other configurations (Figure <ref type="figure">7</ref>; other plots are given in the supplementary materials), the i.i.d. bound is tighter.</p><p>In the case of Gaussian smoothing, <ref type="bibr">(Cohen et al., 2019)</ref> makes an argument that, as image resolution increases, the base classifier will become more tolerant to noise, because information will be redundantly encoded in the additional pixels. This should allow us to increase the magnitude of the smoothing variance σ² proportionally to d. This is because, by average-pooling a large image back down to a low-resolution one, the variance in each pixel of the smaller image will decrease proportionally with d. 
Then, if it is possible to classify noisy images at the lower resolution with a certain accuracy p1(x), it should be possible to classify images at the higher resolution with higher levels of noise. This increase in the amount of noise that can be added to high-resolution images (to obtain roughly the same accuracy as on low-resolution ones) will cancel out the decrease in the robustness radius due to the curse of dimensionality explained in this paper. This is because, based on Equation <ref type="formula">1</ref>, if σ is allowed to scale with √d while p1(x) and p2(x) remain unchanged, then the certified radius remains constant with d even in the ℓ∞ case.</p><p>For image datasets that are identical except for a scaling factor, we observe a related phenomenon: for a fixed noise variance, p1(x) tends to increase with the resolution of the image (i.e., the dimensionality of the input), and therefore the certified radii tend to increase with d in the p = 2 case. In Figure <ref type="figure">8</ref>, we show that, for p > 2, this increase is enough to counteract the inverse scaling with d in Equation 1, at least in the case of low-resolution CIFAR-10 images. In other words, we still get larger certificates for larger-resolution images, simply because our base classifier becomes more accurate on noisy images as resolution increases. We emphasize that this is using the standard Gaussian noise: we have demonstrated that other i.i.d. distributions will not give significantly better certificates.</p><p>The above setup, however, is an artificial scenario. In the real world, higher-resolution datasets are typically used for classification tasks which could not be accomplished with high accuracy at a lower resolution. 
As shown in Figure <ref type="figure">9</ref>, if we compare, for a fixed σ, a real-world low-dimensional classification task (CIFAR-10, d = 3072) to a high-dimensional classification task (ImageNet, d = 150528), we see that the certified radius (and therefore p1(x)) does not substantially increase with the higher resolution. Therefore, for higher ℓp-norms, the certified radius decreases with dimension with a scaling nearly as extreme as the explicit d^(1/2−1/p) factor in Equation 1. In practice, then, the curse of dimensionality behaves as explained in this paper and cannot be overcome by a novel choice of i.i.d. smoothing distribution.</p></div>
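The estimation step used throughout these experiments — turning 100,000 smoothing samples into a high-confidence lower bound on p1(x) — can be sketched as follows. Cohen et al. (2019) use an exact Clopper-Pearson binomial bound; the Hoeffding-style bound below is a simpler, slightly looser stand-in (our own choice, made to avoid special functions):

```python
import math

def p1_lower_bound(successes: int, n: int, alpha: float = 0.001) -> float:
    """Lower confidence bound on p1 from `successes` top-class votes out of
    n smoothing samples, holding with probability at least 1 - alpha, via
    Hoeffding's inequality."""
    slack = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return max(0.0, successes / n - slack)

p1_lo = p1_lower_bound(95_000, 100_000)   # 95% of 100,000 votes for the top class
p2_hi = 1.0 - p1_lo                       # p2 is then taken as 1 - p1
```

With n = 100,000 the slack is under 0.01, so the empirical inputs fed to the upper bounds are close to the true p1(x) and p2(x).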
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>In this work, we demonstrated some limitations of common smoothing distributions for ℓp-norm bounded adversaries when p > 2. We partially answer the question, raised in <ref type="bibr">(Cohen et al., 2019)</ref>, of whether smoothing techniques similar to Gaussian smoothing can be employed to achieve certifiable robustness guarantees for a general ℓp-norm bounded adversary. Most i.i.d. smoothing distributions fail to yield good robustness guarantees in the high-dimensional regime against ℓp-norm bounded attacks when p > 2. Their performance is no better than that of Gaussian smoothing, up to a constant factor. While a constant factor improvement in performance could be critical in certain applications, the focus of this work is on the effect of dimensionality on certified robustness. We note that, in our analysis, we focus on i.i.d. and symmetric smoothing distributions. Our analysis highlights the importance of developing input-dependent smoothing techniques rather than the current smoothing methods based on i.i.d. distributions.</p><p>A. Proof for Lemma 3</p><p>Proof. Applying the series expansion of e^(tZ), we get</p><p>E[Z^n] = C ∫_0^∞ (1 + (−1)^n) z^n e^(−z^q/b^q) dz = 0 if n is odd, and 2C ∫_0^∞ z^n e^(−z^q/b^q) dz if n is even.</p><p>When n is even:</p><p>E[Z^n] = 2C b^(n+1) Γ((n+1)/q)/q = b^n Γ((n+1)/q)/Γ(1/q).</p><p>Therefore, keeping only the terms with even n in the expansion of E[e^(tZ)], we get:</p><p>E[e^(tZ)] = Σ_{k=0}^∞ (t^(2k)/(2k)!) b^(2k) Γ((2k+1)/q)/Γ(1/q) ≤ e^(c²σ²t²),</p><p>for some positive constant c < 1.85, because</p><p>Γ(1/q)/Γ(3/q) = 3qΓ(1 + 1/q)/(qΓ(1 + 3/q)) (using Γ(z + 1) = zΓ(z)) = 3Γ(1 + 1/q)/Γ(1 + 3/q) < 1.85² (for q ≥ 1, Γ(1 + 1/q) ≤ 1 and Γ(1 + 3/q) > 0.88),</p><p>and hence b² = σ² Γ(1/q)/Γ(3/q) < (1.85σ)².</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Proof for Lemma 6</head><p>Proof. The points in V1 satisfy the following 2^d constraints:</p><p>±x1 ± x2 ± … ± xd ≤ b.</p><p>Similarly, points in V2 satisfy</p><p>±(x1 − ε) ± x2 ± … ± xd ≤ b.</p><p>Then, the points in V1 ∩ V2 must satisfy the set of constraints constructed by picking the constraints that have a + sign for x1 in the first set of constraints and a − sign for x1 in the second set. They may be rewritten as</p><p>±(x1 − ε/2) ± x2 ± … ± xd ≤ b − ε/2,</p><p>which define an ℓ1 ball of radius b − ε/2 centered at (ε/2, 0, …, 0), that is, ε/2 in the first coordinate and zero everywhere else.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>University of Maryland, College Park, Maryland, USA. Correspondence to: Aounon Kumar &lt;aounon@umd.edu&gt;, Soheil Feizi &lt;sfeizi@cs.umd.edu&gt;.</p></note>
		</body>
		</text>
</TEI>
