skip to main content


Title: Random Smoothing Might be Unable to Certify L_infinity Robustness for High-Dimensional Images
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the ℓp ball of radius ϵ when p>2. Although random smoothing has been well understood for the ℓ2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the ℓ∞ threat model. In this work, we show that any noise distribution D over R^d that provides ℓp robustness for all base classifiers with p>2 must satisfy E[η_i^2]= Ω(d^(1−2/p) ϵ^2 (1−δ)/δ^2) for 99% of the features (pixels) of vector η∼D, where ϵ is the robust radius and δ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.  more » « less
Award ID(s):
1815011
NSF-PAR ID:
10202005
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
21
Issue:
211
ISSN:
1532-4435
Page Range / eLocation ID:
1-21
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the ℓp ball of radius ϵ when p>2. Although random smoothing has been well understood for the ℓ2 case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of p>2. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the ℓ∞ threat model. In this work, we show that any noise distribution D over Rd that provides ℓp robustness for all base classifiers with p>2 must satisfy E[η_i^2]=Ω(d^(1−2/p) ϵ^2(1−δ)/δ^2) for 99% of the features (pixels) of vector η∼D, where ϵ is the robust radius and δ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in [0,255], the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers. 
    more » « less
  2. null (Ed.)
    Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against ℓ2-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d.~smoothing distributions, we prove that the largest ℓp-radius that can be certified decreases as O(1/d12−1p) with dimension d for p>2. Notably, for p≥2, this dependence on d is no better than that of the ℓp-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to {\it generalized} Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when p≥2. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as, a uniform distribution within an ℓ1 or an ℓ∞-norm ball, we show upper bounds of the form O(1/d) and O(1/d1−1p) respectively, which have an even worse dependence on d. 
    more » « less
  3. Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against ℓ2-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest ℓp-radius that can be certified decreases as O(1/d12−1p) with dimension d for p>2. Notably, for p≥2, this dependence on d is no better than that of the ℓp-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when p≥2. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as, a uniform distribution within an ℓ1 or an ℓ∞-norm ball, we show upper bounds of the form O(1/d) and O(1/d1−1p) respectively, which have an even worse dependence on d. 
    more » « less
  4. null (Ed.)
    We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with s servers P1, …, Ps, the i-th of which holds a subset A(i) x = b(i) of ni constraints of a linear system in d variables, and the coordinator would like to output an x ϵ ℝd for which A(i) x = b(i) for i = 1, …, s. We assume each coefficient of each constraint is specified using L bits. We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is (d2 L + sd) and (sd2L), respectively. We obtain similar results for the blackboard communication model. As a result of independent interest, we show the probability a random matrix with integer entries in {–2L, …, 2L} is invertible is 1–2−Θ(dL), whereas previously only 1 – 2−Θ(d) was known. When there is no solution to the linear system, a natural alternative is to find the solution minimizing the ℓp loss, which is the ℓp regression problem. While this problem has been studied, we give improved upper or lower bounds for every value of p ≥ 1. One takeaway message is that sampling and sketching techniques, which are commonly used in earlier work on distributed optimization, are neither optimal in the dependence on d nor on the dependence on the approximation ε, thus motivating new techniques from optimization to solve these problems. Towards this end, we consider the communication complexity of optimization tasks which generalize linear systems, such as linear, semi-definite, and convex programming. For linear programming, we first resolve the communication complexity when d is constant, showing it is (sL) in the point-to-point model. For general d and in the point-to-point model, we show an Õ(sd3L) upper bound and an (d2 L + sd) lower bound. In fact, we show if one perturbs the coefficients randomly by numbers as small as 2−Θ(L), then the upper bound is Õ(sd2L) + poly(dL), and so this bound holds for almost all linear programs. Our study motivates understanding the bit complexity of linear programming, which is related to the running time in the unit cost RAM model with words of O(log(nd)) bits, and we give the fastest known algorithms for linear programming in this model. Read More: https://epubs.siam.org/doi/10.1137/1.9781611975994.106 
    more » « less
  5. Concentration of measure has been argued to be the fundamental cause of adversarial vulnerability. Mahloujifar et al. (2019) presented an empirical way to measure the concentration of a data distribution using samples, and employed it to find lower bounds on intrinsic robustness for several benchmark datasets. However, it remains unclear whether these lower bounds are tight enough to provide a useful approximation for the intrinsic robustness of a dataset. To gain a deeper understanding of the concentration of measure phenomenon, we first extend the Gaussian Isoperimetric Inequality to non-spherical Gaussian measures and arbitrary ℓp-norms (p ≥ 2). We leverage these theoretical insights to design a method that uses half-spaces to estimate the concentration of any empirical dataset under ℓp-norm distance metrics. Our proposed algorithm is more efficient than Mahloujifar et al. (2019)‘s, and experiments on synthetic datasets and image benchmarks demonstrate that it is able to find much tighter intrinsic robustness bounds. These tighter estimates provide further evidence that rules out intrinsic dataset concentration as a possible explanation for the adversarial vulnerability of state-of-the-art classifiers. 
    more » « less