<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off</title></titleStmt>
			<publicationStmt>
				<publisher>PMLR</publisher>
				<date>07/15/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10567036</idno>
					<idno type="doi"></idno>
					
					<author>Yatong Bai</author><author>Brendon Anderson</author><author>Somayeh Sojoudi</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, high-performance machine learning models have been employed in various control settings, including reinforcement learning for dynamic systems with uncertainty <ref type="bibr">(Levine et al., 2016;</ref><ref type="bibr">Sutton and Barto, 2018)</ref> and autonomous driving <ref type="bibr">(Bojarski et al., 2016;</ref><ref type="bibr">Wu et al., 2017)</ref>. However, models such as neural networks have been shown to be vulnerable to adversarial attacks, which are imperceptibly small input data alterations maliciously designed to cause failure <ref type="bibr">(Szegedy et al., 2014;</ref><ref type="bibr">Nguyen et al., 2015;</ref><ref type="bibr">Huang et al., 2017;</ref><ref type="bibr">Eykholt et al., 2018;</ref><ref type="bibr">Liu et al., 2019)</ref>. This vulnerability makes such models unreliable for safety-critical control where guaranteeing robustness is necessary. In response, "adversarial training (AT)" <ref type="bibr">(Kurakin et al., 2017;</ref><ref type="bibr">Goodfellow et al., 2015;</ref><ref type="bibr">Bai et al., 2022a,b;</ref><ref type="bibr">Zheng et al., 2020;</ref><ref type="bibr">Zhang et al., 2019)</ref> has been studied to alleviate this susceptibility. AT builds robust neural networks by training on adversarially attacked data.</p><p>A parallel line of work focuses on mathematically certified robustness <ref type="bibr">(Anderson et al., 2020;</ref><ref type="bibr">Ma and Sojoudi, 2021;</ref><ref type="bibr">Anderson and Sojoudi, 2022a</ref>). 
Among these methods, "randomized smoothing (RS)" is a particularly popular one that seeks to achieve certified robustness by processing intentionally corrupted data at inference time <ref type="bibr">(Cohen et al., 2019;</ref><ref type="bibr">Li et al., 2019;</ref><ref type="bibr">Pfrommer et al., 2023)</ref>, and has recently been applied to robustify reinforcement learning-based control strategies <ref type="bibr">(Kumar et al., 2022;</ref><ref type="bibr">Wu et al., 2022)</ref>. The recent work <ref type="bibr">(Anderson and Sojoudi, 2022b)</ref> has shown that "locally biased smoothing," which robustifies the model locally based on the input test datum, outperforms the traditional RS with fixed smoothing noise. However, <ref type="bibr">Anderson and Sojoudi (2022b)</ref> only focus on binary classification problems, significantly limiting the applications. Moreover, <ref type="bibr">Anderson and Sojoudi (2022b)</ref> rely on the robustness of a K-nearest-neighbor (K-NN) classifier, which suffers from a lack of representation power when applied to harder problems and becomes a bottleneck.</p><p>While some works have shown that there exists a fundamental trade-off between accuracy and robustness <ref type="bibr">(Tsipras et al., 2019;</ref><ref type="bibr">Zhang et al., 2019)</ref>, recent research has argued that it should be possible to simultaneously achieve robustness and accuracy on benchmark datasets <ref type="bibr">(Yang et al., 2020)</ref>. To this end, variants of AT that improve the accuracy-robustness trade-off have been proposed, including TRADES <ref type="bibr">(Zhang et al., 2019)</ref>, Interpolated Adversarial Training <ref type="bibr">(Lamb et al., 2019)</ref>, and many others <ref type="bibr">(Raghunathan et al., 2020;</ref><ref type="bibr">Zhang and Wang, 2019;</ref><ref type="bibr">Tram&#232;r et al., 2018;</ref><ref type="bibr">Balaji et al., 2019)</ref>. 
However, even with these improvements, degraded clean accuracy is often an inevitable price of achieving robustness. Moreover, standard non-robust models often achieve enormous performance gains by pre-training on larger datasets, whereas the effect of pre-training on robust classifiers is less understood and may be less prominent <ref type="bibr">(Chen et al., 2020;</ref><ref type="bibr">Fan et al., 2021)</ref>.</p><p>This work makes a theoretically disciplined step towards robustifying models without sacrificing clean accuracy. Specifically, we build upon locally biased smoothing and replace its underlying K-NN classifier with a robust neural network that can be obtained via various existing methods. We then modify how the standard base model (a highly accurate but possibly non-robust neural network) and the robust base model are "mixed" accordingly. The resulting formulation, to be introduced in Section 3, is a convex combination of the output probabilities from the two base classifiers. We prove that, when the robust network has a bounded Lipschitz constant or is built via RS, the mixed classifier also has a closed-form certified robust radius. More importantly, our method achieves an empirical robustness level close to that of the robust base model while approaching the standard base model's clean accuracy. This desirable behavior significantly improves the accuracy-robustness trade-off, especially for tasks where standard models noticeably outperform robust models on clean data.</p><p>Note that we do not make any assumptions about how the standard and robust base models are obtained (can be AT, RS, or others), nor do we assume the adversarial attack type and budget. Thus, our mixed classification scheme can take advantage of pre-training on large datasets via the standard base classifier and benefit from ever-improving robust training methods via the robust base classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background and related works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Notations</head><p>The &#8467; p norm is denoted by &#8741;&#8226;&#8741; p , while &#8741;&#8226;&#8741; p * denotes its dual norm. The matrix I d denotes the identity matrix in R d&#215;d . For a scalar a, sgn(a) &#8712; {-1, 0, 1} denotes its sign. For a natural number c, the set [c] is defined as {1, 2, . . . , c}. For an event A, the indicator function I(A) evaluates to 1 if A takes place and 0 otherwise. The notation P X&#8764;S [A(X)] denotes the probability for an event A(X) to occur, where X is a random variable drawn from the distribution S. The normal distribution on R d with mean x and covariance &#931; is written as N (x, &#931;). We denote the cumulative distribution function of N (0, 1) on R by &#934; and write its inverse function as &#934; -1 .</p><p>Consider a model g : R d &#8594; R c , whose components are g i : R d &#8594; R for i &#8712; [c], where d is the dimension of the input and c is the number of classes. In this paper, we assume that g(&#8226;) does not have the desired level of robustness, and refer to it as a "standard model", as opposed to a "robust model" which we denote as h(&#8226;). We consider &#8467; p norm-bounded attacks on differentiable neural networks. A classifier f : R d &#8594; [c], defined as f (x) = arg max i&#8712;[c] g i (x), is considered robust against adversarial attacks at an input datum x &#8712; R d if it assigns the same class to all perturbed inputs x + &#948; such that &#8741;&#948;&#8741; p &#8804; &#1013;, where &#1013; &#8805; 0 is the attack radius.</p></div>
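<div xmlns="http://www.tei-c.org/ns/1.0"><p>The notation above can be made concrete with a minimal Python sketch. The helper names (dual_exponent, lp_norm, classify) are hypothetical and only illustrate the definitions; the dual exponent q of p is taken to satisfy 1/p + 1/q = 1, which is the standard convention and is assumed here.</p>
<p><code lang="python"><![CDATA[
```python
import math


def dual_exponent(p):
    """Return q with 1/p + 1/q = 1, so that the l_q norm is dual to the l_p norm."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def lp_norm(v, p):
    """l_p norm of a list of floats, for p in [1, inf]."""
    if p == math.inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def classify(scores):
    """f(x) = argmax_i g_i(x), with classes numbered 1..c as in the set [c]."""
    return max(range(len(scores)), key=lambda i: scores[i]) + 1
```
]]></code></p>
<p>For instance, the dual of the &#8467; &#8734; norm used in many image attacks is the &#8467; 1 norm, and classify returns a label in [c] rather than a zero-based index.</p></div>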
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Related Adversarial Attacks and Defenses</head><p>The fast gradient sign method (FGSM) and projected gradient descent (PGD) attacks based on differentiating the cross-entropy loss are highly effective and have been considered the most standard attacks for evaluating robust models <ref type="bibr">(Madry et al., 2018;</ref><ref type="bibr">Goodfellow et al., 2015)</ref>. To exploit the structures of the defense methods, adaptive attacks have also been introduced <ref type="bibr">(Tram&#232;r et al., 2020)</ref>.</p><p>On the defense side, while AT <ref type="bibr">(Madry et al., 2018)</ref> and TRADES <ref type="bibr">(Zhang et al., 2019)</ref> have seen enormous success, such methods are often limited by a significantly larger amount of required training data <ref type="bibr">(Schmidt et al., 2018)</ref> and a decrease in generalization capability. Initiatives that construct more effective training data via data augmentation <ref type="bibr">(Rebuffi et al., 2021;</ref><ref type="bibr">Gowal et al., 2021)</ref> and generative models <ref type="bibr">(Sehwag et al., 2022)</ref> have successfully produced more robust models. Improved versions of AT <ref type="bibr">(Jia et al., 2022;</ref><ref type="bibr">Shafahi et al., 2019)</ref> have also been proposed.</p><p>Previous initiatives that aim to enhance the accuracy-robustness trade-off include using alternative attacks during training <ref type="bibr">(Pang et al., 2022)</ref>, appending early-exit side branches to a single network <ref type="bibr">(Hu et al., 2020)</ref>, and applying AT for regularization <ref type="bibr">(Zheng et al., 2021)</ref>. Moreover, ensemble-based defenses, such as random ensemble <ref type="bibr">(Liu et al., 2018)</ref> and diverse ensemble <ref type="bibr">(Pang et al., 2019;</ref><ref type="bibr">Alam et al., 2022)</ref>, have been proposed. 
In comparison, this work considers two separate classifiers and uses their synergy to improve the accuracy-robustness trade-off, achieving higher performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Locally Biased Smoothing</head><p>Randomized smoothing, popularized by <ref type="bibr">Cohen et al. (2019)</ref>, achieves robustness at inference time by replacing f (x) = arg max i&#8712;[c] g i (x) with the smoothed prediction</p><p><formula notation="TeX">\arg\max_{i \in [c]} \; \mathbb{E}_{\xi \sim S} \left[ g_i(x + \xi) \right],</formula></p><p>where S is a smoothing distribution. A common choice for S is a Gaussian distribution. <ref type="bibr">Anderson and Sojoudi (2022b)</ref> have recently argued that data-invariant RS does not always achieve robustness. They have shown that in the binary classification setting, RS with an unbiased distribution is suboptimal, and an optimal smoothing procedure shifts the input point in the direction of its true class. Since the true class is generally unavailable, a "direction oracle" is used as a surrogate. This "locally biased smoothing" method is no longer randomized and outperforms traditional data-blind RS. The locally biased smoothed classifier, denoted h &#947; : R d &#8594; R, is obtained via the deterministic calculation h &#947; (x) = g(x) + &#947;h(x)&#8741;&#8711;g(x)&#8741; p * , where h(x) &#8712; {-1, 1} is the direction oracle and &#947; &#8805; 0 is a trade-off parameter. The direction oracle should come from an inherently robust classifier (which is often less accurate). In <ref type="bibr">(Anderson and Sojoudi, 2022b)</ref>, this direction oracle is chosen to be a one-nearest-neighbor classifier.</p></div>
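<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the two smoothing schemes discussed above, the following Python sketch estimates Gaussian randomized smoothing by Monte Carlo sampling and evaluates the binary locally biased smoothing score g(x) + &#947;h(x)&#8741;&#8711;g(x)&#8741; p * . The toy base classifier, the function names, and the sample count are assumptions made for this example only; they are not taken from the cited works.</p>
<p><code lang="python"><![CDATA[
```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_base_classifier(x):
    """Hypothetical 2-class base model: linear logits followed by softmax."""
    return softmax([x[0] - x[1], x[1] - x[0]])


def smoothed_probs(base, x, sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_{xi ~ N(0, sigma^2 I)}[base(x + xi)]."""
    rng = random.Random(seed)
    acc = [0.0] * len(base(x))
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, p in enumerate(base(noisy)):
            acc[i] += p
    return [a / n_samples for a in acc]


def locally_biased_score(g_value, grad_dual_norm, oracle_sign, gamma):
    """Binary locally biased smoothing: g(x) + gamma * h(x) * ||grad g(x)||_{p*}."""
    return g_value + gamma * oracle_sign * grad_dual_norm


probs = smoothed_probs(toy_base_classifier, [1.0, 0.0], sigma=0.5)
```
]]></code></p>
<p>The randomized estimate depends on the sampled noise, whereas locally_biased_score is deterministic given the oracle output, which mirrors the distinction drawn in the text.</p></div>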
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Using a Robust Neural Network as the Smoothing Oracle</head><p>Locally biased smoothing was designed for binary classification, restricting its practicality. Here, we first extend it to the multi-class setting by treating the output of each class, denoted as h &#947; smo1,i (x), independently, giving rise to:</p><p><formula notation="TeX">h^{\gamma}_{\mathrm{smo1},i}(x) := g_i(x) + \gamma h_i(x) \|\nabla g_i(x)\|_{p*}, \quad i \in [c]. \tag{1}</formula></p><p>Note that if &#8741;&#8711;g i (x)&#8741; p * is large for some class i, then h &#947; smo1,i (x) can be large for class i even if both g i (x) and h i (x) are small, leading to incorrect predictions. To remove the effect of the gradient magnitude difference across the classes, we propose a normalized formulation as follows:</p><p><formula notation="TeX">h^{\gamma}_{\mathrm{smo2},i}(x) := \frac{g_i(x) + \gamma h_i(x) \|\nabla g_i(x)\|_{p*}}{1 + \gamma \|\nabla g_i(x)\|_{p*}}, \quad i \in [c]. \tag{2}</formula></p><p>[Figure 1 plot. Horizontal axis: clean accuracy of the mixed classifier. Vertical axis: PGD 10 attacked accuracy. Legend (choice of R i (x) and output type): 1, No SoftMax; &#8741;&#8711;g i (&#8901;)&#8741; p * , No SoftMax; &#8741;&#8711; max j g j (&#8901;)&#8741; p * , No SoftMax; &#8741;&#8711;g i (&#8901;)&#8741; p * / &#8741;&#8711;h i (&#8901;)&#8741; p * , No SoftMax; 1, SoftMax.]</p><p>&#8226; "No Softmax" represents Option 1, i.e., use the logits for g(&#8226;) and h(&#8226;).</p><p>&#8226; "Softmax" represents Option 2, i.e., use the probabilities for g(&#8226;) and h(&#8226;).</p><p>&#8226; With the best formulation, high clean accuracy can be achieved with very little sacrifice on robustness.</p><p>Figure <ref type="figure">1</ref>: Comparing the "attacked accuracy -clean accuracy" curves for various options for R i (x).</p><p>The parameter &#947; adjusts between clean accuracy and robustness. It holds that h &#947; smo2,i (x) &#8801; g i (x) when &#947; = 0, and h &#947; smo2,i (x) &#8594; h i (x) when &#947; &#8594; &#8734; for all x and all i. With the mixing procedure generalized to the multi-class setting, we now discuss the choice of the smoothing oracle h i (&#8226;). While K-NN classifiers are relatively robust and can be used as the oracle, their representation power is too weak. 
On the CIFAR-10 image classification task <ref type="bibr">(Krizhevsky, 2012)</ref>, K-NN only achieves around 35% accuracy on clean test data. In contrast, an adversarially trained ResNet can reach 50% accuracy on attacked test data <ref type="bibr">(Madry et al., 2018)</ref>. This lackluster performance of K-NN becomes a significant bottleneck in the accuracy-robustness trade-off of the mixed classifier. To this end, we replace the K-NN model with a robust neural network. The robustness of this network can be achieved via various methods, including AT, TRADES, and RS.</p><p>Further scrutinizing (2) leads to the question of whether &#8741;&#8711;g i (x)&#8741; p * is the best choice for adjusting the mixture of g(&#8226;) and h(&#8226;). This gradient magnitude term is a result of <ref type="bibr">Anderson and Sojoudi (2022b)</ref>'s assumption that h(x) &#8712; {-1, 1}. Here, we no longer have this assumption. Instead, we assume both g(&#8226;) and h(&#8226;) to be differentiable. Thus, we generalize the formulation to</p><p><formula notation="TeX">h^{\gamma}_{\mathrm{smo3},i}(x) := \frac{g_i(x) + \gamma R_i(x) h_i(x)}{1 + \gamma R_i(x)}, \quad i \in [c], \tag{3}</formula></p><p>where R i (x) is an extra scalar term that can potentially depend on both &#8711;g i (x) and &#8711;h i (x) to determine the "trustworthiness" of the base classifiers. Here, we empirically compare four options for R i (x), namely, 1, &#8741;&#8711;g i (x)&#8741; p * , &#8741;&#8711; max j g j (x)&#8741; p * , and &#8741;&#8711;g i (x)&#8741; p * / &#8741;&#8711;h i (x)&#8741; p * .</p><p>Another design question is whether g(&#8226;) and h(&#8226;) should be the pre-softmax logits or the post-softmax probabilities. Note that since most attack methods are designed based on logits, the output of the mixed classifier should be logits rather than probabilities to avoid gradient masking, an undesirable phenomenon that makes it hard to evaluate the robustness properly. Thus, we have the following two options that make the mixed model compatible with existing gradient-based attacks:</p><p>1. Use the logits for both base classifiers, g(&#8226;) and h(&#8226;).</p><p>2. 
Use the probabilities for both base classifiers, and then convert the mixed probabilities back to logits. The required "inverse-softmax" operator is simply the natural logarithm.</p><p>Figure <ref type="figure">1</ref> visualizes the accuracy-robustness trade-off achieved by mixing logits or probabilities with different R i (x) options. Here, the base classifiers are a pair of standard and adversarially trained ResNet-18s. This "clean accuracy versus PGD 10 -attacked accuracy" plot indicates that R i (x) = 1 gives the best accuracy-robustness trade-off, and that g(&#8226;) and h(&#8226;) should be probabilities. Appendix A in the supplementary materials confirms this selection by repeating Figure <ref type="figure">1</ref> with alternative model architectures, different robust base classifier training methods, and various attack budgets.</p><p>Our selection of R i (x) = 1 differs from R i (x) = &#8741;&#8711;g i (x)&#8741; p * used in <ref type="bibr">(Anderson and Sojoudi, 2022b)</ref>. Intuitively, <ref type="bibr">Anderson and Sojoudi (2022b)</ref> used linear classifiers to motivate estimating the base models' trustworthiness with their gradient magnitudes. When the base classifiers are highly nonlinear neural networks as in our case, while a base classifier's local Lipschitzness correlates with its robustness, its gradient magnitude is not always a good local Lipschitzness estimator. Additionally, Section 3.1 offers theoretical intuitions for selecting mixing probabilities over mixing logits.</p><p>With these design choices implemented, the formulation (3) can be re-parameterized as</p><p><formula notation="TeX">h^{\alpha}_i(x) := \log\big( (1 - \alpha) g_i(x) + \alpha h_i(x) \big), \quad i \in [c], \tag{4}</formula></p><p>where &#945; = &#947;/(1 + &#947;) &#8712; [0, 1]. We take h &#945; (&#8226;) in (<ref type="formula">4</ref>), which is a convex combination of base classifier probabilities, as our proposed mixed classifier. Note that (4) calculates the mixed classifier logits, acting as a drop-in replacement for existing models which usually produce logits. Removing the logarithm recovers the output probabilities without changing the predicted class.</p></div>
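<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mixing operation can be sketched in a few lines of Python. The sketch assumes, as described above, that the mixed classifier outputs the logarithm of a convex combination of the two base classifiers' softmax probabilities with weight &#945; on the robust model, and that &#945; = &#947;/(1 + &#947;) when R i (x) = 1. The function names and the small numerical floor inside the logarithm are choices made for this example, not part of the method.</p>
<p><code lang="python"><![CDATA[
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gamma_to_alpha(gamma):
    """Re-parameterization alpha = gamma / (1 + gamma), mapping [0, inf) to [0, 1)."""
    return gamma / (1.0 + gamma)


def mix_logits(g_logits, h_logits, alpha, floor=1e-300):
    """Return log((1 - alpha) * softmax(g) + alpha * softmax(h)) for each class.

    The floor only guards against log(0) after underflow (an implementation
    detail assumed here).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    g_probs = softmax(g_logits)
    h_probs = softmax(h_logits)
    return [math.log(max((1.0 - alpha) * g + alpha * h, floor))
            for g, h in zip(g_probs, h_probs)]


def predict(scores):
    """Zero-based index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy logits: the standard model prefers class 0, the robust model class 1.
g_logits = [4.0, 0.0, 0.0]
h_logits = [0.0, 2.0, 0.0]
mixed = mix_logits(g_logits, h_logits, alpha=0.5)
```
]]></code></p>
<p>Exponentiating the returned values gives a probability vector, so the output can be passed to any loss or attack that expects logits, while the predicted class is unchanged by the logarithm.</p></div>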
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Theoretical Certified Robust Radius</head><p>In this section, we derive certified robust radii for the mixed classifier h &#945; (&#8226;) introduced in (4), given in terms of the robustness properties of h(&#8226;) and the mixing parameter &#945;. The results ensure that despite being more sophisticated than a single model, h &#945; (&#8226;) cannot be easily conquered, even if an adversary attempts to adapt its attack methods to its structure. Such guarantees are of paramount importance for reliable deployment in safety-critical control applications.</p><p>Noticing that the base model probabilities satisfy 0 &#8804; g i (&#8226;) &#8804; 1 and 0 &#8804; h i (&#8226;) &#8804; 1 for all i, we introduce the following generalized and tightened notion of certified robustness.</p><p>Definition 1 Consider an arbitrary input x &#8712; R d and let y = arg max i h i (x), &#181; &#8712; [0, 1], and r &#8805; 0. Then, h(&#8226;) is said to be certifiably robust at x with margin &#181; and radius r if h y (x + &#948;) &#8805; h i (x + &#948;) + &#181; for all i &#8800; y and all &#948; &#8712; R d such that &#8741;&#948;&#8741; p &#8804; r.</p><p>Lemma 2 Let x &#8712; R d and r &#8805; 0. If &#945; &#8712; [1/2, 1] and h(&#8226;) is certifiably robust at x with margin (1 -&#945;)/&#945; and radius r, then the mixed classifier h &#945; (&#8226;) is robust in the sense that arg max i h &#945; i (x + &#948;) = arg max i h i (x) for all &#948; &#8712; R d such that &#8741;&#948;&#8741; p &#8804; r.</p><p>Proof Suppose that h(&#8226;) is certifiably robust at x with margin (1 -&#945;)/&#945; and radius r. Let y = arg max i h i (x), and consider an arbitrary i &#8800; y and an arbitrary &#948; &#8712; R d such that &#8741;&#948;&#8741; p &#8804; r. Since g y (x + &#948;) &#8805; 0 and g i (x + &#948;) &#8804; 1, it holds that</p><p><formula notation="TeX">\exp\big(h^{\alpha}_y(x+\delta)\big) - \exp\big(h^{\alpha}_i(x+\delta)\big) = (1-\alpha)\big(g_y(x+\delta) - g_i(x+\delta)\big) + \alpha\big(h_y(x+\delta) - h_i(x+\delta)\big) \ge (1-\alpha)(0 - 1) + \alpha \cdot \frac{1-\alpha}{\alpha} = 0.</formula></p><p>Thus, it holds that h &#945; y (x + &#948;) &#8805; h &#945; i (x + &#948;) for all i &#8800; y, and thus arg max i h &#945; i (x + &#948;) = y = arg max i h i (x).</p><p>Intuitively, Definition 1 ensures that all points within a radius from a nominal point have the same prediction as the nominal point, with the difference between the top and runner-up probabilities no smaller than a threshold. 
For practical classifiers, the robust margin can be straightforwardly estimated by calculating the confidence gap between the predicted and the runner-up classes at an adversarial input obtained with strong attacks.</p><p>While most existing provably robust results consider the special case with zero margin, we will show that models built via common methods are also robust with non-zero margins. We specifically consider two types of popular robust classifiers: Lipschitz continuous models (Theorem 4) and RS models (Theorem 5). Here, Lemma 2 builds the foundation for proving these two theorems, which amounts to showing that Lipschitz and RS models are robust with non-zero margins and thus the mixed classifiers built with them are robust.</p><p>Lemma 2 provides further justifications for using probabilities instead of logits in the mixing operation. Intuitively, it holds that (1 -&#945;)g i (&#8226;) is bounded between 0 and 1 -&#945;, so as long as &#945; is relatively large (specifically, at least 1/2), the detrimental effect of g(&#8226;)'s probabilities when subject to attack can be bounded and be overcome by h(&#8226;). Had we used the logits for g i (&#8226;), since this quantity cannot be bounded, it would have been much harder to overcome the vulnerability of g(&#8226;).</p><p>Since we do not make assumptions on the Lipschitzness or robustness of g(&#8226;), Lemma 2 is tight. To understand this, we suppose that there exist some i &#8712; [c]\{y} and &#948; &#8800; 0 with &#8741;&#948;&#8741; p &#8804; r such that h y (x + &#948;) -h i (x + &#948;) &lt; (1 -&#945;)/&#945;. Since the only information about g(&#8226;) is that its outputs lie in [0, 1], the difference g y (x + &#948;) -g i (x + &#948;) can take any value in [-1, 1]. In particular, it can equal -1, which gives</p><p><formula notation="TeX">(1-\alpha)\big(g_y(x+\delta) - g_i(x+\delta)\big) + \alpha\big(h_y(x+\delta) - h_i(x+\delta)\big) &lt; -(1-\alpha) + \alpha \cdot \frac{1-\alpha}{\alpha} = 0.</formula></p><p>In this case, it holds that h &#945; y (x + &#948;) &lt; h &#945; i (x + &#948;), and thus arg max i h &#945; i (x + &#948;) &#8800; arg max i h i (x).</p><p>Assumption 1 The classifier h(&#8226;) is robust in the sense that, for all i &#8712; [c], h i (&#8226;) is &#8467; p -Lipschitz continuous with Lipschitz constant Lip p (h i ).</p><p>Theorem 4 Suppose that Assumption 1 holds, and let x &#8712; R d be arbitrary. 
Let y = arg max i h i (x). Then, if &#945; &#8712; [1/2, 1], it holds that arg max i h &#945; i (x + &#948;) = y for all &#948; &#8712; R d such that &#8741;&#948;&#8741; p &#8804; r &#945; p (x), where</p><p><formula notation="TeX">r^{\alpha}_p(x) := \min_{i \neq y} \frac{\alpha \big(h_y(x) - h_i(x)\big) + \alpha - 1}{\alpha \big(\mathrm{Lip}_p(h_y) + \mathrm{Lip}_p(h_i)\big)}. \tag{5}</formula></p><p>Proof Let &#948; &#8712; R d be such that &#8741;&#948;&#8741; p &#8804; r &#945; p (x), and let i &#8800; y. By Assumption 1, it holds that</p><p><formula notation="TeX">h_y(x+\delta) - h_i(x+\delta) \ge h_y(x) - h_i(x) - \big(\mathrm{Lip}_p(h_y) + \mathrm{Lip}_p(h_i)\big) \|\delta\|_p \ge h_y(x) - h_i(x) - \big(\mathrm{Lip}_p(h_y) + \mathrm{Lip}_p(h_i)\big) r^{\alpha}_p(x) \ge \frac{1-\alpha}{\alpha}.</formula></p><p>Therefore, h(&#8226;) is certifiably robust at x with margin (1 -&#945;)/&#945; and radius r &#945; p (x). Hence, by Lemma 2, the claim holds.</p><p>We remark that the &#8467; p norm that Theorem 4 certifies may be arbitrary (e.g., &#8467; 1 , &#8467; 2 , or &#8467; &#8734; ), so long as the Lipschitz constant of the robust network h(&#8226;) is computed with respect to the same norm.</p><p>Assumption 1 is not restrictive in practice. For example, Gaussian RS with smoothing variance &#963; 2 I d yields robust models with &#8467; 2 -Lipschitz constant &#8730;(2/(&#960;&#963; 2 )) <ref type="bibr">(Salman et al., 2019)</ref>. Moreover, empirically robust methods such as AT and TRADES often train locally Lipschitz continuous models, even though there may not be closed-form theoretical guarantees.</p><p>Assumption 1 can be relaxed to the even less restrictive scenario of using local Lipschitz constants over a neighborhood (e.g., a norm ball) around a nominal input x (i.e., how flat h(&#8226;) is near x) as a surrogate for the global Lipschitz constants. In this case, Theorem 4 holds for all &#948; within this neighborhood. Specifically, suppose that for an arbitrary input x and an &#8467; p attack radius &#1013;, it holds that h y (x) -h y (x + &#948;) &#8804; &#1013; &#8901; Lip x p (h y ) and h i (x + &#948;) -h i (x) &#8804; &#1013; &#8901; Lip x p (h i ) for all i &#8800; y and all perturbations &#948; such that &#8741;&#948;&#8741; p &#8804; &#1013;. Furthermore, suppose that the robust radius r &#945; p (x), as defined in (5) but using the local Lipschitz constant Lip x p as a surrogate for the global constant Lip p , is not smaller than &#1013;. Then, if the robust base classifier h(&#8226;) is correct at the nominal point x, the mixed classifier h &#945; (&#8226;) is robust at x within the radius &#1013;. The proof follows that of Theorem 4.</p><p>The relaxed Lipschitzness defined above can be estimated for practical differentiable classifiers via an algorithm similar to the PGD attack <ref type="bibr">(Yang et al., 2020)</ref>. Yang et al. 
(<ref type="bibr">2020</ref>) also showed that many existing empirically robust models, including those trained with AT or TRADES, are in fact locally Lipschitz. Note that <ref type="bibr">Yang et al. (2020)</ref> evaluated the local Lipschitz constants of the logits, whereas we analyze the probabilities, whose Lipschitz constants are much smaller. Therefore, Theorem 4 provides important insights into the empirical robustness of the mixed classifier.</p><p>An intuitive explanation of Theorem 4 is that if &#945; &#8594; 1, then r &#945; p (x) &#8594; min i&#8800;y (h y (x) -h i (x)) / (Lip p (h y ) + Lip p (h i )), which is the standard Lipschitz-based robust radius of h(&#8226;) around x (see <ref type="bibr">(Fazlyab et al., 2019;</ref><ref type="bibr">Hein and Andriushchenko, 2017)</ref> for further discussions on Lipschitz-based robustness). On the other hand, if &#945; is too small in comparison to the relative confidence of h(&#8226;) and puts excess weight on the non-robust classifier g(&#8226;), namely, if there exists i &#8800; y such that &#945; &#8804; 1/(1 + h y (x) -h i (x)), then r &#945; p (x) &#8804; 0, and in this case, we cannot provide non-trivial certified robustness for h &#945; (&#8226;). If h(&#8226;) is 100% confident in its prediction, then h y (x) -h i (x) = 1 for all i &#8800; y, and therefore this threshold value of &#945; becomes 1/2, leading to non-trivial certified radii for &#945; &gt; 1/2. However, once we put over 1/2 of the weight on g(&#8226;), a nonzero radius around x is no longer certifiable. Since no assumptions on the robustness of g(&#8226;) around x have been made, this is intuitively the best one can expect. We now move on to tightening the certified radius in the special case when h(&#8226;) is an RS classifier and our robust radii are defined in terms of the &#8467; 2 norm.</p><p>Assumption 2 The classifier h(&#8226;) is a (Gaussian) randomized smoothing classifier, i.e., h(x) is the expectation of h&#772;(x + &#958;) over &#958; &#8764; N (0, &#963; 2 I d ) for all x &#8712; R d , where h&#772; : R d &#8594; [0, 1] c is a base classifier that is non-robust in general. Furthermore, for all i &#8712; [c], h&#772; i (&#8226;) is not 0 almost everywhere or 1 almost everywhere.</p><p>Theorem 5 Suppose that Assumption 2 holds, and let x &#8712; R d be arbitrary. 
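Let y = arg max i h i (x) and y &#8242; = arg max i&#8800;y h i (x). Then, if &#945; &#8712; [1/2, 1], it holds that arg max i h &#945; i (x + &#948;) = y for all &#948; &#8712; R d such that &#8741;&#948;&#8741; 2 &#8804; r &#945; &#963; (x), where</p><p><formula notation="TeX">r^{\alpha}_{\sigma}(x) := \frac{\sigma}{2} \Big( \Phi^{-1}\big(\alpha h_y(x)\big) - \Phi^{-1}\big(\alpha h_{y'}(x) + 1 - \alpha\big) \Big).</formula></p><p>The proof of Theorem 5 is provided in Appendix B in the supplementary materials.</p><p>To summarize our certified radii, Theorem 4 applies to very general Lipschitz continuous robust base classifiers h(&#8226;) and arbitrary &#8467; p norms, whereas Theorem 5, applying to the &#8467; 2 norm and RS base classifiers, strengthens the certified radius by exploiting the stronger Lipschitzness arising from the special structure and smoothness granted by Gaussian convolution operations. Theorems 4 and 5 guarantee that our proposed robustification cannot be easily circumvented by adaptive attacks.</p></div>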
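<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two certificates can be evaluated numerically once the robust model's probabilities are known. The Python sketch below is written under stated assumptions: the Lipschitz-based radius is taken to be the largest perturbation norm for which the margin condition of Lemma 2 follows from Lipschitz continuity, and the RS-based radius is taken to follow from the standard fact that the inverse Gaussian CDF of a Gaussian-smoothed [0, 1]-valued function is (1/&#963;)-Lipschitz in the &#8467; 2 norm. Function names and the clipping constant are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math
from statistics import NormalDist


def _argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def _inv_cdf(p, tiny=1e-12):
    """Inverse standard normal CDF, clipped away from 0 and 1 (assumed guard)."""
    return NormalDist().inv_cdf(min(max(p, tiny), 1.0 - tiny))


def gaussian_rs_lipschitz(sigma):
    """l_2 Lipschitz constant sqrt(2 / (pi * sigma^2)) of a Gaussian RS model."""
    return math.sqrt(2.0 / (math.pi * sigma ** 2))


def lipschitz_radius(h_probs, lips, alpha):
    """min over i != y of (alpha*(h_y - h_i) + alpha - 1) / (alpha*(Lip_y + Lip_i)).

    A non-positive value means no non-trivial radius is certified.
    """
    y = _argmax(h_probs)
    return min(
        (alpha * (h_probs[y] - h_probs[i]) + alpha - 1.0)
        / (alpha * (lips[y] + lips[i]))
        for i in range(len(h_probs)) if i != y
    )


def rs_radius(h_probs, sigma, alpha):
    """(sigma / 2) * (Phi^-1(alpha*h_y) - Phi^-1(alpha*h_y' + 1 - alpha))."""
    y = _argmax(h_probs)
    runner_up = max(p for i, p in enumerate(h_probs) if i != y)
    return 0.5 * sigma * (
        _inv_cdf(alpha * h_probs[y]) - _inv_cdf(alpha * runner_up + 1.0 - alpha)
    )


# Toy probabilities of the robust model at a nominal input.
h_probs = [0.9, 0.06, 0.04]
sigma = 0.5
lips = [gaussian_rs_lipschitz(sigma)] * 3
```
]]></code></p>
<p>With &#945; = 1 both functions reduce to the usual single-model radii, and for &#945; close to 1/2 they become non-positive unless h(&#8226;) is very confident, which matches the discussion of the threshold on &#945; above.</p></div>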
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Numerical Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">&#945;'s Influence on Mixed Classifier Robustness</head><p>We first use the CIFAR-10 dataset to evaluate the mixed classifier h &#945; (&#8226;) with various values of &#945;.</p><p>We use a ResNet18 model trained on unattacked images as the standard base model g(&#8226;) and use another ResNet18 trained on PGD 20 data as the robust base model h(&#8226;). We consider PGD 20 attacks that target g(&#8226;) and h(&#8226;) individually (abbreviated as STD and ROB attacks, respectively, which can be regarded as transfer attacks), in addition to the adaptive PGD 20 attack generated using the end-to-end gradient of h &#945; (&#8226;), denoted as the MIX attack.</p><p>The test accuracy of each mixed classifier is presented in Figure <ref type="figure">2</ref>. As &#945; increases, the clean accuracy of h &#945; (&#8226;) moves from the clean accuracy of g(&#8226;) to the clean accuracy of h(&#8226;). In terms of attacked performance, when the attack targets g(&#8226;), the attacked accuracy increases with &#945;. When the attack targets h(&#8226;), the attacked accuracy decreases with &#945;, showing that the attack targeting h(&#8226;) becomes more benign when the mixed classifier emphasizes g(&#8226;). When the attack targets the mixed classifier h &#945; (&#8226;), the attacked accuracy increases with &#945;.</p><p>When &#945; is around 0.5, the MIX-attacked accuracy of h &#945; (&#8226;) quickly increases from near zero to more than 30% (two-thirds of h(&#8226;)'s attacked accuracy). This observation matches the theoretical intuition from Theorem 4. Meanwhile, when &#945; is greater than 0.5, the clean accuracy decreases at a much slower rate, which alleviates the accuracy-robustness trade-off.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">The Relationship between h &#945; (&#8226;)'s Robustness and h(&#8226;)'s Confidence</head><p>This difference in how clean and attacked accuracy change with &#945; can be explained by the prediction confidence of the robust base classifier h(&#8226;). Specifically, Table <ref type="table">1</ref> confirms that h(&#8226;) makes confident correct predictions even when under attack (average robust margin is 0.768). Moreover, h(&#8226;)'s robust margin follows a long-tail distribution: the median robust margin is 0.933, much larger than the 0.768 mean. Thus, most attacked inputs correctly classified by h(&#8226;) are highly confident (i.e., robust with large margins). As Lemma 2 suggests, such a property is precisely what the mixed classifier relies on. Intuitively, once &#945; becomes greater than 0.5 and gives h(&#8226;) more authority over g(&#8226;), h(&#8226;) can use its confidence to correct g(&#8226;)'s mistakes under attack.</p><p>On the other hand, h(&#8226;) is unconfident when producing incorrect predictions on clean data, with the top two classes' output probabilities separated by merely 0.434. This probability gap again forms a long-tail distribution (the median is 0.378 which is less than the mean), confirming that h(&#8226;) rarely makes confident incorrect predictions. Now, consider clean data that g(&#8226;) correctly classifies and h(&#8226;) mispredicts. Recall that we assume g(&#8226;) to be more accurate but less robust, so this scenario should be common. 
Since g(&#8226;) is confident (average top two classes probability gap is 0.982) and h(&#8226;) is usually unconfident, even when &#945; &gt; 0.5 and g(&#8226;) has less authority than h(&#8226;) in the mixture, g(&#8226;) can still correct some of the mistakes from h(&#8226;).</p><p>[Figure 2 plot. Horizontal axis: &#945;, from 0.0 to 1.0. Vertical axis: clean and PGD 10 accuracy (%).]</p><p>In summary, h(&#8226;) is confident when making correct predictions on attacked data while being unconfident when misclassifying clean data, and such a confidence property is the key source of the mixed classifier's improved accuracy-robustness trade-off. Additional analyses in Appendix A with alternative base models imply that multiple existing robust classifiers share this benign confidence property and thus help the mixed classifier improve the trade-off.</p></div>
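<div xmlns="http://www.tei-c.org/ns/1.0"><p>The role of the confidence statistics quoted above can be illustrated with a short calculation. The Python sketch below makes two simplifying assumptions that are not claims about the experiments: under attack, g(&#8226;) is taken to be as wrong as possible, so the mixed prediction stays correct exactly when the robust margin m satisfies &#945;m &#8805; 1 -&#945;; on clean data, the true class and the class predicted by h(&#8226;) are taken to be the top two classes of both models, so g(&#8226;) overrides h(&#8226;) when (1 -&#945;) times g(&#8226;)'s gap exceeds &#945; times h(&#8226;)'s gap. The function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
def min_alpha_for_margin(robust_margin):
    """Smallest alpha with alpha * m >= 1 - alpha, i.e. alpha >= 1 / (1 + m)."""
    return 1.0 / (1.0 + robust_margin)


def max_alpha_for_clean_correction(g_gap, h_gap):
    """Supremum of alpha with (1 - alpha) * g_gap > alpha * h_gap."""
    return g_gap / (g_gap + h_gap)


# Statistics quoted in the text for the CIFAR-10 ResNet-18 pair.
alpha_at_median_margin = min_alpha_for_margin(0.933)
alpha_at_mean_margin = min_alpha_for_margin(0.768)
alpha_clean_limit = max_alpha_for_clean_correction(0.982, 0.434)
```
]]></code></p>
<p>Under these assumptions, a robust margin of 0.933 tolerates any &#945; above roughly 0.52, while a clean-data mistake of h(&#8226;) with gap 0.434 can still be corrected by a confident g(&#8226;) for &#945; up to roughly 0.69. This leaves a window of &#945; values slightly above 0.5 in which both effects are available.</p></div>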
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Visualization of the Certified Robust Radii</head><p>Next, we visualize the certified robust radii presented in Theorem 4 and Theorem 5. Since a (Gaussian) RS model with smoothing covariance matrix &#963; 2 I d has an &#8467; 2 -Lipschitz constant &#8730;(2/(&#960;&#963; 2 )), such a model can be used to simultaneously visualize both theorems, with Theorem 5 giving tighter certificates of robustness. Note that RS models with a larger smoothing variance certify larger radii but achieve lower clean accuracy, and vice versa. Here, we consider the CIFAR-10 dataset and select g(&#8226;) to be a ConvNeXT-T model with a clean accuracy of 97.25%, and use the RS models presented in <ref type="bibr">(Zhang et al., 2019)</ref> as h(&#8226;). For a fair comparison, we select an &#945; value such that the clean accuracy of the constructed mixed classifier h &#945; (&#8226;) matches that of another RS model h baseline (&#8226;) with a smaller smoothing variance. The expectation term in the RS formulation is approximated with the empirical mean of 10000 random perturbations drawn from N (0, &#963; 2 I d ), and the certified radii of h baseline (&#8226;) are calculated using Theorems 4 and 5 by setting &#945; to 1. Figure <ref type="figure">3</ref> displays the calculated certified accuracy of h &#945; (&#8226;) and h baseline (&#8226;) at various attack radii. The ordinate "Accuracy" at a given abscissa "&#8467; 2 radius" reflects the percentage of the test data for which the considered model gives a correct prediction as well as a certified radius at least as large as the &#8467; 2 radius under consideration.</p><p>In both subplots of Figure <ref type="figure">3</ref>, the certified robustness curves of h &#945; (&#8226;) do not connect to the clean accuracy when &#1013; &#8594; 0. 
This is because Theorems 4 and 5 both consider robustness with respect to h(&#8226;) and do not certify test inputs at which h(&#8226;) makes incorrect predictions, even though h &#945; (&#8226;) may correctly predict some of these points. This is reasonable because we do not assume any robustness or Lipschitzness of g(&#8226;), and g(&#8226;) is allowed to be arbitrarily incorrect whenever the radius is non-zero.</p><p>The Lipschitz-based bound of Theorem 4 allows us to visualize the performance of the mixed classifier h &#945; (&#8226;) when h(&#8226;) is an &#8467; 2 -Lipschitz model. In this case, the curves associated with h &#945; (&#8226;) and h baseline (&#8226;) intersect, with h &#945; (&#8226;) achieving higher certified accuracy at larger radii and h baseline (&#8226;) certifying more points at smaller radii. Adjusting &#945; and the Lipschitz constant of h(&#8226;) changes the location of this intersection while maintaining the clean accuracy. Thus, the mixed classifier allows for optimizing the certified accuracy at a particular radius without sacrificing clean accuracy.</p><p>The RS-based bound from Theorem 5 captures the behavior of the mixed classifier h &#945; (&#8226;) when h(&#8226;) is an RS model. For both h &#945; (&#8226;) and h baseline (&#8226;), the RS-based bounds certify larger radii than the corresponding Lipschitz-based bounds. Nonetheless, h baseline (&#8226;) can certify more points with the RS-based guarantee. Intuitively, this phenomenon suggests that RS models can yield correct but low-confidence predictions under large-radius attacks, and thus may not be best suited for our mixing operation, which relies on robustness with non-zero margins. Meanwhile, Lipschitz models, a more general and common class of models, exploit the mixing operation more effectively. 
Moreover, as shown in Figure <ref type="figure">2</ref> and Table <ref type="table">1</ref>, empirically robust models often yield high-confidence correct predictions under attack, making them better suited to serve as the robust base classifier of h &#945; (&#8226;).</p></div>
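<div xmlns="http://www.tei-c.org/ns/1.0"><p>The definition of certified accuracy used for the ordinate of Figure <ref type="figure">3</ref> can be written as a short sketch: a test point counts toward the certified accuracy at a given &#8467; 2 radius only if the prediction is correct and the certified radius is at least that large. The function name and the five-sample data below are hypothetical and serve only to illustrate the definition; the certified radii themselves would come from Theorem 4 or Theorem 5, which this sketch does not implement.</p>
<ab><![CDATA[
```python
def certified_accuracy(correct, certified_radii, radius):
    """Percentage of samples that are correct AND certified at >= radius.

    correct:         list of bools, whether each prediction is correct
    certified_radii: list of floats, certified l2 radius for each sample
    radius:          the l2 attack radius under consideration
    """
    if len(correct) != len(certified_radii):
        raise ValueError("inputs must have the same length")
    if not correct:
        raise ValueError("at least one sample is required")
    hits = sum(
        1 for ok, r in zip(correct, certified_radii) if ok and r >= radius
    )
    return 100.0 * hits / len(correct)


# Hypothetical per-sample results for five test points.
correct = [True, True, False, True, True]
radii = [0.50, 0.10, 0.80, 0.25, 0.05]

# One (radius, certified accuracy) pair per point on the curve.
curve = [(r, certified_accuracy(correct, radii, r)) for r in (0.0, 0.1, 0.25, 0.5, 1.0)]
for r, acc in curve:
    print(f"l2 radius {r:.2f}: certified accuracy {acc:.1f}%")
```
]]></ab>
<p>Note that the third sample has the largest certified radius but never contributes, because its prediction is incorrect. The resulting curve is non-increasing in the radius, as in Figure <ref type="figure">3</ref>.</p></div>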
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>This work proposes to mix the predicted probabilities of an accurate classifier and a robust classifier to mitigate the accuracy-robustness trade-off. These two base classifiers can be pre-trained, and the resulting mixed classifier requires no additional training. Theoretical results certify that the mixed classifier inherits the robustness of the robust base model under realistic assumptions. Empirical evaluations show that our method approaches the high accuracy of the latest standard models while retaining the robustness of modern robust classification methods. Hence, this work provides a foundation for future research to focus on either accuracy or robustness without sacrificing the other, providing additional incentives for deploying robust models in safety-critical control.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix A. Additional Empirical Support for R i (x) = 1</head><p>Finally, we use additional empirical evidence (Figures <ref type="figure">4</ref>(a) and <ref type="figure">4</ref>(b)) to show that R i (x) = 1 is the appropriate choice for the mixed classifier and that the probabilities, rather than the logits, should be used for the mixture. While most experiments in this paper are based on the popular ResNet architecture, our method does not depend on any ResNet properties. Therefore, for the experiment in Figure <ref type="figure">4</ref>(a), we select a more modern ConvNeXT-T model <ref type="bibr">(Liu et al., 2022)</ref> pre-trained on ImageNet-1k as an alternative architecture for g(&#8226;). In the interest of diversity, we also use a robust model trained via TRADES in place of an adversarially-trained network for h(&#8226;). Additionally, although most of our experiments are based on &#8467; &#8734; attacks, the proposed method applies to all &#8467; p attack budgets.</p><p>In Figure <ref type="figure">4</ref>(b), we provide an example that considers the &#8467; 2 attack. 
The experimental settings are summarized in Table <ref type="table">2</ref>.</p><p>Figures <ref type="figure">4</ref>(a) and <ref type="figure">4</ref>(b) confirm that setting R i (x) to the constant 1 achieves the best trade-off curve between clean and attacked accuracy, and that mixing the probabilities outperforms mixing the logits. This result aligns with the conclusions of Figure <ref type="figure">1</ref> and our theoretical analyses.</p><p>For all three cases listed in Table <ref type="table">2</ref>, the mixed classifier halves the error rate of h(&#8226;) on clean data while maintaining 80% of h(&#8226;)'s attacked accuracy. This observation suggests that the mixed classifier noticeably alleviates the accuracy-robustness trade-off. Additionally, our method is especially suitable for applications where the clean accuracy gap between g(&#8226;) and h(&#8226;) is large. On easier datasets such as MNIST and CIFAR-10, this gap has been greatly reduced by the latest advancements in constructing robust classifiers. However, on harder tasks such as CIFAR-100 and ImageNet-1k, this gap is still large, even for state-of-the-art methods. For these applications, standard classifiers often benefit much more from pre-training on larger datasets than robust models do.</p></div></body>
		</text>
</TEI>
