<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>NeuroPots: Realtime Proactive Defense against Bit-Flip Attacks in Neural Networks</title></titleStmt>
			<publicationStmt>
				<publisher>USENIX Security Symposium</publisher>
				<date>08/09/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10471804</idno>
					<idno type="doi"></idno>
					<title level='j'>USENIX Security Symposium</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Q Liu</author><author>J Yin</author><author>W Wen</author><author>C Yang</author><author>S Sha</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Deep neural networks (DNNs) are becoming ubiquitous in various safety-and security-sensitive applications such as selfdriving cars and financial systems. Recent studies revealed that bit-flip attacks (BFAs) can destroy DNNs' functionality via DRAM rowhammer -by precisely injecting a few bit-flips into the quantized model parameters, attackers can either degrade the model accuracy to random guessing, or misclassify certain inputs into a target class. BFAs can cause catastrophic consequences if left undetected. However, detecting BFAs is challenging because bit-flips can occur on any weights in a DNN model, leading to a large detection surface.Unlike prior works that attempt to "patch" vulnerabilities of DNN models, our work is inspired by the idea of "honeypot". Specifically, we propose a proactive defense concept named NeuroPots, which embeds a few "honey neurons" as crafted vulnerabilities into the DNN model to lure the attacker into injecting faults in them, thus making detection and model recovery efficient. We utilize NeuroPots to develop a trapdoor-enabled defense framework. We design a honey neuron selection strategy, and propose two methods for embedding trapdoors into the DNN model. Furthermore, since the majority of injected bit flips will concentrate in the trapdoors, we use a checksum-based detection approach to efficiently detect faults in them, and rescue the model accuracy by "refreshing" those faulty trapdoors. Our experiments show that trapdoor-enabled defense achieves high detection performance and effectively recovers a compromised model at a low cost across a variety of DNN models and datasets.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Deep neural networks (DNNs) have achieved tremendous success in many real-world applications ranging from computer vision, speech recognition, disease diagnosis to safetyand security-critical autonomous vehicles, and banking systems <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">3,</ref><ref type="bibr">4,</ref><ref type="bibr">5,</ref><ref type="bibr">6]</ref>. Unfortunately, recent studies have revealed that DNN inference execution on hardware engines such as GPUs, FPGAs, and ASICs, can be directly compromised by a variety of fault injection attacks <ref type="bibr">[7]</ref>. In parallel with existing data-centric attacks <ref type="bibr">[8,</ref><ref type="bibr">9,</ref><ref type="bibr">10,</ref><ref type="bibr">11,</ref><ref type="bibr">12,</ref><ref type="bibr">13</ref>], these emerging model-centric attacks can also lead to attackerdesired persistent accuracy loss without altering the input.</p><p>Fault injection attacks disrupt DNN inference by manipulating the model's neuron activation, computed intermediate results, or weight parameters stored in buffers or memories, via active fault injection techniques such as laser beaming <ref type="bibr">[14]</ref>, row hammering <ref type="bibr">[15,</ref><ref type="bibr">16]</ref>, or clock glitching <ref type="bibr">[17]</ref>. Among these methodologies, the recently discovered Bit-Flip Attack (BFA) <ref type="bibr">[16,</ref><ref type="bibr">18]</ref> that exploits DRAM rowhammer vulnerability is considered to be one of the most destructive attacks. BFA injects faulty bits directly into memories that host weight parameters with DeepHammer <ref type="bibr">[16]</ref>, a type of rowhammer that can precisely flip a bit inside a DRAM page <ref type="bibr">[19]</ref>. As demonstrated in real DNN testbeds, by flipping only 13 bits out of 1.2 billion bits of an 8-bit quantized DNN model, Deep-Hammer can degrade the model accuracy to random guessing within 66 seconds <ref type="bibr">[16]</ref>. Moreover, a stealthy and probably more dangerous variant of BFAs called Targeted Bit-Flip Attack (T-BFA) was proposed <ref type="bibr">[20]</ref>. T-BFA can go unnoticed for a long time as it only alters the classification of some inputs with a very marginal impact on others. Considering a resource-sharing environment where highly optimized DNN models are deployed to serve multiple users (e.g., machinelearning-as-a-service <ref type="bibr">[5,</ref><ref type="bibr">21]</ref>), BFAs can result in catastrophic consequences if left undetected.</p><p>Defending against BFAs, however, can be very challenging. BFAs leverage a progressive search algorithm to select and flip bits only in the most sensitive weights, and the locations of those bits are highly dependent on a very small portion of input data that attackers have. As a result, it is difficult to predict where BFAs will be landed and concentrate on those weights, as complex DNN models have too many "vulnerable points" to protect against (more details in Section 2.2). Defense solutions that passively detect faults of weights in pre-selected layers can be suboptimal <ref type="bibr">[22]</ref>. To tackle this challenge, we propose a proactive defense con-cept named "NeuroPots" (formally defined in Section 3.2). Inspired by the idea of honeypots, NeuroPots embeds a few specially designed "honey neurons" as vulnerabilities (i.e., trapdoors fully controlled by the defender) into the model to lure BFA's targeted chain of bit-flips. As honey neurons are more vulnerable than normal neurons, BFAs are highly likely to land on a few designated honey neurons. At runtime, faults in honey neurons can be efficiently detected using checksums, and the faulty weights can be recovered to the original value.</p><p>We comprehensively study NeuroPots's theoretical and practical foundations of trapping bit-flips under real system constraints. Then based on NeuroPots, we design a trapdoorenabled defense framework to detect and mitigate different basic and adaptive attacks. The evaluation shows that our design incurs minimal impact on inference accuracy, but is able to lure attackers to inject most bit-flips (if not all) into those honey neurons. By checking only a very small portion of model weights, attacks can be easily detected with a close-to-zero false negative rate. The discovered position information of the trapped bit-flips further offers a unique opportunity to repair the model online at low cost (e.g., 0.69ms time overhead and 99KB storage overhead for 14ms/22MB ResNet-34 in ImageNet).</p><p>To the best of our knowledge, this is the first work that adopts a HoneyPot-style proactive defense policy to offer extremely lightweight real-time mitigation on the challenging model-centric bit-flip attacks whose attack surface could be too large to be covered by existing solutions. We hope that our results could provide a new perspective for defending emerging attacks in deep learning and will enable more indepth research along this direction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background and Motivation</head><p>In this section, we first provide a brief background on BFA, T-BFA, and DeepHammer. Then we explain why a proactive defense solution is crucial. Finally, we explain at a high level the design intuition behind NeuroPots.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Bit-Flip Attack</head><p>Bit-Flip Attack and Target BFA. The goal of BFA is to significantly degrade the model's accuracy by performing a minimum number of bit-flips. The objective function can be represented as: max B L( f (x, B),t), where B is the two's complement representation of quantized weights, t is the groundtruth target of input x, L denotes the loss function of the model.</p><p>Unlike BFA, T-BFA aims to misclassify inputs from source category p into the target category q (q = p). Meanwhile, the remaining inputs will maintain their original categories (with no impact on their accuracy) to ensure attack stealthiness. T-BFA will minimize the following objective function:</p><p>Although BFA and T-BFA have different objectives, they both utilize a gradient-based Progressive Bit Search (PBS) to find the most vulnerable bits iteratively on quantized DNNs. PBS consists of two steps: intra-layer and cross-layer search. In the k-th iteration, the intra-layer search selects bits with top n gradient b L as the most vulnerable bit candidates at layer l. For a q-bit quantized DNN, b L can be represented as:</p><p>Then, they apply a classical Fast Gradient Sign Method (FGSM) <ref type="bibr">[8]</ref> algorithm from adversarial attack to the n selected vulnerable bit candidates, which can be presented as:</p><p>Note, BFA uses +sign( b L) to maximize objective function, while T-BFA applies -sign( b L) to minimize its objective function. Then, they evaluate the loss increment as L k l for layer l. The same process will be performed in each layer to obtain loss set {L k 1 , . . . , L k l , . . . L k L }. Then, the cross-layer algorithm identifies the j-th layer with maximum loss (i.e., j=arg max l {L k 1 , . . . , L k l , . . . L k L }) and flip the most vulnerable bit with largest gradient in j-th layer.</p><p>DeepHammer Attack (BFA via row-hammer). At the software-level, DeepHammer attack <ref type="bibr">[16]</ref> optimizes BFA's vulnerable bits search algorithm. In the k-th iteration, after selecting n vulnerable bits by ranking gradients b L like BFA, DeepHammer flips each bit respectively and generates a loss set as {L k l } n i=1 . The same process will be performed for each layer. As a result, the total candidate bits will be n&#215;l and the corresponding loss set is {L k } n&#215;l i=1 . DeepHammer selects the bit that has the maximum loss as the most vulnerable bit. At the hardware level, a new double-sided row-hammer attack using a targeted column-page-stripe data pattern is proposed to perform precise bit-flips in a DRAM page. Unlike the previous two ideal attacks, to achieve precise bit flipping, DeepHammer needs to obey a strict constraint, that is, only one bit-flip per page. In this work, we assume BFA and TBFA can flip any number of bits per DRAM page, while DeepHammer is constraint by flipping only one bit per page.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Motivation for Proactive Defense</head><p>To maximize the attack efficiency, BFAs always perform bitflips on the most sensitive weights with the largest gradients. Because the gradients for weights highly rely on inputs, the generated bit-flip chain can be very different (i.e., different combinations of bits, weights, and layers) if attackers use different input data. Fig. <ref type="figure">1</ref> From the three trails of attack, we notice that although each trial focuses on flipping bits in just a few layers, none of the layers receives a bit-flip in all three trials. This study demonstrates that DNN models have many unpredictable natural "vulnerable points", and hence defense solutions that rely on detecting faults of weights in a few pre-selected layers can be suboptimal. In other words, passively finding and examining a subset of neurons (or layers) that is more vulnerable to BFAs can easily lead to false negative given the limited coverage. Moreover, it fails when facing adaptive attackers who can circumvent these "fixed" vulnerable points and find another attack path.</p><p>The above observation motivates us to design a proactive defense mechanism. Instead of blindly analyzing which layers/weights the highly unpredictable BFAs will land on, we intentionally introduce vulnerabilities into the model to attract BFAs, such that the locations of faults become determined. If the crafted vulnerabilities are always easily discovered, we can ensure that attackers inject bit-flips into them even if using different data. Furthermore, as we only need to protect a few vulnerabilities, the detection overhead will be trivial even using precise but high-cost detection methods. More excitingly, if the vulnerabilities can trap most of the bit-flips, we can quickly recover model accuracy by "refreshing" faulty weights. To our best knowledge, no prior work can recover model accuracy to the original level at run time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">NeuroPots Design Intuition</head><p>Consider a scenario where, starting from a trained DNN model with a quantized parameter set B and any input x, the attacker searches for a universal adversarial binary weight-bit perturbation to induce a misclassification from the correct label y x to any incorrect label other than y x in BFA (or a target label y t = y x in T-BFA). This is analogous to looking for a "shortcut" from the model to achieve the required systematical misclassification for any legitimate input x by flipping the least amount of bits in the model parameter set B (e.g., 13 bit-flips out of 1.2 billion bits <ref type="bibr">[16]</ref>). Along these lines, NeuroPots intentionally disguise a few neurons as decoys and create shortcuts that are easier to locate and shorter than any natural weakness attackers are searching for. Once the model is trapdoored, an attacker will generate the adversarial chain of bit-flips along shortcuts produced by NeuroPots. In Fig. <ref type="figure">1</ref>(c), we embed one honey neuron in each layer to construct a trapdoored model. We can observe that 5 out of 6 bit-flips of the three attack chains appear in weights connected with the embedded honey neurons. Because at least one bit-flip appears in honey weights for each attack chain, we can therefore detect the attack accurately (we assume that the detection approach can correctly detect faults of honey weights). Moreover, all bit-flips of attack chains A and C are trapped by NeuroPots, so these two trials can achieve completed accuracy recovery by replacing faulty weights with their golden backup. Because honey weights are a tiny portion out of the entire weights in our experiments (e.g., &#8764; 0.5% in ResNet-34), the time and storage overhead for fault detection and model recovery will be very low.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Trapdoor Defense using NeuroPots</head><p>We utilize NeuroPots to design a trapdoor-enabled proactive defense framework. The key is to expand specific vulnerabilities in the model via designed honey neurons. Such intentional weaknesses we build into the DNN model (called "trapdoored model") will shape and lure bit-flip attacks to make them easily detected and recovered at inference time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Threat Model and Design Objectives</head><p>Threat Model. We adopt the white-box threat model assumption consistent with prior relevant works <ref type="bibr">[16,</ref><ref type="bibr">18]</ref>, of which the goal is to crush a well-trained quantized DNN model by flipping a minimum number of weight bits after model deployment. In particular, the attacker is assumed to possess complete knowledge of the victim trapdoored model's structure, weights, gradients, and partial knowledge of the training data. This can be obtained by ways such as side-channel attacks <ref type="bibr">[23,</ref><ref type="bibr">24]</ref>. The attacker's program co-locates with the victim model on resource-sharing platforms such as machinelearning-as-a-service with DRAM hosting quantized parameters. The attacker searches for the target chain of bit-flips via gradient-based BFA algorithm <ref type="bibr">[18]</ref> and performs the Deep-Hammer attack <ref type="bibr">[16]</ref> to precisely flip DRAM bits selected by the BFA algorithm. Under real system constraints, we assume DeepHammer only flips one bit per page. We also assume the attacker does not have access to proposed detector (e.g., checksum or bit-by-bit check used at run time to detect BFA or DeepHammer) and the small number of backup weights associated with NeuroPots for online model recovery (e.g., secured in the trusted execution environment (TEE) like Intel SGX <ref type="bibr">[25]</ref> or ARM Trustzone <ref type="bibr">[26,</ref><ref type="bibr">27]</ref>). If ever compromised, trapdoor detection and recovery can be reset.</p><p>Adaptive Adversaries. We define three types of attackers based on their knowledge levels as follows. We evaluate the online detection and model recovery capabilities against all attackers in Section 6. 1. Basic Attacker is an entry-level attacker with no knowledge of the trapdoor defense. The attacker directly applies BFAs to the trapdoored model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.</head><p>Expert Attacker is aware that the model is protected by trapdoors and detection will examine the flipped bits during inference. The attacker may also know some basic principles of trapdoor designs (e.g., algorithms for selecting and creating honey neurons) and try to leverage such knowledge to circumvent the defense. However, the attacker does not know the exact characteristics of trapdoor defense (e.g., the locations and number of honey neurons).</p><p>3. Oracle Attacker knows comprehensive details of trapdoor defense, for example, the design principles, the number and exact locations of embedded honey neurons, etc.</p><p>Defender's Capability. We consider two scenarios: 1) The defender has the ability to fine-tune the well-trained model using training data (often a few epochs), so as to quickly create trapdoored models. This is referred to as retrainingbased strategy described in Section 4.1.2. 2) The defender does not have access to training hardware but a small number of testing data, and can directly build the trapdoor from the trained model without involving retraining. This is referred to as one-shot strategy described in Section 4.1.2).</p><p>Design Goals. We have three major design goals. First, the defense should detect adversarial bit-flips with very high precision and/or close-to-zero false negative rate under different levels of attackers; and recover model accuracy to almost the original level (e.g., &lt; 3%) for basic and expert attackers <ref type="foot">1</ref> . Second, the trapdoored model should not impact inference accuracy. Third, the defense should incur very low overhead, for example, a very small number of honey neurons for detection and negligible latency impact on normal inference.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Theoretical&amp; Practical Basis of NeuroPots</head><p>In this section, we present formal theoretical principles as well as practical implementation principles of our trapdoorenabled defense. These principles guarantee effectiveness in mitigating adversarial bit-flips. They also lay solid foundations for our detailed defense implementation in Section 4.</p><p>Theoretical Foundation of NeuroPots. To lure an attacker into flipping weight bits within our designed NeuroPots, we examine how the attacker uses a greedy strategy to identify the most vulnerable weight bits from the model. While the attacker's choice of a chain of bit-flips could be highly unpredictable (see Section 2.2), the first critical step is to identify a bit with the largest gradient w.r.t loss function from each layer based on a batch of training data. Then, the attacker compares the impact of each individual bit-flip to loss function and finally flips the most influential bit from one iteration to another. Intuitively, if we can intentionally magnify the gradient | W L| of some selected weights, then we will be able to improve the chance of luring bit-flips to such weights and make the attacker's bit-flip chain predicable. Inspired by the above intuition, we further investigate the backpropagation rule to compute the gradient of loss function L w.r.t. weight W l i j that connects neuron j at layer l to neuron i at layer l -1:</p><p>where g is the activation function and &#948; l j is the error term of neuron j at layer l (i.e., &#948; l j = &#8706;L &#8706;a l j , in which a l j is neuron j's output before passed to g at layer l). In a hidden layer (i.e., 1 &#8804; l &lt; N), the gradient magnitude of loss function L w.r.t.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>W l</head><p>i j is determined by three key terms: g (a l j ) -the derivative of activation function for neuron j at layer l; o l-1 i -neuron i's activation value at layer l -1; and &#8721; n l+1 k=1 W l+1 jk &#948; l+1 k -the weighted summation of error terms of all neurons at layer l + 1. The first term generally equals to a constant (either 0 or 1) due to the widely adopted ReLU function in DNNs (g(x) = x if x &#8805; 0, otherwise 0). Meanwhile, the last term cannot guarantee the increase of the absolute value of the summation, because weights and error terms could be negative or positive As a result, we focus on the second term o l-1 i , which is the activation value of source neuron i at current layer l -1. By just increasing o l-1 i , theoretically the gradient magnitude of all weights (W l i j ) starting from a source neuron i at the current layer to any destination neuron j at the next layer can be enlarged. This is also applicable to the output layer. In this way, we have the following formal definitions: DEFINITION 1. Any neuron (or feature map in a convolutional layer) with its activation value intentionally enlarged can be defined as a "honey neuron" (or "honey feature map"). All weights connected from the honey neuron to any neurons at the next layer are "honey weights" that can be used to trap malicious bit-flips from an attacker. Practical Foundation of NeuroPots. Another important consideration of NeuroPots is to cover scatted bit-flips which could stem from an expert attacker under real system constraints (e.g., DeepHammer with one flippable bit per 4KB DRAM page). Fig. <ref type="figure">2</ref> depicts an example of mapping and storing honey neuron's connected weights in DRAM pages for a convolutional layer. In this example, we use a honey feature map (marked in yellow) instead of a single honey neuron, and its connected weights to the next layer are 2D filters in the same channel (marked in yellow) across all 3D kernels. Since the computation of every output feature map needs to read its corresponding 3D kernel from memory, the weights will be stored in the granularity of an entire 3D kernel for better access to a page. Given that honey weights in the yellow 2D filter appear in every 3D kernel, and the size of the total kernels in a convolutional layer (e.g., 256 3D kernels with 3 &#215; 3 &#215; 64) is much larger than a DRAM page size (e.g., 4KB), honey weights will be stored across several pages to trap bit-flips, even if only a single honey feature map is embedded. This is also applicable to fully connected layers -the weight of a single honey neuron will be accessed whenever computing an output neuron of the next layer.</p><p>Why Focus on Neurons? We need to emphasize that Neu-roPots focus on neurons/feature-maps instead of individual weights to create figurative holes. When attackers construct adversarial bit-flips against the model, they will have a high probability of falling into the trap. This is based on the fact that the number of neurons/feature-maps is usually much smaller than the weights in modern DNNs, and this can significantly reduce the attack surface and hence achieves efficient and effective detection and mitigation in real-time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Detailed Defense Implementation</head><p>A useful defense should be able to mitigate different levels of attacks which are unpredictable but may occur in practical scenarios. Therefore, the design of trapdoor-enabled defense shall proactively consider defeating adversaries with differ-  ent capabilities (see Section 3.1). To achieve this, we first introduce the two-step defense implementation: 1) offline trapdoored model construction; and 2) online adversarial bitflip detection/mitigation. We then present the final defense recommendation against basic and expert attackers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Step 1: Trapdoored Model Construction</head><p>Our first step is to construct a trapdoored model offline before model deployment, in which honey neurons and their associated weights are carefully designed and stealthily embedded into a trained model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1">Honey Neuron Selection</head><p>Since the basic idea of trapdoor design is to intentionally make a small number of neurons more vulnerable, a simple approach would be to design a honey neuron selection strategy based on the ranking of neuron activation response. Intuitively, activating such already highly activated neurons is more likely to trap bit-flips produced by basic attackers, with less impact on inference accuracy. However, for skilled attackers who are aware that honey neurons are selected by ranking activation values, it may not work well since attackers may avoid flipping bits associated with those highly activated neurons to bypass the defense. Therefore, our solution needs to enforce multiple levels of randomness in honey neuron selection: 1) the positions of honey neurons (e.g., which layer and which neuron in a layer); 2) the activation rankings of honey neurons (e.g. ranking at different levels); and 3) the number of honey neurons. In this process, we also avoid selecting "dead" neurons that always produce zero activation.</p><p>We further treat fully connected layer (neuron) and convolutional layer (feature map) differently in honey neuron selection. In particular, for neurons in a fully connected layer, we can randomly select the N neurons at a layer l. For neurons in a convolutional layer, considering that weights (or kernels) are shared by different neurons within the same output feature map, to simplify the design, we could treat an entire input feature map as a honey neuron, a.k.a honey feature map, and all its corresponding 2D filters in 3D kernels become honey weights (see Fig. <ref type="figure">2</ref>). As a result, we randomly select the N feature-maps as honey feature maps at layer l.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Honey Neuron Embedding</head><p>Once honey neurons are identified, the next step is to embed them into a well-trained DNN model to create trapdoors. Meanwhile, trapdoor embedding should not degrade the model accuracy. To achieve the above goals, we need to solve the following optimization problem:</p><p>where L h is the loss function for embedding honey neurons, and &#945; (0 &lt; &#945; &lt; 1) is to balance model accuracy (first term) and trapdoor embedding (second term). N h is the number of layers embedded with honey neurons. o l is the honey neuron's activation at layer l, while t l is its target activation value (t l o l ). In our implementation, we set the target t l as &#947; times initial value of o l (before trapdoor embedding), i.e., t l = &#947; &#8226; o l (&#947; &gt; 1 is an expanding coefficient). To simply our design, we set the same &#947; c for all convolutional layers and the same &#947; f for all fully connected layers. To solve the above optimization problem, we propose two different methods: retraining-based embedding and one-shot embedding.</p><p>Method 1: Retraining-based Embedding (Fine-tuning). To optimize Eq. 5, a straightforward method is to fine-tune the original model using an optimization (training) algorithm. For trapdoor embedding, our goal is to minimize the difference between honey neuron activation and its target. We use Mean Square Error (MSE) as an extra loss term to achieve this goal:</p><p>where k is the number of honey neurons, and t l i is the target (constant) activation value for honey neuron i at layer l. It is worth noting that honey feature maps embedding in convolutional layers will be conducted at the granularity of a single neuron of a feature map. Specifically, given a honey feature map o l i , we have</p><p>where n is the number of neurons of a certain feature map at layer l. Intuitively, we can also concentrate on a small portion of neurons (rather than all) of the feature map for reducing potential accuracy degradation. However, our experiment indicates there is not much difference in accuracy from acting on all neurons of the feature map when the number of honey feature maps is small. Thus, for design simplification, we will focus on all neurons of a feature map in real implementation.</p><p>We use ADAM <ref type="bibr">[28]</ref> as our optimization algorithm to minimize Eq. 5 and Eq. 6. In general, we set a small value for &#945; in Eq. 5 (e.g., 0.01), since the value of L is very small for a well-trained model, while the target activation value of honey neuron, namely t l i , is much larger than o l i at the beginning of the retraining process. Note that the retraining-based trapdoor can be applied to any layer except for the input layer (i.e., l = 0). This is because the input data (e.g., pixels or words) cannot be enlarged by fine-tuning parameters (e.g., weights).</p><p>Method 2: One-shot Embedding (Retraining-free). The retraining-based trapdoored model construction is relatively complex and costly, and it is only for defenders who have the ability to fine-tune models using a large amount of training data. The one-shot trapdoored model construction addresses those issues. Specifically, the one-shot strategy aims to quickly solve Eq. 5 by a set of simple computations with very low cost, using the following two steps. First, we directly enlarge the activation value of a honey neuron by multiplying &#947;, i.e., o l &#8592; &#947; &#8226; o l , to achieve trapdoor embedding (the second term of Eq. 5 will be 0). Second, we shrink all weights connected to the honey neuron by 1  &#947; , so that the weighted contributions from this neuron to neurons in the next layer remain almost unchanged, hence minimizing the first term of Eq. 5. The specific one-shot processing can be formulated as:</p><p>where o l h is the honey neuron at layer l, and w l hi is the associated honey weights. For a full-precision model, this one-shot processing will not bring any errors (we ignore bias here). For a quantized model, however, because we need to re-quantize the shrunk weights 1 &#947; &#8226; w l hi to fixed-point numbers, this will bring a small amount of quantization error. A large &#947; could also lead the original weight to 0 (behaves like pruning) if the weight is small. However, we found that it only has a minor impact on model accuracy when embedding a few honey neurons. For the honey feature map, like retraining-based trapdoor embedding, the one-shot processing will be applied to all neurons of a feature map in a convolutional layer. Note that the one-shot strategy can be applied to any layer (including the input layer) because it will directly change the activation value of honey neurons.</p><p>Impact of Trapdoor on Model Output. After finding the most sensitive bit, the attacker will further measure the impact of bit flipping on the loss function to verify if the flipping can degrade model accuracy. Thus, to trap bit-flips efficiently, we need to ensure that our trapdoor can cause a more considerable change in model output when bit-flips are injected. For a normal neuron, its impact on next layer's output under BFAs can be expressed as o l+1 = (w + &#8710;w) &#8226; o l , where &#8710;w is the weight distortion caused by bit-flips. In comparison, taking the one-shot trapdoor as an example, a honey neuron's impact to the next layer is represented as:</p><p>We can see o l+1 increases by (&#947; -1) &#8226; &#8710;w &#8226; o l compared to the normal neuron. Moreover, the attacker is more likely to flip the most significant bit of weights to produce a large perturbation &#8710;w, so the impact of the honey neuron to o l+1 will be more significant, especially for a large &#947;. Such change will propagate and accumulate to the following layers, causing a dramatic change to the model output. Therefore, our trapdoor can trap bit-flips efficiently. Activation Ranking Obfuscation. While using a larger expanding coefficient-&#947; to intentionally activate honey neurons significantly can achieve a better bit-flip trapping efficiency in basic attackers, expert attackers may still use such information to bypass the defense. Therefore, we propose to moderately enlarge the activation of honey neurons using a smaller expanding coefficient (e.g., &#947; = 2) during embedding. In this way, honey neurons' ranking can be moved up slightly and distributed across all ranking positions in a layer, increasing the stealthiness. Even if attackers bypass higher-ranked ones, those with a lower ranking can still trap bit-flips.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Step 2: Online Detection/Model Recovery</head><p>After constructing the trapdoored model offline, the next step is to take advantage of it for online fault detection and model recovery. As Fig. <ref type="figure">3</ref> shows, we add fault detection and mitigation into layers embedded with the trapdoor, while other layers simply follow the original inference flow. Because honey weights are a tiny portion of the DNN model, the fault detection and recovery overhead will be very low. Therefore, our defense has a trivial impact on the original inference. We evaluate the inference time and storage overhead in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1">Fault Detection</head><p>Because our trapdoor can significantly reduce the detection surface, lightweight fault detection could be easily realized, for example, via simple checksum or distance-based comparison. As an example, we develop a simple checksum mechanism to detect faults on honey weights. If the sum of the</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sum of honey weights</head><p>Sum of honey weights 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 current honey weights is different from the original sum, the model is then considered to be tampered. In particular, the method takes advantage of Multiply-Accumulate (MAC) operations commonly available in DNN hardware, and thus can eliminate additional hardware support for detection. Also, instead of checking the sum of all honey weights, it checks the sum of honey weights at the granularity of filters to improve detection performance. Fig. <ref type="figure">4</ref> illustrates the procedure of utilizing convolution (or MAC) operations to compute the sum of honey weights without additional hardware support. Specifically, we set the honey feature map into 1 and others to 0. After conducting convolution operations with a 3D kernel, the first neuron's value in the output feature map will be the sum of the honey weights (filter). For fully connected layers, computation of neurons output can be treated as 1&#215;1 convolution operation, so we can also use similar operations to compute the sum of honey weights.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sum of honey weights</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sum</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2">Run-time Model Recovery</head><p>After building the trapdoored model, the defender will need to store a copy of clean honey weights in a secured zone (e.g., trusted execution environment (TEE) like Intel SGX <ref type="bibr">[25]</ref> or ARM Trustzone <ref type="bibr">[26,</ref><ref type="bibr">27]</ref>). Once faults are detected, we can directly replace faulty honey weights with the clean copy at run-time. In particular, to fit our filter-wise checksum detection, the level of granularity for fault mitigation is set as filter. Because we only need to store/recover a very small amount of weights, the storage and time overheads are extremely low. Furthermore, since our trapdoor can trap most of the bit-flips and mitigate the faults, the recovered model will be very close to the original model and hence the inference accuracy. It is worth mentioning that the recovered model is still a trapdoored model, so we do not need to repeat the trapdoor embedding process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Putting It All Together For Final Defense</head><p>We build our final trapdoor-enabled defense to effectively mitigate different basic and expert attackers by combining all aforementioned techniques via the following manner: 1) random honey neuron selection (position), neuron activation ranking obfuscation (via tuning expanding coefficient &#947; in embedding) and layer-wise coverage (via adjusting honey neuron number); 2) filter-wise fine-grained fault detection and model recovery in real time. In the following sections, we use such a combination as the final recommended defense for evaluation and discussion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Adaptive Attack Design</head><p>Beyond the basic adversary, any meaningful defense must withstand countermeasures from expert attackers who acknowledge the defense. In Section 3.1, we define attackers with different abilities to comprehensively evaluate trapdoorenabled defense. Particularly, attackers without any knowledge of defense (original BFA, T-BFA and DeepHammer Attack in Section 2.1), are referred as basic attackers.</p><p>In this section, we design expert attackers that are aware that the target model has been embedded with trapdoors, and know the basic principles of trapdoor design but no specific details. We present multiple expert attackers separated into two broad categories. First, we consider the bit (weight)-level bypass approach (BBA) by crafting an adversarial bit-flip chain from weight bits with lower gradients in the trapdoor protected model, and then use it to attack this model. The purpose is to avoid choosing an individual (weight) bit from established NeuroPots in the model that often has a higher gradient. Second, we consider neuron-level bypass approaches (NBA) that focus on finding bit-flip chains that cause misclassification while avoiding using all weight bits possibly associated with honey neurons.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Bit-level Bypass Attack (BBA)</head><p>To achieve the highest attack efficiency, a greedy non-expert attacker searches the more vulnerable bits of the model to generate an attack that would lead to the least number of bit-flips. Our trapdoor-enabled defense takes advantage of such an attack pattern to set a few disguised vulnerable traps, e.g., weight bits with intentionally enlarged gradients located in NeuroPots, to lure the attacker's bit selection. Therefore, an expert attacker who knows that a target model contains such trapdoors may simply lower the objective by picking weight bits whose gradients are not among the top list, to circumvent pre-defined NeuroPots. Accordingly, the gradientbased bit ranking search algorithm of this adaptive BFA can be modified as follows:</p><p>where t &#8712; (0, 100%) is an adjustable threshold. The algorithm iteratively searches the most vulnerable p bits only from a subset of weights-t% of weights with lower gradients in a layer l, and excludes the rest (top) (1 -t%) of weights with higher gradients that may contain trapdoors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Neuron-level Bypass Attacks</head><p>Instead of examining individual weight bit, more advanced expert attackers would also consider defeating trapdoor by first speculating honey neurons, and then avoiding injecting bits in all their associated weights. Such neuron-(or Neu-roPots) level bypass could potentially lead to better efficiency for expert attacks, since all weights associated with identified neurons will not be selected for bit flipping.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Ranking-based Neuron Bypass (R-NBA)</head><p>To trap more bit-flips, honey neurons generally need to have large activations. Therefore, expert attackers could utilize this property to identify honey neurons and bypass trapdoors. Specifically, the attacker may rank activation magnitude of neurons or feature maps (i.e., |o| or &#8721; n i |o i |), then bypass all weights associated with the neurons with top k b |o| or top k b &#8721; n i |o i | (e.g., k b = 10%) in certain layers. We name this attack as R-NBA-b. On the other hand, if expert attackers are aware that our trapdoor obfuscates the activation ranking of honey neurons, they may further upgrade their attacks by deliberately attacking the top-ranked neurons only to craft the attack, since these neurons may not be honey neurons. This attack can potentially generate a stealthy attack chain that would need much fewer bit-flips, thereby lowering the possibility of being detected by the trapdoor. This variant is called R-NBA-a.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">Frequency-based Neuron Bypass (F-NBA)</head><p>Another approach to precisely detect and bypass honey neurons is to leverage the fact that, statistically, honey neurons are likely to receive more bit-flips than normal neurons. In this case, expert attacker can perform BFAs multiple times using different input data, during which he or she counts the number of bit-flips in weights associated with each neuron/featuremap. A neuron whose weights receive more bit-flips is likely to be a honey neuron, and all weights associated with this neuron can be excluded from the bit-flip search process. We name such a bypass attack as F-NBA. Specifically, this bit-flip frequency detection includes two stages: 1) Testing (detecting) stage -we perform the attack algorithm using different training/testing data m times to generate a honey neuron set {h}. In each testing, the number of bit-flips in weights associated with each neuron n i will be counted. If the number of bit-flips on neuron n i is larger than a threshold T (e.g., T = 2), then n i is considered to be a honey neuron. We then add neuron n i into the honey neuron set {h} (i.e., {h} &#8592; {h} &#8746; n i ).</p><p>2) Attacking (bypassing) stage-during attack we bypass all weights associated with the honey neurons obtained from the last step n i &#8712; {h}. Generally, it is difficult for an attacker to obtain enough data to generate different bit-flip chains to test the target model many times. Here we make an aggressive assumption for the purpose of evaluation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Evaluation</head><p>In this section, we evaluate the performance of our trapdoorenabled defense against basic adversary and expert adversary as described in Section 3.1. Specifically, our evaluation answers the following questions: 1. How does NeuroPots impact the model accuracy? 2. How is the detection and mitigation performance of our defense framework against basic and adaptive BFAs?</p><p>3. How much time and storage overhead does our fault detection and model recovery framework introduce?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Experimental Setup</head><p>We use PyTorch as our implementation framework. All simulations are conducted in a workstation with one AMD Ryzen Thread ripper 2990WX 32-Core Processor and four NVIDIA GeForce RTX 2080Ti GPUs. Our DRAM is 32GB (X64, DR) 260-Pin DDR4 SODIMM and consists of 16 internal banks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.1">Datasets and DNN Structures</head><p>We evaluate the effectiveness of our trapdoor-enabled defense framework with CIFAR-10 <ref type="bibr">[29]</ref> and ImageNet <ref type="bibr">[30]</ref> for image classification and Google Speech Command <ref type="bibr">[31]</ref> for speech recognition datasets. Specifically, CIFAR-10 consists of 60K of 32&#215;32 colored images from 10 classes, in which 50K and 10K images are used for training and testing, respectively. Im-ageNet contains 256&#215;256 colored images from 1000 classes, in which 1.3M, 100K and 50K images are used for training, testing, and validation, respectively. Google Speech Command includes 65,000 one-second-long clips from 12 classes (10 commands and 2 additional special labels), each of which has 30 different words. We evaluate our trapdoor-enabled defense across different DNN structures. In particular, ResNet-20 <ref type="bibr">[32]</ref> and VGG-16 <ref type="bibr">[33]</ref> are evaluated on CIFAR-10, while the evaluations of AlexNet <ref type="bibr">[34]</ref>, ResNet-34, and a memory-efficient MobileNet-V2 <ref type="bibr">[35]</ref> are conducted on the complex ImageNet. The Google Speech Command dataset is used to evaluate WideResNet-28 <ref type="bibr">[36]</ref>. The weights of all these DNNs are quantized to 8-bit. The accuracy of well-trained quantized models are shown in Table <ref type="table">1</ref> (see w/o trapdoor column).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.2">Attack Configuration and Fault Models</head><p>Basic Adversary. For basic adversary evaluation, we use three types of original bit-flip attacks: BFA, T-BFA, and Deep-Hammer. The goal of BFA and DeepHammer is to degrade model accuracy to random guesses (i.e., 10%, 0.1%, 8.33% for CIFAR-10, ImageNet, Google Speech Commands, respectively). In comparison, T-BFA only misclassifies inputs from source class p to a target class q while maintaining accuracy for all inputs from other classes j. We randomly select a source and a target class in our experiment. In terms of hardware constraints of bit-flips in DRAM, we assume BFA and T-BFA can inject bit-flips into any weights of the model. In contrast, as a real bit-flip attack via row-hammer, Deep-Hammer only flips one bit per DRAM page. For each type of BFAs, we repeat the above attack configuration 50 times to generate 50 bit-flip chains (i.e., 50 fault models) using different random seeds for each DNN.</p><p>Expert Adversary. We consider that attackers can integrate the defense bypass techniques described in Section 5 with realistic DeepHammer to develop four adaptive Deep-Hammer attacks: BBA, R-NBA-b, R-NBA-a, and F-NBA. Without loss of generality, we set the configurations for these attacks as follows: t = 98% for BBA (i.e., bypass weight bits with top 2% gradient), k b = 5% for R-NBA-b and R-NBA-a (i.e., bypass/only-attack 5% top-ranked neurons), T = 2 and test times is 50 for F-NBA (i.e., run the attack algorithm 50 times with different seeds in the targeted model and bypass all neurons that are attacked more than two times in any trials). For simplicity, we focus on discussing these attacks under a given defense budget (see defense budget setting discussion in Section 6.1.3). We leave more evaluations under different adaptive attack parameter settings in Appendix C.</p><p>The detailed configurations for three basic BFAs and four adaptive attacks in all DNN models, including the number of needed bit-flips, are presented in Table <ref type="table">8</ref> of Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.3">Guidelines of Setting Key Parameters in Defense</head><p>We follow the procedure in Section 4.1 to create honey neurons for defense. In particular, we randomly select honey neurons and apply activation ranking obfuscation within a layer. To increase their stealthiness and improve the trapping rate, honey neurons are distributed across more layers. To properly set the three key parameters of this process: 1) number of honey neurons per layer N, 2) expanding coefficient &#947;, and 3) number of selected layers N l , we follow the general guidelines described below. A detailed ablation study is provided in Section 7.1.</p><p>To choose an appropriate N for different models, we empirically set the honey neuron number per layer-N = max(2, p%&#215;N m N l ), where p is a given budget (percentage) of honey neurons that can be used in defense, N m is the total number of neurons in the model and N l is the number of selected layers. These parameters determine the defense performance and time/storage overhead. In our evaluation, we consider the following typical defense budget: the defense incurred time overhead is no more than 10% of a single inference time in any model, and the proportion of all se- lected neurons in any model is &lt; 1%, except for ResNet-20 where 2.8% are honey neurons because it only contains &#8764; 700 neurons/feature-maps). We set N l to 10 for all DNN models. Note that, AlexNet only has 8 layers, so we embed honey neurons to all layers of AlexNet. For honey neuron embedding, we use a small expanding coefficient (e.g., &#947; = 2) to ensure stealthiness and maintain original accuracy for all models.</p><p>To find a suitable expanding coefficient for each model, our practice is to start with a relatively larger value (e.g., &#947; = 4). If it incurs a noticeable impact on model accuracy, or all honey neurons are top-ranked, we then gradually reduce its value until not all honey neurons are top-ranked and the accuracy drop is acceptable (e.g., less than 3%). The detailed defense parameter settings are listed in Table <ref type="table">7</ref> of Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.4">Evaluation Metrics</head><p>We focus on three evaluation metrics: 1) detection rate is defined as the ratio between the number of models correctly identified as malicious and the total number of malicious models. A higher detection rate indicates better defense efficiency; 2) mitigation success rate is calculated as the number of models with accuracy recovered close to original level after defense divided by the total number of malicious models.</p><p>Here we define a successful mitigation empirically if the accuracy difference between the recovered model and clean model (without trapdoor) is less than 3%; 3) trapping rate is the proportion of trapped bits out of the total number of bit-flips injected by attackers. Intuitively, a higher trapping rate indicates a better mitigation success rate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Results and Analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.1">Impact on Performance</head><p>If not properly selected and embedded, honey neurons can cause model accuracy drop. We first evaluate how NeuroPots impacts the model accuracy. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.2">Detection and Mitigation Effectiveness</head><p>Trapping Efficiency. Table <ref type="table">2</ref> shows that our defense can trap most bit-flips in the attack chain based on three bit-flip chains generated by BFA on the retraining-based trapdoored model of VGG-16. For example, our model captures 10 out of 13 bit-flips in the first attack chain and achieves a 76.92% trapping rate. Although our defense might not trap all faults from attackers, the model's accuracy can still be recovered to the original level in the fault mitigation stage. This is because our defense always traps the most destructive bit-flips of the attack chain to interrupt the critical "attack path". For example, in Table <ref type="table">2</ref>, the first three bit-flips are trapped by our defense in all three attack chains. The attackers always search for the most vulnerable bits of current model iteratively. For each iteration, the current most vulnerable bit is on top of the prior bit-flips to maximize accuracy drop. If we can recover the first few critical bit-flips, the remaining bit-flips are less important. Detection and Mitigation Performance. As Table <ref type="table">3</ref> shows, our trapdoor-enabled defense can achieve a very high detection rate (100% in most cases) for three types of basic attacks and four types of adaptive attacks in all DNN models. Table <ref type="table">3</ref> also reports the mitigation success rate in different DNN models. On the one hand, for three types of basic attacks, our defense achieves a considerably high mitigation success rate in all DNN models. This indicates that most faulty models' accuracy are recovered to the original level at run-time. In particular, our defense achieves a similar mitigation success rate for BFA and T-BFA (90.33% vs. 90% on average) but a slightly worse performance for DeepHammer in most cases (85.33% on average). This is because DeepHammer has the one-bit-flip-per-page constraint, and therefore, the injected bitflips are more scattered and a lower portion of them is trapped. On the other hand, for four types of adaptive attacks, our defense only achieves slightly worse mitigation effectiveness than the basic version (i.e., Deep Hammer) due to the randomness of selecting honey neurons. Such minor performance degradation means that these strong adaptive attacks have a very small chance to bypass a few honey weights/neurons.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.3">Time and Storage Overhead</head><p>One of our design goals is to minimize the overhead of the trapdoor-enabled defense model. Table <ref type="table">4</ref> shows the time spent on a model inference with and without the trapdoor, as well as the additional storage overhead. For time cost, our defense only takes slightly more time than the baseline due to involving fault detection and mitigation in the inference pipeline. For example, in VGG-16, it costs an additional 0.07ms (total 4.96ms) compared to the baseline (4.89ms). For networks with convolutional layers only (e.g. ResNet-20), the time over-  head is slightly increased but is &lt; 10% due to having more honey feature maps. For storage cost, it requires almost negligible extra memory (e.g., 99KB or 0.5% in ResNet-34), to store a small number of clean honey weights for all DNN models. Overall, compared to traditional passive defense approaches, our proactive approach can effectively reduce the number of weights needed for fault detection and mitigation under BFAs, and incurs very low time and storage overhead.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Ablation Study</head><p>In Section 6.1.3 we present general guidelines for selecting three key hyper-parameters for NeuroPots-based trapdoor defense. Without loss of generality, in this section we analyze how these parameters impact our model performance using ResNet-20 and CIFAR-10 dataset. Impact of Expanding Coefficient &#947;. Fig. <ref type="figure">5</ref>(a) shows how expanding coefficient impacts the trapping effectiveness under basic and adaptive attacks. For most of the attacks (e.g., Deep Hammer and BBA), a larger &#947; could significantly increase the trapping effectiveness. However, we can also observe that a larger &#947; decreases the trapping rate for R-NBA-b. This is because the weight gradients of honey neurons increases as &#947; grows. Once the gradients become large enough, these honey neurons will fall into the higher-ranked category, which R-NBA-b can effectively bypass. Therefore, to ensure a decent detection and mitigation performance for all attacks, we need to choose an appropriate &#947; (generally less than 3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Impact of Number of Honey</head><p>Neurons Per Layer N. In Fig. <ref type="figure">5</ref>(b), we observe that more honey neurons in each selected layer improves the trapping rate for all attacks. However, having too many honey neurons suffers from higher time and storage overhead. For example, when N = 5, storing honey weights brings 3.38% extra storage overhead. Therefore, an appropriate number of honey neurons in each layer is essential to balance trapping effectiveness and overhead.</p><p>Impact of Number of Selected Layers N l (or Honey Neuron Distribution). Fig. <ref type="figure">5(c</ref>) shows how N l impacts the trapping rate. In this experiment, we keep the total number of honey neurons unchanged (i.e., 20 honey neurons/featuremaps) and spread honey neurons evenly across each selected layer. For example, if the number of selected layers is 4, we select 5 honey neurons in each layer for the first 4 layers in the model. We found that distributing honey neurons across more layers can increase trapping rates for most attacks. However, for R-NBA-b, we observe that too scattered honey neurons could hurt trapping effectiveness (e.g., when the number of selected layers is larger than 10). This is because when N l becomes too large, some layers may have only one honey neuron, and therefore R-NBA-b can bypass it so that the entire layer becomes unprotected. Therefore, we use at least 2 honey neurons in each selected layer with an appropriate number of selected layers (e.g., 10 layers) to ensure trapping effectiveness for all attacks.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Comparison and Discussion</head><p>Influence of Scattered Flips. We now discuss the impact of scattered bit-flips on defense performance. Keeping the onebit-per-page constraint unchanged, we vary the page size to control the degree of scattering bit-flips in Deep Hammer. The default DRAM page size in Deep Hammer is 4K. As Table <ref type="table">5</ref> shows, the detection rate remains 100% for all page sizes. The trapping rate of trapdoor defense remains stable beyond 4K page size. Similarly behavior is observed for mitigation success rate. As the page size grows, each page will contain more honey neurons. Although bit-flips becomes more scattered in larger pages, having more honey neurons in each page can effectively trap the bit-flips and stabilize the trapping rate (and hence mitigation rate). Overall, more scattered bit-flips only have a limited impact on our defense.</p><p>Comparison to Natural "Neural Pots". We inspect the relationship between natural "neural pots" (neurons with naturally very high activation with no additional embedding operation) and NeuroPots. To make a fair comparison, for natural "neural pots", we use the same configuration as NeuroPots (i.e., N = 2, N l = 10 in ResNet-20). As Fig. <ref type="figure">6</ref> shows, the trapping effectiveness of natural "neural pots" is much lower than NeuroPots, especially for adaptive attacks such as BBA. This is because in natural "neural pots", without the embedding operation, there is no guarantee that the selected highly activated neurons remain highly activated for all inputs. Addi-tionally, in some trials, those highly activated neurons can be bypassed by well-designed adaptive attacks. Therefore, the stealthy honey neuron embedding in NeuroPots is necessary.</p><p>Comparison to Random Neuron Monitoring and Runtime Analysis. A simple baseline defense could be monitoring neurons randomly to detect attacks. Fig. <ref type="figure">7</ref> compares our trapdoor defense with this random neuron monitoring defense. To achieve a relatively high detection rate (e.g., &gt; 90%), this baseline needs to monitor 50% of neurons at least, resulting in a high time overhead. For example, an extra 1.82ms is needed for an inference that normally takes 1.13ms to monitor 50% of neurons, totaling 2.95ms or 2.6&#215; that original inference time-1.13ms). Compared to random neuron monitoring, trapdoor defense only needs to monitor 2.8% of neurons with 0.11ms additional time (e.g., total 1.24ms, 9.7% of the inference time) to achieve a 100% detection rate.</p><p>Comparison to Existing Detection Methods. One of the key capabilities of NeuroPots is to detect BFAs accurately in real-time at extremely low cost, hence we also experimentally compare trapdoor defense with detection-based defense. Since existing detection can be roughly divided into two categories: 1) machine learning-based-training a simple model for fault detection <ref type="bibr">[22,</ref><ref type="bibr">37]</ref>; 2) signature-based-embed unique signatures, e.g., watermark or checksum, into the original model and then compute and compare signatures on the fly to detect faults <ref type="bibr">[38,</ref><ref type="bibr">39,</ref><ref type="bibr">40]</ref>. Without loss of generality, we choose two representative solutions in the respective category for the comparison: weight encoding detection (WED) <ref type="bibr">[22]</ref> and DeepAttest (a.k.a., Fingerprint) <ref type="bibr">[38]</ref>. In particular, WED applies weight sensitivity analysis to pick weights from a few most sensitive layers to train a single-layer perception to generate the detecting secret-key, while Fingerprint utilizes a shredder storage format to randomly select weights to embed fingerprint. Then the detecting secret-key or fingerprint extracted and computed from protected weights are compared  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3">Defense Against Oracle Attack</head><p>An oracle attacker has all details about the trapdoor, including algorithm, the number and exact locations of honey neurons, such that the attacker can entirely circumvent NeuroPots and crush the trapdoor defense. We use this strongest attack which is often impractical, to test the upper bound of our defense. We design a hybrid countermeasure to ensure that the oracle attacker cannot circumvent NeuroPots without spending an unreasonable amount of effort. So far, we have explained how NeuroPots can be used to craft honey neurons. In the hybrid countermeasure described below, we also use NeuroPots to enhance normal neurons that are vulnerable to BFAs and make them more robust. We name such neurons as enhanced neurons. We assume the oracle attacker is also fully aware of details about enhanced neurons in addition to honey neurons.</p><p>The rationality is that: if the attacker circumvents honey neurons which offer a shortcut for BFA, and enhanced neurons that indicate a much longer path for bit-flips, he or she will be forced to create a path from the (remaining) normal neurons insensitive to BFA. This could lead to the unreasonable amount of effort for attacker under real hardware constraints. Specifically, the hybrid countermeasure involves two steps: (1) Selecting honey neurons and normal neurons that need enhancement; and (2) Embedding honey neurons and enhanced neurons into the DNN model. In step 1, we first categorize neural network layers into three levels (high/mid/low) according their sensitivity to attack. A layer's sensitivity can be measured by the sum of the absolute value of its top n weight gradient. The larger it is, the more sensitive a layer is. We empirically choose the number of layers for each level. For high-sensitivity layer, we treat all neurons as honey neurons. For mid-sensitivity layers, we identify the top k h activated neurons (e.g., k h = 10%) as honey neurons. In addition, enhanced neurons are selected from the top k e activated neurons (e.g., k e = 50%) among the remaining 1k h neurons. We leave all neurons in low-sensitivity layers untouched (normal neurons). In step 2, we set a large expanding coefficient (&#947; &gt; 1) for honey neurons but a smaller one (0 &lt; &#947; &lt; 1) for enhanced neurons. The purpose is to suppress the sensitivity of enhanced neurons that might originally have high sensitivity.</p><p>We evaluate it against oracle DeepHammer attack based on the following settings: &#947; = 2 (&#947; = 0.5) for honey neurons (enhanced neurons), the 1st and 8th layers as the high-sensitivity layers, 2nd, 3rd, 4th layers as the mid-sensitivity layers, and the remaining layers as the low-sensitivity layers. As Table <ref type="table">6</ref> shows, the number of bit-flips needed by oracle attacker increases as the percentage of honey weights increases, for example, 17.6&#215; with merely 2.4% honey weights compared to clean models without trapdoor. We observe the similar trend in other models (See Appendix D). This means that the defense can significantly raise the bar for the attacker to exert faults injection physically at the cost of more honey neurons.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.4">Limitations</head><p>While our NeuroPots-enabled trapdoor defense can effectively trap bit-flips and repair the model online at a low cost, it has a few known limitations. First, the fine-tuning-based honey neuron embedding may incur a retraining cost in large and complex models. Although the retraining-free one-shot method lowers the cost, both methods still lead to marginal accuracy drop. Second, there is still time overhead added into the original inference, e.g. higher overhead in small models than large ones (9.7% for ResNet-20 v.s. 1.1% for MobileNet-V2). Third, while our defense well defeats a series of the state-ofthe-art bit-flip attacks which always attempt to tamper the model by flipping a minimum amount of weight bit-flips, including the latest BFA flipping even less bits within only one layer <ref type="bibr">[41]</ref>, our evaluated adaptive attacks are limited to weight gradient-based bit-flip attacks. Considering the existence of other attacks, e.g. bit-flip trojan attack by hacking model weights and input simultaneously <ref type="bibr">[42]</ref> (different threat model), and non-gradient attacks, the NeuroPots-based defense idea can be further explored along this direction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Related Work</head><p>Deep Neural Networks (DNNs) are vulnerable to both dataand model-centric attacks. While the defenses against the former, such as evasion <ref type="bibr">[8,</ref><ref type="bibr">9,</ref><ref type="bibr">10]</ref>, poisoning <ref type="bibr">[43,</ref><ref type="bibr">44]</ref>, backdoor/trojan <ref type="bibr">[11,</ref><ref type="bibr">12,</ref><ref type="bibr">13,</ref><ref type="bibr">45,</ref><ref type="bibr">46]</ref> have been intensively studied, this does not hold for the latter exploiting hardware-based fault injection techniques <ref type="bibr">[14,</ref><ref type="bibr">15,</ref><ref type="bibr">16,</ref><ref type="bibr">17,</ref><ref type="bibr">47,</ref><ref type="bibr">48]</ref>. Along this direction, the recent BFAs crush quantized DNNs by leveraging greedy gradient-based bit search and precise row-hammer to flip the least number of weight bits stored in DRAMs.</p><p>There exist a few attempts to directly address BFA which can be grouped into two categories: fault-tolerance enhancement and fault detection. In the first category, it often requires costly binarization-aware training <ref type="bibr">[49]</ref> or BFA-aware weight reconstruction <ref type="bibr">[50]</ref>, leading to either noticeable accuracy drop or limited robustness improvement. For the second category, it either requires detecting all model weights <ref type="bibr">[38,</ref><ref type="bibr">40,</ref><ref type="bibr">51]</ref> to ensure detection rate or detecting weights within sensitive layers only <ref type="bibr">[22,</ref><ref type="bibr">39]</ref> through embedding signatures or training machine learning models for detection. They either suffer from high overhead or false negative detection. Moreover, they cannot realize high-accurate real-time model recovery. As evaluated in Section 7.2, NeuroPots-based trapdoor defense outperforms them in detection rate and overhead significantly.</p><p>Hardware countermeasures can also prevent physical fault injection channels. However, general rowhammer defenses such as Error-Correcting Code and Intel SGX can be broken by new types of rowhammer attacks <ref type="bibr">[52,</ref><ref type="bibr">53]</ref>, not to mention the non-trivial hardware modifications like additional/probabilistic refreshes, memory controller redesign, etc.</p><p>Honeypot-based proactive defense has been applied in many fields like IoT security <ref type="bibr">[54,</ref><ref type="bibr">55]</ref>, cloud security <ref type="bibr">[56,</ref><ref type="bibr">57]</ref>, and network security <ref type="bibr">[58,</ref><ref type="bibr">59]</ref>. Recently "honeypot" is also used to detect input-based adversarial attacks in deep learning <ref type="bibr">[60]</ref>. However, our work differs in two aspects from theirs: 1) problem-bit flips inside DNN models v.s. pertubations of input data outside DNNs; 2) approach-crafting honey neurons inside DNNs vs. input trigger-based backdoor training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">Conclusion</head><p>In this work, we propose NeuroPots-a proactive defense concept against the emerging bit-flip attacks (BFAs) dedicated to quantized DNNs. Based on NeuroPots, we design a trapdoorenabled defense framework to efficiently detect and mitigate BFAs. Our method embeds a very few well-designed honey neurons as vulnerabilities into DNN models. These vulnerabilities are trapdoors that largely attract attackers to inject bitflips to the weights associated with honey neurons. Since most bit-flips are captured by trapdoors, defenders can detect faults and recover model accuracy effectively at run time. Extensive experimental results show that our trapdoor-enabled defense achieves high detection (mitigation) rate and extremely low overhead against a variety of static and adaptive attacks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Number of Bit-flips under BFAs</head><p>In Table <ref type="table">8</ref>, we list the number of bit-flips generated by three basic BFAs and four adaptive attacks in clean models (w/o trapdoor) and the trapdoored models (w/ trapdoor) on various DNN structures. For basic attacks, we can observe that the number of bit-flips in the trapdoored model is less than the corresponding clean model on average for most DNN structures. For example, in ResNet-20, DeepHammer needs to flip an average of 16 bits to degrade the clean model's accuracy to random guessing, while it needs 5 bits on the trapdoored models. This is because our defense intentionally implants more vulnerable points into the model to attract attackers, which makes attackers achieve their goals more easily. Note different models have different resistance abilities to BFAs, causing varied numbers of needed bit-flips. Since attackers usually cannot know how many bit-flips are needed to achieve their goal in a clean targeted model, it is hard to perceive trapdoors by attackers. For adaptive attacks, we can observe that attackers need to flip more bit-flips to achieve their goal, compared to clean and trapdoored models under basic attacks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Detailed Settings of Trapdoor Techniques</head><p>Detailed settings of the trapdoor techniques are listed in Table 7. We can observe that the optimal &#947; based on the guidelines described in Section 6.1.3 is generally <ref type="bibr">(1,</ref><ref type="bibr">3]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Results of Defending against Adaptive Attacks with Different Attack Settings</head><p>We further evaluate the defense performance of the trapdoorenabled defense against various adaptive attacks by using three representative DNN models and two datasets-CIFAR-10 and ImageNet.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.1 Defending against BBA</head><p>As Fig. <ref type="figure">9</ref> shows, the detection rate of our defense is always 100%. When t = 90%, the mitigation success rate slightly drops, but the number of bit-flips of attack increases &gt; 12.2&#215; (we set the maximum number of bit-flips is 500 in our experiments), 13.9&#215;, 14.6&#215; compared to baseline on VGG-16, AlexNet, ResNet-34, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2 Defending against NBA</head><p>Results for R-NBA. In Fig. <ref type="figure">10</ref>, we can observe that, when k b = 30%, the mitigation success rate of our defense decreases by &#8764; 20%, while the number of bit-flips of the attack rises by &gt; 12.2&#215;, 10.1&#215;, 13.9&#215; on VGG-16, AlexNet, ResNet-34, respectively.</p><p>Results for F-NBA. The results are shown in Fig. <ref type="figure">11</ref>. As Fig. <ref type="figure">11</ref> (a) shows, when test times is 100, the attacker only detects a small percentage of honey neurons on three models. Therefore, in <ref type="bibr">Fig 11 (b)</ref> and (c), we can observe that the detection rate of our defense is still 100%, and the mitigation success rate only has negligible degradation on three models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Defending against Oracle Attack</head><p>Table <ref type="table">9</ref> reports the detailed settings and results of hybrid countermeasures against oracle attack to test the bound of our trapdoor defense. As a result, if oracle attacker bypasses our functional neurons (honey and enhanced neurons), the needed bit-flips (or the efforts needed in physical fault injection) of the attacker will increase by &gt; 12.2&#215;, 12.1&#215;, 13&#215; on VGG-16, AlexNet, ResNet-34, respectively.      </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The online model repair is meaningful only if the recovered accuracy can be close to its original level. We do not consider such recovery for the oracle attacker due to a complete defense bypass.</p></note>
		</body>
		</text>
</TEI>
