<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Iteratively Enhanced Semidefinite Relaxations for Efficient Neural Network Verification</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date when="2023-06-27">06/27/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10428158</idno>
					<idno type="doi">10.1609/aaai.v37i12.26744</idno>
					<title level='j'>Proceedings of the AAAI Conference on Artificial Intelligence</title>
<idno type="issn">2159-5399</idno>
<biblScope unit="volume">37</biblScope>
<biblScope unit="issue">12</biblScope>					

					<author>Jianglin Lan</author><author>Yang Zheng</author><author>Alessio Lomuscio</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We propose an enhanced semidefinite program (SDP) relaxation to enable the tight and efficient verification of neural networks (NNs). The tightness improvement is achieved by introducing a nonlinear constraint to existing SDP relaxations previously proposed for NN verification. The efficiency of the proposal stems from the iterative nature of the proposed algorithm in that it solves the resulting non-convex SDP by recursively solving auxiliary convex layer-based SDP problems. We show formally that the solution generated by our algorithm is tighter than state-of-the-art SDP-based solutions for the problem. We also show that the solution sequence converges to the optimal solution of the non-convex enhanced SDP relaxation. The experimental results on standard benchmarks in the area show that our algorithm achieves the state-of-the-art performance whilst maintaining an acceptable computational cost.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>The area of verification of neural networks (NNs) is concerned with the development of methods and tools to establish whether a given NN satisfies a given specification on a given input <ref type="bibr">(Li et al. 2020;</ref><ref type="bibr">Liu et al. 2020)</ref>. If an NN model is verified to be robust, no adversarial attack exists for the model, input and perturbation under analysis <ref type="bibr">(Goodfellow, Shlens, and Szegedy 2014)</ref>. NN verification has been used in many areas including safety-critical systems <ref type="bibr">(Tran et al. 2020;</ref><ref type="bibr">Julian and Kochenderfer 2021;</ref><ref type="bibr">Kouvaros et al. 2021;</ref><ref type="bibr">Manzanas Lopez et al. 2021)</ref>.</p><p>One class of NN verification methods is the complete approach, which guarantees that no false negatives or false positives are generated. Existing complete approaches for NN verification build on mixed-integer linear programming (MILP) <ref type="bibr">(Bastani et al. 2016;</ref><ref type="bibr">Lomuscio and Maganti 2017;</ref><ref type="bibr">Tjeng, Xiao, and Tedrake 2019;</ref><ref type="bibr">Anderson et al. 2020;</ref><ref type="bibr">Botoeva et al. 2020)</ref> or satisfiability modulo theories <ref type="bibr">(Ehlers 2017;</ref><ref type="bibr">Katz et al. 2021</ref>). These complete verifiers are guaranteed to resolve any verification query, but often at a high computational cost, and thus do not scale well to large-scale NNs.</p><p>Another class of verification methods is the incomplete approach, which relies on overapproximations of the NN and scales better to large NNs than the complete approach. The existing incomplete verifiers involve abstraction and approximation via linear approximations (Henriksen and Lomuscio 2020, 2021; <ref type="bibr">Wang et al. 2021;</ref><ref type="bibr">Hashemi, Kouvaros, and Lomuscio 2021)</ref> or convex approximation <ref type="bibr">(Xu et al. 
2020;</ref><ref type="bibr">Dvijotham et al. 2018;</ref><ref type="bibr">Wong and Kolter 2018;</ref><ref type="bibr">Chen et al. 2021;</ref><ref type="bibr">Raghunathan, Steinhardt, and Liang 2018;</ref><ref type="bibr">Fazlyab, Morari, and Pappas 2022;</ref><ref type="bibr">Batten et al. 2021)</ref>. Some incomplete verifiers can offer completeness guarantees by combining symbolic interval propagation or convex relaxation with input refinement (e.g., ReluVal <ref type="bibr">(Wang et al. 2018c</ref>)) or neuron refinement (e.g., Neurify <ref type="bibr">(Wang et al. 2018a)</ref>, VeriNet <ref type="bibr">(Henriksen and Lomuscio 2020)</ref>, DEEPSPLIT <ref type="bibr">(Henriksen and Lomuscio 2021)</ref>, &#946;-CROWN <ref type="bibr">(Wang et al. 2021</ref>) and OSIP <ref type="bibr">(Hashemi, Kouvaros, and Lomuscio 2021)</ref>), but these may require a very large number of splitting refinements.</p><p>This paper focuses on the incomplete approach. The success of incomplete methods hinges on the tightness of the approximations generated by the technique, because a looser approximation leads to more unsolvable verification queries and lower scalability. The triangle relaxation <ref type="bibr">(Ehlers 2017)</ref> offers the tightest possible convex approximation for a single Rectified Linear Unit (ReLU) neuron and has been used for faster bound propagations <ref type="bibr">(Weng et al. 2018;</ref><ref type="bibr">Singh et al. 2019a;</ref><ref type="bibr">Tjandraatmadja et al. 2020;</ref><ref type="bibr">M&#252;ller et al. 2021)</ref>. These incomplete methods are based on polynomial-time solvable linear program (LP) problems and can achieve state-of-the-art (SoA) performance. However, their efficacy is fundamentally limited by the tightness of convex relaxations. 
This is known as the "convex relaxation barrier", indicating that even optimal convex relaxations on a single neuron fail to obtain a tight approximation of the overall model <ref type="bibr">(Salman et al. 2019)</ref>. This problem can be approached in two ways. The first is applying convex relaxations to multiple neurons. The representative methods include DeepPoly <ref type="bibr">(Singh et al. 2019a)</ref>, kPoly <ref type="bibr">(Singh et al. 2019b</ref><ref type="bibr">), OptC2V (Tjandraatmadja et al. 2020)</ref>, and PRIMA <ref type="bibr">(M&#252;ller et al. 2021</ref>). The second is using stronger relaxations, among which the most promising ones achieving tightness and efficiency are based on semidefinite programming (SDP) <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018;</ref><ref type="bibr">Fazlyab, Morari, and Pappas 2022;</ref><ref type="bibr">Batten et al. 2021)</ref>.</p><p>Empirically, the standard SDP relaxation (Raghunathan, Steinhardt, and Liang 2018) is considerably tighter than the standard LP relaxations. However, as illustrated in <ref type="bibr">(Zhang 2020)</ref>, SDP relaxations become loose for multiple hidden layers. Linear cuts <ref type="bibr">(Batten et al. 2021)</ref> or nonconvex cuts <ref type="bibr">(Ma and Sojoudi 2020)</ref> were introduced to further tighten SDP relaxations. The standard SDP approaches <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018;</ref><ref type="bibr">Zhang 2020;</ref><ref type="bibr">Ma and Sojoudi 2020)</ref> achieve tighter relaxations than LPs, but incur considerable extra computational effort, making them unable to scale to larger models. To alleviate the computational cost, a memory-efficient first-order algorithm was introduced in <ref type="bibr">(Dathathri et al. 2020)</ref>. With the same aim, layerwise SDP relaxations were used in <ref type="bibr">(Batten et al. 
2021;</ref><ref type="bibr">Newton and Papachristodoulou 2021)</ref> by exploiting the cascaded NN structures based on chordal graph decomposition <ref type="bibr">(Zheng, Fantuzzi, and Papachristodoulou 2021)</ref>. To the best of our knowledge, the LayerSDP method <ref type="bibr">(Batten et al. 2021</ref>) achieves the tightest relaxations by combining SDP relaxation with triangle relaxation <ref type="bibr">(Ehlers 2017)</ref>. Even so, it is observed in <ref type="bibr">(Batten et al. 2021</ref>) that the relaxation gap of LayerSDP is still considerable in large networks. This leads to an increased false negative rate as the model size grows and thus limits applicability of the approach.</p><p>In this paper, we advance the SDP approach for NN verification with a new relaxation and an efficient algorithm to solve it. The relaxation we propose combines the LayerSDP method <ref type="bibr">(Batten et al. 2021</ref>) with a set of valid nonlinear constraints, which gives a provably tighter relaxation than the SoA SDP methods. The inclusion of nonlinear constraints results in a non-convex SDP problem that is generally harder to solve than LayerSDP. We further develop an iterative algorithm to obtain the optimal solution of the non-convex SDP problem. The algorithm initialises from the LayerSDP solution and solves iteratively an auxiliary convex SDP problem. The auxiliary SDP problem is derived from the original non-convex SDP problem and its size is around the same as LayerSDP. Our theoretical analysis shows that the solution sequence produced by the iterative algorithm is tighter than the SoA SDP methods, and can converge to the optimal solution of the non-convex SDP problem. The experiments on various standard benchmarks confirm that the proposed SDP method is considerably tighter than the present SoA, whilst having competitive efficiency. 
Our previous work <ref type="bibr">(Lan, Zheng, and Lomuscio 2022)</ref> constructed effective linear cuts using the reformulation-linearisation technique (RLT) and added them to LayerSDP <ref type="bibr">(Batten et al. 2021)</ref> to produce a tighter solution. These RLT-based linear cuts can be directly added to the SDP problems solved in this paper, meaning that the method proposed in this paper always gives a provably tighter solution than the method in <ref type="bibr">(Lan, Zheng, and Lomuscio 2022)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preliminaries</head><p>We use the symbol R n to denote the n-dimensional Euclidean space, &#8741; &#8226; &#8741; &#8734; to denote the standard &#8467; &#8734; norm, and I b to denote the sequence of non-negative integers from 0 to b. We use diag(X) to stack the main diagonal of matrix X as a column, and &#8857; to refer to the element-wise product. We use 0 n&#215;m to denote an n &#215; m zero matrix and 1 n to denote an n &#215; 1 vector of ones. We use P i [z] to represent the elements of matrix P i corresponding to the vector or matrix z.</p><p>We focus on feed-forward ReLU NNs. A network f : R n0 &#8594; R n L+1 with L hidden layers, n 0 inputs and n L+1 outputs is defined as follows:</p><p>xi+1 = W i x i + b i , x i+1 = max(0, xi+1 ), i = 0, 1, . . . , L -1, and f (x 0 ) = W L x L + b L , where x 0 is the input, f (x 0 ) is the output, xi+1 &#8712; R ni+1 is the pre-activation vector, x i+1 &#8712; R ni+1 is the post-activation vector, W i &#8712; R ni+1&#215;ni is the weight and b i &#8712; R ni+1 is the bias. We focus on classification NNs whereby an input x 0 is assigned to the class j &#8902; whose corresponding output has the highest value: j &#8902; = arg max j=1,2,...,n L+1 f (x 0 ) j .</p><p>This paper contributes to solving the local robustness verification problem. Given an NN f and a clean input x under the &#8467; &#8734; -norm perturbation &#1013;, the local robustness problem concerns determining whether f is robust on x, i.e., whether f (x 0 ) j &#8902; -f (x 0 ) j &gt; 0 holds for all j = 1, 2, . . . , n L+1 with j &#8800; j &#8902; , and for all x 0 satisfying &#8741;x 0 -x&#8741; &#8734; &#8804; &#1013;. This verification problem can be formulated and solved as an optimisation problem <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018;</ref><ref type="bibr">Batten et al. 
2021)</ref>:</p><p>where</p><p>and l i+1 and u i+1 are the lower and upper activation vector bounds that can be computed by using bound propagation methods <ref type="bibr">(Henriksen and Lomuscio 2020;</ref><ref type="bibr">Wang et al. 2018b)</ref>. Without loss of generality, we assume that the considered verification problem is feasible. We solve the problem (1) for every potential adversarial target j = 1, 2, . . . , n L+1 , where j &#8800; j &#8902; , to obtain the optimal objective value &#947; * . If &#947; * &gt; 0 in all the cases, the NN is verified to be robust on the input x against all admissible perturbed inputs x 0 . Since the ReLU constraints in (1a) are nonlinear, the optimisation problem (1) is non-convex and in general hard to solve. A promising direction is to convert problem (1) into a convex relaxation problem that can be efficiently solved. Solving the convex relaxation gives an optimal value &#947; * cvx that is a valid lower bound to &#947; * , i.e., &#947; * &#8805; &#947; * cvx . When &#947; * cvx &gt; 0, then &#947; * &gt; 0 always holds and the input is robust. Clearly, a tighter convex relaxation (i.e., with a smaller relaxation gap &#947; * -&#947; * cvx ) increases the number of verification queries that can be solved. Tightness is thus the key design objective of a convex relaxation method, provided that its computational cost is acceptable.</p><p>The LP relaxation <ref type="bibr">(Ehlers 2017</ref>) is a widely used convex relaxation method in the literature. Its key idea is to use a triangle relaxation to approximate each ReLU constraint in (1a) with a convex hull:</p><p>where</p><p>The vectors &#251;i+1 and li+1 are the upper and lower bounds of the pre-activation vector xi+1 , respectively. Replacing the nonlinear constraint (1a) with the linear constraints in (2) leads to an LP relaxation to the non-convex optimisation problem (1). 
The LP is relatively easy to solve, but it usually exhibits a large relaxation gap for many NNs, a phenomenon known as the convex relaxation barrier <ref type="bibr">(Salman et al. 2019)</ref>. This paper aims to develop a tighter convex relaxation method based on semidefinite programming (SDP) <ref type="bibr">(Parrilo 2000;</ref><ref type="bibr">Lasserre 2009</ref>).</p></div>
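<p>To make the triangle relaxation in (2) concrete, the following minimal numpy sketch (all bounds, pre-activations and tolerances here are hypothetical illustrations, not values from the paper's benchmarks) computes the element-wise convex-hull envelopes for a layer of ReLU neurons and checks that the true ReLU output always lies inside them:</p>

```python
import numpy as np

def triangle_relaxation_bounds(xhat, l, u):
    """Element-wise triangle (LP) relaxation of y = ReLU(xhat), given
    pre-activation bounds l <= xhat <= u (Ehlers 2017).
    Any feasible y must satisfy:
        y >= 0, y >= xhat                 (lower envelope)
        y <= u * (xhat - l) / (u - l)     (upper chord, unstable case l < 0 < u)
    Stable neurons (l >= 0 or u <= 0) are represented exactly.
    """
    lower = np.maximum(0.0, xhat)
    upper = np.where(l >= 0, xhat,                       # strictly active: y = xhat
            np.where(u <= 0, 0.0,                        # dead: y = 0
                     u * (xhat - l) / (u - l)))          # unstable: chord
    return lower, upper

# hypothetical bounds for three neurons: strictly active, dead, unstable
l = np.array([0.5, -2.0, -1.0])
u = np.array([2.0, -0.5, 1.0])
xhat = np.array([1.0, -1.0, 0.0])

lo, up = triangle_relaxation_bounds(xhat, l, u)
y = np.maximum(0.0, xhat)  # the true ReLU output
assert np.all(y >= lo - 1e-12) and np.all(y <= up + 1e-12)
print(lo, up)  # relaxation gap only at the unstable neuron: up - y = 0.5 there
```

<p>The gap at the unstable neuron (0.5 at the midpoint of its input range) is exactly the per-neuron slack that the tighter SDP relaxations discussed next aim to reduce.</p>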
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convex SDP Relaxations and a New</head><p>Non-convex Enhancement</p><p>In this section we describe a few SDP relaxations to the nonconvex optimisation problem (1), including the existing SDP methods and our new non-convex SDP relaxation.</p><p>Convex SDP Relaxations. The starting point is to observe that the nonlinear ReLU constraints in (1a) can equivalently be replaced by the set of linear and quadratic constraints:</p><p>Given a verification instance, we can identify the strictly active neurons at each layer. If the j-th neuron, 1 &#8804; j &#8804; n i+1 , at layer i + 1 is strictly active, then for this neuron the set of constraints in (3) can equivalently be replaced by a single linear equality constraint x i+1 (j) = W i (j, :)x i + b i (j).</p><p>The input constraints in ( <ref type="formula">1b</ref>) and (1c) can also be reformulated as the quadratic constraints:</p><p>where l 0 = x -&#1013;1 n0 and u 0 = x + &#1013;1 n0 .</p><p>The techniques of polynomial lifting <ref type="bibr">(Parrilo 2000;</ref><ref type="bibr">Lasserre 2009</ref>) can be used to reformulate the quadratic constraints in (3) and (4) as linear constraints in terms of new variables. Specifically, to couple all the ReLU constraints of the network together, we define a single positive semidefinite (PSD) matrix P as:</p><p>. By using (5), the constraints (3) and ( <ref type="formula">4</ref>) can be reformulated as linear constraints of P . This idea was first utilised for NN verification in <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018)</ref>. Dropping the rank-one constraint on P results in a global SDP relaxation to the non-convex optimisation problem (1), with a potentially smaller relaxation gap than the LP relaxation, as empirically observed in <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018)</ref>. 
However, the global SDP relaxation is computationally very expensive for large NNs due to the high dimensionality of P , thus limiting the applicability of the approach.</p><p>Thanks to the inherent cascading structure of feedforward NNs, the activation vector of layer i + 1 depends only on its preceding layer i, for all i &#8805; 0. We can exploit this cascading structure to derive a layer-based SDP relaxation with a set of PSD matrices that are of smaller sizes than the matrix P . The layer-based PSD matrices are defined as:</p><p>with</p><p>. By using (6), we reformulate (3) and (4) as a set of linear constraints on the elements of P i :</p><p>where xi+1 = [1, x T i+1 ] T . The constraint (7e) ensures input-output consistency <ref type="bibr">(Batten et al. 2021</ref>) and ( <ref type="formula">7f</ref>) is equivalent to (6) as shown in <ref type="bibr">(Horn and Johnson 2012)</ref>.</p><p>By replacing ( <ref type="formula">3</ref>) and ( <ref type="formula">4</ref>) with ( <ref type="formula">7</ref>) and dropping the rank-one constraint rank(P i ) = 1, the non-convex problem ( <ref type="formula">1</ref>) is relaxed to a convex layer SDP <ref type="bibr">(Batten et al. 2021)</ref>: <ref type="formula">7b</ref>), ( <ref type="formula">7c</ref>), ( <ref type="formula">7d</ref>), (7e), (8a)</p><p>where (8c) is reformulated from the last inequality in (2) with</p><p>As shown in <ref type="bibr">(Batten et al. 2021)</ref>, the layer SDP relaxation (8) is considerably easier to solve than the global SDP relaxation <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018)</ref> and is also tighter, i.e., &#947; * GlobalSDP &#8804; &#947; * LayerSDP &#8804; &#947; * , by introducing the linear cut (8c). However, due to dropping the rank-one constraint, the relaxation gap of layer SDP is still considerable in large NNs, thereby limiting the scalability and applicability of the approach.</p><p>Non-convex Layer-based SDP Relaxation. 
A promising way to reduce the SDP relaxation gap is to introduce constraints (or cuts) to enforce the rank conditions in (7f). Nonconvex cuts in the form of</p><p>are introduced in (Ma and Sojoudi 2020) to tighten the global SDP relaxation <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018)</ref>. The resulting non-convex SDP is solved by successively generating a large number of linear cuts to approximate the nonconvex cuts. This method has limited scalability due to the underlying computationally demanding global SDP and the additional large number of linear cuts.</p><p>To alleviate the computational challenge, we take inspiration from recent advances in SDP relaxations for nonconvex QCQPs (quadratically constrained quadratic programs). In particular, a single nonlinear constraint in the form of c T (P -xx T )c = 0 is used for generic SDP relaxations in <ref type="bibr">(Luo, Bai, and Peng 2019)</ref>. This constraint must use the same vector c as the objective function, which limits its capability to reduce the relaxation gap <ref type="bibr">(Ma and Sojoudi 2020)</ref>. The method is thus undesirable for NN verification, because the vector c has only very few non-zero elements, which are associated with the last hidden layer. Regarding the layer SDP relaxation (8), the vector c is only associated with the last PSD matrix P L-1 . A direct adoption of the nonlinear constraint in <ref type="bibr">(Luo, Bai, and Peng 2019)</ref> to (8) will then only be able to enforce the rank condition of P L-1 .</p><p>We propose a new type of nonlinear constraints that allows using a set of vectors to enforce the rank conditions of all PSD matrices P i , i &#8712; I L-1 , by exploiting the NN activation pattern, which will be explained in detail in the next section. 
In particular, we introduce a nonlinear constraint to each P i in ( <ref type="formula">8</ref>) and obtain the novel layer SDP relaxation: <ref type="formula">8b</ref>), ( <ref type="formula">8c</ref>), (9a)</p><p>, are verification-instance-dependent constant vectors to be constructed (see Algorithm 1). Here we explicitly write the vector x i (referring to the elements in the first column of P i , i.e., P i [x i ]) to ease the analysis.</p><p>The introduction of the constraints in (9b) results in a tighter SDP relaxation, as shown in Theorem 1. Theorem 1. Given a feasible verification instance, we have</p><p>Intuitively, the relations in Theorem 1 hold because we add the extra valid constraints in (9b) into the layer SDP relaxation (8). Due to the nonlinear constraint (9b), the new layer SDP relaxation (9) is non-convex and harder to solve than the original layer SDP relaxation (8). Our main technical contribution is to circumvent the non-convexity issue by developing an iterative algorithm. This iterative algorithm, detailed in the next section, recursively solves a convex SDP problem of approximately the same size as (8). The algorithm is able to generate an objective value that initialises from &#947; * LayerSDP and converges to &#947; * ncvxSDP for v i constructed in Algorithm 1.</p></div>
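<p>As an illustration of the polynomial lifting behind (5)-(7), the following numpy sketch (using a hypothetical 2-input, 3-neuron layer; not the paper's code) builds the rank-one lifted matrix P = vv^T for v = [1, x_i^T, x_{i+1}^T]^T and verifies that the quadratic ReLU identity in (3) becomes linear in the entries of P, and that P is PSD and rank one; the rank-one condition is exactly what the convex relaxations drop and what the nonlinear constraints in (9b) aim to restore:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# a hypothetical single hidden layer: x1 = ReLU(W x0 + b)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
x0 = rng.normal(size=2)
x1 = np.maximum(0.0, W @ x0 + b)

# polynomial lifting: v = [1, x0, x1], P = v v^T is PSD and rank one
v = np.concatenate(([1.0], x0, x1))
P = np.outer(v, v)

assert np.all(np.linalg.eigvalsh(P) >= -1e-9)    # P is PSD
assert np.linalg.matrix_rank(P, tol=1e-9) == 1   # rank one (dropped in the relaxation)

# the quadratic identity x1 ⊙ (x1 - (W x0 + b)) = 0 becomes LINEAR in P:
#   P[x1 x1^T](j,j) = W(j,:) @ P[x0 x1^T](:,j) + b(j) * P[1, x1](j)
P_x0x1 = P[1:3, 3:6]   # block holding the x0 x1^T entries
P_x1x1 = P[3:6, 3:6]   # block holding the x1 x1^T entries
P_1x1 = P[0, 3:6]      # first row: the x1 entries themselves
for j in range(3):
    lhs = P_x1x1[j, j]
    rhs = W[j, :] @ P_x0x1[:, j] + b[j] * P_1x1[j]
    assert abs(lhs - rhs) < 1e-9
print("lifted ReLU constraints hold on the rank-one P")
```

<p>On any feasible (rank-one) lifting these linear constraints hold exactly; a relaxed PSD matrix of higher rank satisfies them while no longer corresponding to a real network execution, which is the source of the remaining relaxation gap.</p>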
<div xmlns="http://www.tei-c.org/ns/1.0"><head>An Iterative Algorithm for Solving the Non-convex Layer SDP Relaxation</head><p>This section describes an iterative algorithm to compute the optimal solution of the non-convex layer SDP relaxation (9) at moderate computational cost. We first formulate an auxiliary convex layer SDP, then use it to develop an iterative algorithm and apply it to the NN verification problem.</p><p>Auxiliary SDP Problem. Although the non-convex layer SDP relaxation ( <ref type="formula">9</ref>) is generally hard to solve, its optimal objective value &#947; * ncvxSDP is bounded below by &#947; * LayerSDP , as shown in Theorem 1. This lower bound can be efficiently computed from the convex layer SDP relaxation (8). Hence, we can use &#947; * LayerSDP as a starting point to search for the value of &#947; * ncvxSDP . This inspires us to generate an objective value sequence {&#947; (k) iter } by solving an auxiliary convex SDP problem recursively. Our aim is to generate the objective value sequence that is bounded by &#947; * LayerSDP &#8804; &#947; (k) iter &#8804; &#947; * ncvxSDP and can converge to &#947; * ncvxSDP . We want to ensure these bounds so that &#947; (k) iter is always tighter than &#947; * LayerSDP and remains as a valid lower bound to &#947; * . By having these properties, the sequence {&#947; (k) iter } can then be used for NN verification. The above analysis motivates us to consider the auxiliary SDP problem in the form of:</p><p>subject to (8a), (8b), (8c), (10a)</p><p>where</p><p>The weight &#945; is a user-specified positive constant. Its value is set as &#945; &gt; 1 to place a heavier penalty on the SDP relaxations of the first L -1 layers. This is useful to obtain a tighter NN output, as the NN output is influenced by the SDP relaxations of the first L -1 layers. &#948; i , i &#8712; I L-2 , are user-given scalars. v i , i &#8712; I L-1 , are constant vectors to be constructed in Algorithm 1 and &#948; L-1 is a constant scalar to be constructed in Algorithm 2. 
They are constructed such that the constraints in (10b) and (10c) are always satisfied for any feasible solution to the non-convex layer SDP problem (9). It thus enables us to solve the auxiliary SDP problem to obtain a valid lower bound of &#947; * ncvxSDP . We construct the vector set {v i } L-1 i=0 from Algorithm 1 by using the strictly active neurons of the NN (whose input and output are equal), which generally exist for each test input. By doing so, we know (10b) and v T L-1 x L-1 = c T P L-1 [x L ] are always satisfied. Hence, we can solve the auxiliary SDP (10) to obtain v T L-1 x L-1 and thus the objective value &#947; iter = v T L-1 x L-1 + c 0 = c T P L-1 [x L ] + c 0 . We further show that solving this auxiliary SDP can obtain &#947; iter = &#947; * ncvxSDP , based on Proposition 2. Proposition 2. By constructing the set {v i } L-1 i=0 from Algorithm 1, the optimal objective value of the auxiliary SDP problem (10) has the properties: 1) &#947; * auxSDP &#8805; 0 for any constructed &#948; L-1 that satisfies (10c). 2) &#947; * auxSDP = 0 if and only if the optimal solution satisfies v</p><p>ncvxSDP , then solving the problem (10) gives the objective value</p><p>To achieve this, we propose an algorithm to iteratively update the value of &#948; L-1 and generate the objective value sequence {&#947; (k) iter } that converges to &#947; * ncvxSDP . Iterative Algorithm and Application to NN Verification. The iterative algorithm is based on solving the problem (10) at each iteration with the scalar &#948; L-1 that is changed with the Algorithm 1: Constructing the set of vectors</p><p>for j = 1, 2, . . . , n i+1 do 5:</p><p>if li+1 (j) &#8805; 0 then 6:</p><p>end for 9:</p><p>if i &#8804; L -2 then 10:</p><p>11:</p><p>else 12:</p><p>13:</p><p>end if 14: end for 15: Output:</p><p>iterations. The proposed iterative algorithm is summarised in Algorithm 2. 
The initial value of &#948; L-1 is set as &#948; (1) L-1 = &#947; * LayerSDP -c 0 , where &#947; * LayerSDP is the optimal objective value of the layer SDP relaxation (8). For each given &#948; (k) L-1 , the auxiliary SDP problem ( <ref type="formula">10</ref>) is solved to obtain the objective value &#947; (k) iter . At each iteration, the obtained optimal objective &#947; * (k) auxSDP of problem ( <ref type="formula">10</ref>) is used to update the value of &#948; L-1 .</p><p>The iteration is terminated when &#947; * (k) auxSDP is smaller than a prescribed tolerance &#949; &#8805; 0. Algorithm 2 outputs the objective value &#947; (k0) iter for robustness verification of NNs. To apply Algorithm 2 to the problem of robustness verification, we first need to ensure that:</p><p>1) &#947; (k) iter is always a valid lower bound to &#947; * . 2) The sequence {&#947; (k) iter } converges to &#947; * ncvxSDP . The first item is concerned with soundness of the algorithm, i.e., whether &#947; (k) iter &gt; 0 implies that &#947; * &gt; 0 and the input is robust. The second item concerns the tightness of the algorithm, i.e., whether the sequence {&#947; </p><p>auxSDP can be used to check when the sequence {&#948; (k) L-1 } converges. This confirms the appropriateness of using &#947; * (k) auxSDP &#8804; &#949; as the stopping criterion in Algorithm 2. We now proceed to analyse the properties of Algorithm 2: Proposed iterative algorithm 1: Input: NN parameters, {&#948; i } L-2 i=0 , &#945; and &#949;. 2: Initialise: Construct {v i } L-1 i=0 using Algorithm 1. Solve &#947; * LayerSDP from (8). Set &#948;</p><p>(1)</p><p>auxSDP and x</p><p>auxSDP &#8804; &#949; 7: Output: &#947; L-1 x L-1 under the parameter &#948; L-1 . Considering Algorithm 2, we know that:</p><p>Given the above, we can now state the soundness and tightness of Algorithm 2 in Theorem 5. Theorem 5. Given a feasible verification instance, the objective value sequence {&#947; Proof. 
By using Lemma 3 and the first property of Proposition 4, we know that the sequence {&#947; (k) iter } is monotonically increasing. By further using the second property of Proposition 4, we know that the sequence satisfies &#947; (k) iter &#8804; &#947; * ncvxSDP . By construction and using (10c), we also know that {&#947; (k) iter } is bounded below by &#947; * LayerSDP .</p><p>The convergence proof follows directly from that of Lemma 3 by replacing &#948;</p><p>Theorem 5 shows that Algorithm 2 achieves an objective value sequence {&#947; (k) iter } that is always a valid lower bound to &#947; * ncvxSDP and &#947; * . Also, the objective values at all iterations are not worse than &#947; * LayerSDP and converge to &#947; * ncvxSDP in finitely many iterations. The finite-time convergence has been empirically observed, but how to derive an analytic upper bound on the maximal number of iterations is still an open question. The above analysis shows that the proposed iterative algorithm efficiently solves the non-convex layer SDP problem (9), which would otherwise be hard to solve directly. We now show the application of Algorithm 2 to NN verification (see Figure <ref type="figure">1</ref>). In the figure, the output "Verified robust" means that the instance studied is robust against the adversarial input, while the output "Undefined" represents the situation in which robustness cannot be determined. The iteration is executed only when &#947; * LayerSDP &#8804; 0, i.e., the layer SDP is unable to resolve the robustness query. As we show later, this approach reduces the computational cost, as the layer SDP approach can already verify a considerable number of instances. Also, it is clear that the use of our iterative algorithm can increase the number of instances that can be verified. In practice, the iteration is also terminated whenever &#947; (k) iter &gt; 0, even if the stopping criterion &#947; * (k) auxSDP &#8804; &#949; has not been reached yet. Computational Complexity. The computational cost of the method stems from solving the auxiliary SDP (10). 
This auxiliary SDP and the existing layer SDP <ref type="bibr">(Batten et al. 2021</ref>) and global SDP <ref type="bibr">(Raghunathan, Steinhardt, and Liang 2018)</ref> are all solved using the solver MOSEK <ref type="bibr">(Andersen and Andersen 2000)</ref>, which implements an interior-point algorithm. Before calling MOSEK, the SDP problem is reformulated as the widely-used conic optimisation form where inequality constraints are converted into equalities <ref type="bibr">(Sturm 1999)</ref>. As shown in <ref type="bibr">(Nesterov 2003)</ref>, solving a standard SDP at each iteration of the interior-point algorithm requires O(n 3 m + n 2 m 2 + m 3 ) time and O(n 2 + m 2 ) memory, where n is the size of the largest semidefinite constraint and m is the number of equality constraints. For the global SDP, n = 1 + &#8721; L-1 i=0 n i , which is the sum of all layer dimensions. For the layer SDP and the auxiliary SDP (10), n = 1 + max i=0,...,L-1 (n i + n i+1 ), which is the largest size of P i and can be significantly smaller than that in the global SDP. As demonstrated in <ref type="bibr">(Batten et al. 2021, Proposition 2)</ref>, it is always beneficial to remove the dead neurons in any verification instance before formulating the SDP. The layer SDP and the auxiliary SDP can, however, increase the number of equality constraints, which may partially offset the benefits of the smaller semidefinite constraints; see <ref type="bibr">(Batten et al. 2021</ref>, Remark 1).</p></div>
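<p>The complexity comparison above can be made concrete with a small back-of-the-envelope script (the layer widths and the constraint count m below are illustrative assumptions, not measurements from the paper's experiments):</p>

```python
# Per-iteration interior-point cost (Nesterov 2003): time O(n^3 m + n^2 m^2 + m^3),
# memory O(n^2 + m^2), where n is the largest semidefinite block and
# m the number of equality constraints.
def ipm_time(n, m):
    return n**3 * m + n**2 * m**2 + m**3

# hypothetical widths n_0..n_{L-1} of an MNIST-sized MLP (784 inputs, 8 hidden layers)
widths = [784] + [100] * 8

# global SDP: one PSD block coupling every layer
n_global = 1 + sum(widths)
# layer SDP / auxiliary SDP: the largest block spans two consecutive layers only
n_layer = 1 + max(a + b for a, b in zip(widths, widths[1:]))
print(n_global, n_layer)  # 1585 vs 885

# illustrative constraint count, held equal for both formulations for comparison
m = 10_000
speedup = ipm_time(n_global, m) / ipm_time(n_layer, m)
print(f"per-iteration time ratio: {speedup:.1f}x")
```

<p>Even in this small example the layer-based block is sizeably smaller; the advantage grows with depth, since the global block accumulates every layer width while the layer-based block is capped by the widest pair of consecutive layers. As noted above, the layer-based formulations may trade this against a larger m.</p>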
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experimental Evaluation</head><p>Settings. We conducted experiments on a Linux machine with an AMD Ryzen Threadripper 3970X 32-Core CPU, 256 GB RAM and an RTX 3090 GPU. The optimisation problems were modelled using the toolbox YALMIP <ref type="bibr">(Lofberg 2004</ref>) and solved using the SDP solver MOSEK. We used &#948; i = 1, i &#8712; I L-2 , &#945; = 1.0e5 and &#949; = 0 to run Algorithm 2.</p><p>Networks. We evaluated on several feed-forward ReLU NNs trained on the MNIST dataset (LeCun 1998) and the CIFAR-10 dataset.</p><p>Results. All experiments were run on the first 100 images of the datasets. We obtained permission to run VeriNet (Henriksen and Lomuscio 2020) to compute the activation bounds of all benchmarks. We report the verified robustness (percentage of images verified to be robust) and runtime (average solver time for verifying an image) for each method.</p><p>Figure <ref type="figure">2</ref> shows the results of MLP-3 &#215; 50 under various &#1013; and methods IterSDP, LP, SDP-IP and LayerSDP. Our method IterSDP outperforms the baselines across all &#1013;, confirming the statement in Theorem 5. Notably, IterSDP improves the verified robustness up to the PGD bounds for several &#1013;. IterSDP requires more runtime (about twice as much) compared to LayerSDP, but it is still faster than SDP-IP. This is expected since Algorithm 2 uses LayerSDP to initialise and solves the auxiliary SDP (10) whose size is similar to that of LayerSDP.</p><p>Table <ref type="table">1</ref> shows the comparative results for the standard benchmarks. The breakdown of the time cost of our method IterSDP is also reported. IterSDP is more precise than the baselines for all the networks, except MLP-LP, MLP-SDP and MLP-9 &#215; 100. For MLP-LP and MLP-SDP, both IterSDP and LayerSDP reach the PGD upper bounds, while for MLP-9&#215;100, IterSDP is only less precise than &#946;-CROWN. 
Remarkably, for most of the networks, IterSDP achieves a verified accuracy that is close to or the same as the PGD upper bound. It is also shown in <ref type="bibr">(Dathathri et al. 2020;</ref><ref type="bibr">Li et al. 2020</ref>) that the complete method MILP verifies 69% robust cases for MLP-SDP, 67% for MLP-8&#215;1024-0.1 and 7% for MLP-8 &#215; 1024-0.3, lower than the 84%, 86% and 33% achieved by our method, respectively.</p><p>As expected, IterSDP needs more runtime than LayerSDP across all the networks, but is faster than SDP-IP. Neither SDP-IP nor SDP-FO could verify the large models MLP-8&#215;1024-0.1 and MLP-8&#215;1024-0.3. The results reported in <ref type="bibr">(Batten et al. 2021)</ref> showed that compared to LayerSDP, SDP-FO has a runtime that is much larger for MLP-Adv and MLP-LP, but smaller for MLP-SDP. Also, SDP-FO fails to verify MLP-6&#215;100, MLP-9&#215;100 and MLP-6&#215;200. These results confirm that our proposed IterSDP improves the verification precision, whilst retaining a competitive computational efficiency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>Relaxation methods that provide tight approximations are required to verify the large NNs used in applications. SDP-based methods have been proposed in the literature to accomplish this goal; yet, the relaxation gap in the present SoA is still considerable when applied to deep and large networks, resulting in SoA tools not being able to resolve many verification queries. Here we proposed a novel SDP relaxation to narrow the relaxation gap and enable efficient NN verification. The method improves the tightness by integrating a nonlinear constraint into the present SoA layer SDP method. Our approach is computationally effective because, unlike previous approaches, the resulting non-convex SDP problem is solved by computing the solutions of auxiliary convex layer SDP problems in an iterative manner. As shown, the method always yields tighter relaxations than the layer SDP and the solution sequence converges to the optimal solution of the non-convex SDP relaxation. The experiments showed that the method results in SoA performance on all benchmarks commonly used in the area.</p></div></body>
		</text>
</TEI>
