<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Neural Network Verification with Proof Production</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10397532</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 22nd International Conference on Formal Methods in Computer-Aided Design (FMCAD)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Omri Isac</author><author>Clark Barrett</author><author>Min Zhang</author><author>Guy Katz</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Deep neural networks (DNNs) are increasingly being employed in safety-critical systems, and there is an urgent need to guarantee their correctness. Consequently, the verification community has devised multiple techniques and tools for verifying DNNs. When DNN verifiers discover an input that triggers an error, that is easy to confirm; but when they report that no error exists, there is no way to ensure that the verification tool itself is not flawed. As multiple errors have already been observed in DNN verification tools, this calls the applicability of DNN verification into question. In this work, we present a novel mechanism for enhancing Simplex-based DNN verifiers with proof production capabilities: the generation of an easy-to-check witness of unsatisfiability, which attests to the absence of errors. Our proof production is based on an efficient adaptation of the well-known Farkas' lemma, combined with mechanisms for handling piecewise-linear functions and numerical precision errors. As a proof of concept, we implemented our technique on top of the Marabou DNN verifier. Our evaluation on a safety-critical system for airborne collision avoidance shows that proof production succeeds in almost all cases and requires only minimal overhead.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Machine learning techniques, and specifically deep neural networks (DNNs), have been achieving groundbreaking results in solving computationally difficult problems. Nowadays, DNNs are state-of-the-art tools for performing many safety-critical tasks in the domains of healthcare <ref type="bibr">[29]</ref>, aviation <ref type="bibr">[45]</ref> and autonomous driving <ref type="bibr">[19]</ref>. DNN training is performed by adjusting the parameters of a DNN to mimic a highly complex function over a large set of input-output examples (the training set) in an automated way that is mostly opaque to humans.</p><p>The Achilles heel of DNNs typically lies in generalizing their predictions from the finite training set to an infinite input domain. First, DNNs tend to produce unexpected results on inputs that are considerably different from those in the training set; and second, the input to the DNN might be perturbed by sensorial imperfections, or even by a malicious adversary, again resulting in unexpected and erroneous results. These weaknesses have already been observed in many modern DNNs <ref type="bibr">[37]</ref>, <ref type="bibr">[64]</ref>, and have even been demonstrated in the real world <ref type="bibr">[30]</ref>, thus hindering the adoption of DNNs in safety-critical settings.</p><p>In order to bridge this gap, in recent years, the formal methods community has started devising techniques for DNN verification (e.g., <ref type="bibr">[2]</ref>, <ref type="bibr">[11]</ref>, <ref type="bibr">[13]</ref>, <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[40]</ref>, <ref type="bibr">[41]</ref>, <ref type="bibr">[53]</ref>, <ref type="bibr">[58]</ref>, <ref type="bibr">[61]</ref>, <ref type="bibr">[62]</ref>, <ref type="bibr">[66]</ref>, <ref type="bibr">[68]</ref>, <ref type="bibr">[73]</ref>, among many others). 
Typically, DNN verification tools seek to prove that outputs from a given set of inputs are contained within a safe subspace of the output space, using various methods such as SMT solving <ref type="bibr">[1]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[23]</ref>, abstract interpretation <ref type="bibr">[32]</ref>, MILP solving <ref type="bibr">[65]</ref>, and combinations thereof. Notably, many modern approaches <ref type="bibr">[50]</ref>, <ref type="bibr">[53]</ref>, <ref type="bibr">[55]</ref>, <ref type="bibr">[65]</ref> involve a search procedure, in which the verification problem is regarded as a set of constraints. Then, various input assignments to the DNN are considered in order to discover a counter-example that satisfies these constraints, or to prove that no such counter-example exists.</p><p>Verification tools are known to be as prone to errors as any other program <ref type="bibr">[44]</ref>, <ref type="bibr">[72]</ref>. Moreover, the search procedures applied as part of DNN verification typically involve the repeated manipulation of a large number of floating-point equations; this can lead to rounding errors and numerical stability issues, which in turn could potentially compromise the verifier's soundness <ref type="bibr">[12]</ref>, <ref type="bibr">[44]</ref>. When the verifier discovers a counter-example, this issue is perhaps less crucial, as the counter-example can be checked by evaluating the DNN; but when the verifier determines that no counter-example exists, this conclusion is typically not accompanied by a witness of its correctness.</p><p>In this work, we present a novel proof-production mechanism for a broad family of search-based DNN verification algorithms. Whenever the search procedure returns UNSAT (indicating that no counter-example exists), our mechanism produces a proof certificate that can be readily checked using simple, external checkers. 
The proof certificate is produced using a constructive version of Farkas' lemma, which guarantees the existence of a witness to the unsatisfiability of a set of linear equations, combined with additional constructs to support the non-linear components of a DNN, i.e., its piecewise-linear activation functions. We show how to instrument the verification algorithm in order to keep track of its search steps, and use that information to construct the proof with only a small overhead.</p><p>For evaluation purposes, we implemented our proof-production technique on top of the Marabou DNN verifier <ref type="bibr">[50]</ref>. We then evaluated our technique on the ACAS Xu set of benchmarks for airborne collision avoidance <ref type="bibr">[46]</ref>, <ref type="bibr">[48]</ref>. Our approach was able to produce proof certificates for the safety of various ACAS Xu properties with reasonable overhead (5.7% on average). Checking the proof certificates produced by our approach was usually considerably faster than dispatching the original verification query.</p><p>The main contribution of our paper is in proposing a proof-production mechanism for search-based DNN verifiers, which can substantially increase their reliability when determining unsatisfiability. However, it also lays a foundation for a conflict-driven clause learning (CDCL) <ref type="bibr">[74]</ref> verification scheme for DNNs, which might significantly improve the performance of search-based procedures (see discussion in Sec. IX). The rest of this paper is organized as follows. In Sec. II we provide relevant background on DNNs, formal verification, the Simplex algorithm, and on using Simplex for search-based DNN verification. In Sec. III, IV and V, we describe the proof-production mechanism for Simplex and its extension to DNN verification. Next, in Sec. VI, we briefly discuss complexity-theoretical aspects of the proof production. Sec. 
VII details our implementation of the technique and its evaluation. We then discuss related work in Sec. VIII and conclude with Sec. IX.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. BACKGROUND</head><p>Deep Neural Networks. Deep neural networks (DNNs) <ref type="bibr">[36]</ref> are directed graphs, whose nodes (neurons) are organized into layers. Nodes in the first layer, called the input layer, are assigned values based on the input to the DNN; and then the values of nodes in each of the subsequent layers are computed as functions of the values assigned to neurons in the preceding layer. More specifically, each node value is computed by first applying an affine transformation to the values from the preceding layer and then applying a non-linear activation function to the result. The final (output) layer, which corresponds to the output of the network, is computed without applying an activation function.</p><p>One of the most common activation functions is the rectified linear unit (ReLU), which is defined as ReLU(b) = max(0, b). When b &gt; 0, we say that the ReLU is in the active phase; otherwise, we say it is in the inactive phase. For simplicity, we restrict our attention here to ReLUs, although our approach could be applied to other piecewise-linear functions (such as max pooling, absolute value, sign, etc.). Non piecewise-linear functions, such as sigmoid or tanh, are left for future work. Formally, a DNN N : R^m &#8594; R^k is a sequence of n layers L0, ..., Ln-1, where each layer Li consists of si nodes, denoted v^i_1, ..., v^i_{si}. The assignment for the j-th node in layer i (for 1 &#8804; i &lt; n-1) is computed as v^i_j = ReLU(&#8721;_l w_{i,j,l} &#8226; v^{i-1}_l + p_j), and neurons in the output layer are computed as v^{n-1}_j = &#8721;_l w_{n-1,j,l} &#8226; v^{n-2}_l + p_j, where w_{i,j,l} and p_j are (respectively) the predetermined weights and biases of N. We set s0 = m and treat v^0_1, ..., v^0_m as the input of N.</p><p>A simple DNN with four layers appears in Fig. <ref type="figure">1</ref>. For simplicity, the p_j parameters are all set to zero and are ignored. DNN Verification and Proofs. 
Given a DNN N : R^m &#8594; R^k and a property P : R^{m+k} &#8594; {T, F}, the DNN verification problem is to decide whether there exist x &#8712; R^m and y &#8712; R^k such that (N(x) = y) &#8743; P(x, y) holds. If such x and y exist, we say that the verification query &#10216;N, P&#10217; is satisfiable (SAT); and otherwise, we say that it is unsatisfiable (UNSAT). For example, given the toy DNN from Fig. <ref type="figure">1</ref>, we can define a property P(x, y) &#8801; (x &#8712; [2, 3] &#215; [-1, 1]) &#8743; (y &#8712; [0.25, 0.5]). Typically, P represents the negation of a desired property, and so an input x which satisfies the query is a counterexample, whereas the query's unsatisfiability indicates that the property holds. In this work, we follow mainstream DNN verification research <ref type="bibr">[53]</ref>, <ref type="bibr">[68]</ref> and focus on properties P that are a conjunction of linear lower- and upper-bound constraints on the neurons of x and y. It has been shown that even for such simple properties, and for DNNs that use only the ReLU activation function, the verification problem is NP-complete <ref type="bibr">[48]</ref>.</p><p>A proof is a mathematical object that certifies a mathematical statement. In case a DNN verification query is SAT, the input x for which P holds constitutes a proof of the query's satisfiability. Our goal here is to generate proofs also for the UNSAT case, which, to the best of our knowledge, is a feature that no DNN verifier currently supports <ref type="bibr">[12]</ref>.</p><p>Verifying DNNs via Linear Programming. Linear Programming (LP) <ref type="bibr">[22]</ref> is the problem of optimizing a linear function over a given convex polytope. An LP instance over variables V = [x1, . . . , xn] &#8712; R^n contains an objective function c &#8226; V to be maximized, subject to the constraints A &#8226; V = b for some A &#8712; M_{m&#215;n}(R), b &#8712; R^m, and l &#8804; V &#8804; u for some l, u &#8712; (R &#8746; {&#177;&#8734;})^n. 
Throughout the paper, we use l(xi) and u(xi) to refer to the lower and upper bounds (respectively) of xi. LP solving can also be used to check the satisfiability of a set of constraints of the form A &#8226; V = b &#8743; l &#8804; V &#8804; u.</p><p>The Simplex algorithm <ref type="bibr">[22]</ref> is a widely used technique for solving LP instances. It begins by creating a tableau, which is equivalent to the original set of equations AV = b. Next, Simplex selects a certain subset of the variables, B &#8838; {x1, . . . , xn}, to act as the basic variables; and the tableau is considered as representing each basic variable xi &#8712; B as a linear combination of non-basic variables, xi = &#8721;_{xj &#8713; B} cj &#8226; xj.</p><p>We use A_{i,j} to denote the coefficient of a variable xj in the tableau row that corresponds to basic variable xi. Apart from the tableau, Simplex also maintains a variable assignment that satisfies the equations of A, but which may temporarily violate the bound constraints l &#8804; V &#8804; u. The assignment for a variable xi is denoted &#945;(xi). After initialization, Simplex begins searching for an assignment that simultaneously satisfies both the tableau and bound constraints. This is done by manipulating the set B, each time swapping a basic and a non-basic variable. This alters the equations of A by adding multiples of equations to other equations, and allows the algorithm to explore new assignments. The algorithm can terminate with a SAT answer when a satisfying assignment is discovered or an UNSAT answer when: (i) a variable has contradicting bounds, i.e., l(xi) &gt; u(xi); or (ii) one of the tableau equations xi = &#8721;_{xj &#8713; B} cj &#8226; xj implies that xi can never satisfy its bounds. The Simplex algorithm is sound, and is also complete if certain heuristics are used for selecting the manipulations of B <ref type="bibr">[22]</ref>.</p><p>A detailed calculus for the version of Simplex that we use appears in the extended version of this paper <ref type="bibr">[42]</ref>. 
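The two UNSAT termination conditions described above can be sketched as a simple interval check over a tableau row. The fragment below is illustrative only (the dictionary-based row layout is ours, not the data structure of any actual solver):

```python
def row_max(coeffs, lower, upper):
    """Largest value that sum(c_j * x_j) can take within the variable bounds."""
    return sum(c * (upper[j] if c > 0 else lower[j]) for j, c in coeffs.items())

def row_min(coeffs, lower, upper):
    """Smallest value that sum(c_j * x_j) can take within the variable bounds."""
    return sum(c * (lower[j] if c > 0 else upper[j]) for j, c in coeffs.items())

def row_implies_unsat(xi, coeffs, lower, upper):
    """Condition (ii): the row xi = sum(c_j * x_j) can never satisfy xi's bounds."""
    return lower[xi] > row_max(coeffs, lower, upper) or \
        row_min(coeffs, lower, upper) > upper[xi]

# Example row: b2 = -2*x1 + 2*x2, with x1 in [2, 3], x2 in [-1, 1], and a
# lower bound of 0 on b2.
lower = {"x1": 2.0, "x2": -1.0, "b2": 0.0}
upper = {"x1": 3.0, "x2": 1.0, "b2": float("inf")}
row = {"x1": -2.0, "x2": 2.0}
assert row_max(row, lower, upper) == -2.0   # b2 can be at most -2, below l(b2) = 0
assert row_implies_unsat("b2", row, lower, upper)
```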
LP solving is particularly useful in the context of DNN verification, and is used by almost all modern tools (either natively <ref type="bibr">[48]</ref>, or by invoking external solvers such as GLPK <ref type="bibr">[54]</ref> or Gurobi <ref type="bibr">[39]</ref>). More specifically, a DNN verification query can be regarded as an LP instance with bounded variables that represents the property P and the affine transformations within N, combined with a set of piecewise-linear constraints that represent the activation functions. We demonstrate this with an example, and then explain how this formulation can be solved.</p><p>Recall the toy DNN from Fig. <ref type="figure">1</ref>, and property P that is used for checking whether there exists an input x in the range [2, 3] &#215; [-1, 1] for which N produces an output y in the range [0.25, 0.5]. We use b1, f1 to denote the input and output to node v1; b2, f2 for the input and output of v2; x1 and x2 to denote the network's inputs, and y to denote the network's output. The linear constraints of the network yield the linear equations b1 = x1 - x2, b2 = -2f1, and y = f2 (which we name e1, e2, and e3, respectively). The restrictions on the network's input and output are translated to lower and upper bounds: 2 &#8804; x1 &#8804; 3, -1 &#8804; x2 &#8804; 1, 0.25 &#8804; y &#8804; 0.5. The third equation implies that 0.25 &#8804; f2 &#8804; 0.5, which in turn implies that b2 &#8804; 0.5. Assume we also restrict: -0.5 &#8804; b2, -0.5 &#8804; b1 &#8804; 0.5, 0 &#8804; f1 &#8804; 0.5. Together, these constraints give rise to the linear program that appears in Fig. <ref type="figure">2</ref>. The remaining ReLU constraints, i.e. fi = ReLU(bi) for i &#8712; {1, 2}, exist alongside the LP instance. 
Together, query &#966; is equivalent to the DNN verification problem that we are trying to solve.</p><p>Using this formulation, the verification problem can be solved using Simplex, enhanced with a case-splitting approach for handling the ReLU constraints <ref type="bibr">[17]</ref>, <ref type="bibr">[48]</ref>. Intuitively, we first invoke the LP solver on the LP portion of the query; and if it returns UNSAT, the whole query is UNSAT. Otherwise, if it finds a satisfying assignment, we check whether this assignment also satisfies the ReLU constraints. If it does, then the whole query is SAT. Otherwise, case splitting is applied in order to split the query into two different sub-queries, according to the two phases of the ReLU function. Specifically, in one of the sub-queries, the LP query is adjusted to enforce the ReLU to be in the active phase: the equation f = b is added, along with the bound b &#8805; 0. In the other sub-query, the inactive phase is enforced: b &#8804; 0, 0 &#8804; f &#8804; 0. This effectively reduces the ReLU constraint into linear constraints in each sub-query. This process is then repeated for each of the two sub-queries. Case-splitting turns the verification procedure into a search tree <ref type="bibr">[48]</ref>, with nodes corresponding to the splits that were applied. The tree is constructed iteratively, with Simplex invoked on every node to try and derive UNSAT or find a true satisfying assignment. If Simplex is able to deduce that all leaves in the search tree are UNSAT, then so is the original query. Otherwise, it will eventually find a satisfying assignment that also satisfies the original query. This process is sound, and will always terminate if appropriate splitting strategies are used <ref type="bibr">[22]</ref>, <ref type="bibr">[48]</ref>. 
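The splitting loop just described can be sketched as a short recursion. In this sketch, `solve_lp`, the `(b, f)` pairs, and the `lp.with_split` interface are placeholders we assume for illustration; they are not the API of Marabou or of any particular LP solver:

```python
def verify(lp, relus, solve_lp):
    """Search-based verification: solve the LP relaxation, then case-split on a
    ReLU constraint that the LP assignment violates.

    Assumed interfaces: `solve_lp(lp)` returns a satisfying assignment (a dict)
    or None for UNSAT; `relus` is a list of (b, f) variable-name pairs;
    `lp.with_split(b, f, phase)` returns the LP extended with that phase.
    """
    assignment = solve_lp(lp)
    if assignment is None:
        return "UNSAT"                    # LP portion infeasible: sub-query is UNSAT
    violated = next(((b, f) for b, f in relus
                     if assignment[f] != max(0.0, assignment[b])), None)
    if violated is None:
        return ("SAT", assignment)        # all ReLUs satisfied: true counter-example
    b, f = violated
    # active phase adds f = b with b at least 0; inactive adds f = 0 with b at most 0
    for phase in ("active", "inactive"):
        result = verify(lp.with_split(b, f, phase), relus, solve_lp)
        if result != "UNSAT":
            return result
    return "UNSAT"                        # both phases are UNSAT
```

For instance, a stub solver that always returns None makes `verify` answer "UNSAT" immediately, while a stub returning an assignment that already satisfies every ReLU makes it answer SAT with that assignment as the witness.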
Unfortunately, the size of the search tree can be exponential in the number of ReLU constraints; and so in order to keep the search tree small, case splitting is applied as little as possible, according to various heuristics that change from tool to tool <ref type="bibr">[55]</ref>, <ref type="bibr">[62]</ref>, <ref type="bibr">[68]</ref>. In order to reduce the number of splits even further, verification algorithms apply clever deduction techniques for discovering tighter variable bounds, which may in turn rule out some of the splits a priori. We also discuss this kind of deduction, which we refer to as dynamic bound tightening, in the following sections.</p></div>
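As noted above, a SAT answer is easy to validate: one simply evaluates the DNN on the alleged counter-example. For the running example (equations b1 = x1 - x2, b2 = -2f1, y = f2, with fi = ReLU(bi)), such a check might look like the following sketch:

```python
def relu(b):
    return max(0.0, b)

def toy_network(x1, x2):
    """Forward pass of the running example: b1 = x1 - x2, b2 = -2*f1, y = f2."""
    f1 = relu(x1 - x2)
    f2 = relu(-2.0 * f1)
    return f2                                # y = f2

def is_counterexample(x1, x2):
    """P: input in [2, 3] x [-1, 1] and output in [0.25, 0.5]."""
    y = toy_network(x1, x2)
    return 3 >= x1 >= 2 and 1 >= x2 >= -1 and 0.5 >= y >= 0.25

# f1 is non-negative, so b2 = -2*f1 is non-positive and y is always 0:
assert toy_network(2.0, -1.0) == 0.0
assert not is_counterexample(2.0, -1.0)      # consistent with the query being UNSAT
```

No input in the stated range can be a counter-example here, which is exactly the UNSAT situation where a proof certificate, rather than a simple evaluation, is needed.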
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROOF PRODUCTION OVERVIEW</head><p>A Simplex-based verification process of a DNN is tree-shaped, and so we propose to generate a proof tree to match it. Within the proof tree, internal nodes will correspond to case splits, whereas each leaf node will contain a proof of unsatisfiability based on all splits performed on the path between itself and the root. Thus, a proof tree constitutes a valid proof of unsatisfiability if each of its leaves contains a proof that demonstrates that all splits so far lead to a contradiction. The proof tree might also include proofs for lemmas, which are valid statements for the node in which they reside and its descendants (lemmas are needed for supporting bound tightening, as we discuss later).</p><p>As a simple, intuitive example, we depict in Fig. <ref type="figure">3</ref> a proof of unsatisfiability for the query &#966; from Fig. <ref type="figure">2</ref>. The root of the proof tree represents the initial verification query, which is comprised of LP constraints and ReLU constraints. The fact that this node is not a leaf indicates that the Simplex-based verifier was unable to conclude UNSAT in this state, and needed to perform a case split on the ReLU node v1. The left child of the root corresponds to the case where ReLU v1 is inactive: the LP is augmented with additional constraints that represent the case split, i.e., f1 = 0 and b1 &#8804; 0. This new fact may now be used by the Simplex procedure, which is indeed able to obtain an UNSAT result. The node then contains a proof of this unsatisfiability: the vector (-1, 0, 0). This vector instructs the checker how to construct a linear combination of the current tableau's rows, in a way that leads to a bound contradiction, as we later explain in Sec. V.</p><p>Fig. 
<ref type="figure">3:</ref> A proof tree example.</p><p>In the right child of the root, which represents v1's active phase, the constraints f1 = b1 and b1 &#8805; 0 are added by the split. This node is not a leaf, because the verifier performed a second case split, this time on v2. The left child represents v2's inactive phase, and has the corresponding constraints f2 = 0 and b2 &#8804; 0. This child is a leaf, and is marked with f2, indicating that f2 is a variable whose bounds led to a contradiction. Specifically, f2 &#8805; 0.25 from &#966; and f2 = 0 from the case split are contradictory.</p><p>The last node (the rightmost leaf) represents v2's active phase, and has the constraints f2 = b2 and b2 &#8805; 0. Here, the node indicates that a contradiction can be reached from the current tableau, using the vector (-2, 1, 0, -2, 0). In Sec. IV, we explain how this process works.</p><p>Because each leaf of the proof tree contains a proof of unsatisfiability, the tree itself proves that the original query is UNSAT. Note that many other proof trees may exist for the same query. In the following sections, we explain how to instrument a Simplex-based verifier in order to extract such proof trees from the solver execution.</p></div>
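An external checker for such proof trees can be remarkably simple. The sketch below uses our own illustrative encoding of proof nodes as dictionaries (not the paper's concrete certificate format), and replays the tree of Fig. 3 against the rows and bounds of the query &#966; from Fig. 2:

```python
def combine(farkas, rows):
    """Linear combination sum(w_i * row_i); each row maps a variable to its
    coefficient and represents the equation sum(c_j * x_j) = 0."""
    out = {}
    for w, row in zip(farkas, rows):
        for v, c in row.items():
            out[v] = out.get(v, 0.0) + w * c
    return out

def leaf_holds(cert, rows, lower, upper):
    """A leaf certifies UNSAT either via a variable whose bounds conflict, or via
    a Farkas vector whose row combination stays negative for every in-bounds
    assignment, so the combined equation sum(...) = 0 is unreachable."""
    if isinstance(cert, str):
        return lower[cert] > upper[cert]
    combo = combine(cert, rows)
    best = sum(c * (upper[v] if c > 0 else lower[v]) for v, c in combo.items())
    return 0 > best

def check_tree(node, rows, lower, upper):
    """Recursively check that every leaf of the proof tree holds."""
    lo = {**lower, **node.get("lower", {})}   # split bounds override the parent's
    hi = {**upper, **node.get("upper", {})}
    rs = rows + node.get("rows", [])          # equations added by the split
    kids = node.get("children", [])
    if kids:
        return all(check_tree(k, rs, lo, hi) for k in kids)
    return leaf_holds(node["cert"], rs, lo, hi)

# Rows e1..e3 and the bounds of the query, plus the split rows e4, e5:
e1 = {"x1": 1.0, "x2": -1.0, "b1": -1.0}      # b1 = x1 - x2
e2 = {"f1": -2.0, "b2": -1.0}                 # b2 = -2*f1
e3 = {"f2": 1.0, "y": -1.0}                   # y = f2
e4 = {"b1": 1.0, "f1": -1.0}                  # f1 = b1 (active split on v1)
e5 = {"b2": 1.0, "f2": -1.0}                  # f2 = b2 (active split on v2)
lower = {"x1": 2, "x2": -1, "y": 0.25, "b1": -0.5, "f1": 0, "b2": -0.5, "f2": 0.25}
upper = {"x1": 3, "x2": 1, "y": 0.5, "b1": 0.5, "f1": 0.5, "b2": 0.5, "f2": 0.5}
tree = {"children": [
    {"upper": {"b1": 0, "f1": 0}, "cert": [-1, 0, 0]},                  # v1 inactive
    {"lower": {"b1": 0}, "rows": [e4], "children": [
        {"upper": {"b2": 0, "f2": 0}, "cert": "f2"},                    # v2 inactive
        {"lower": {"b2": 0}, "rows": [e5], "cert": [-2, 1, 0, -2, 0]},  # both active
    ]},
]}
assert check_tree(tree, [e1, e2, e3], lower, upper)
```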
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. SIMPLEX WITH PROOFS</head><p>A. Producing proofs for LP</p><p>We now describe our approach for creating proof trees, beginning with leaf nodes. We start with the following lemma: Lemma 1. If Simplex returns UNSAT, then there exists a variable with contradicting bounds; that is, there exists a variable xi &#8712; V with lower and upper bounds l(xi) and u(xi), for which Simplex has discovered that l(xi) &gt; u(xi). This lemma justifies our choice of using contradicting bounds as proofs of unsatisfiability in the leaves of the proof tree. The lemma follows directly from the derivation rules of Simplex. Specifically, there are only two ways to reach UNSAT: when the input problem already contains inconsistent bounds l(xi) &gt; u(xi), or when Simplex finds a tableau row xi = &#8721; cj &#8226; xj that gives rise to such inconsistent bounds.</p><p>The complete proof appears in the extended version of this paper <ref type="bibr">[42]</ref>.</p><p>We demonstrate this with an example, based on the query &#966; from Fig. <ref type="figure">2</ref>. Suppose that, as part of its Simplex-based solution process, a DNN verifier performs two case splits, fixing the two ReLUs to their active states: f1 = b1 &#8743; b1 &#8805; 0 and f2 = b2 &#8743; b2 &#8805; 0. This gives rise to the following (slightly simplified) system of equations: e1: b1 = x1 - x2, e2: b2 = -2f1, e3: y = f2, e4: f1 = b1, e5: f2 = b2. Then, the Simplex solver iteratively alters the set of basic variables, which corresponds to multiplying various equations by scalars and summing them to obtain new equations. At some point, the equation b2 = -2x1 + 2x2 is obtained (by computing -2 &#8226; e1 + e2 - 2 &#8226; e4). At this point, the Simplex solver halts with an UNSAT notice. The reason is that b2 is currently assigned the value -2, which is below its lower bound of 0, and so its value needs to be increased. 
However, the equation, combined with the fact that x1 is pressed against its lower bound, while x2 is pressed against its upper bound, indicates that there is no slack remaining in order to increase the value of b2 (this corresponds to the Failure1 rule in the Simplex calculus described in the extended version of this paper <ref type="bibr">[42]</ref>). The key point is that the same equation could be used in deducing a tighter bound for b2: b2 = -2x1 + 2x2 &#8804; -2 &#8226; l(x1) + 2 &#8226; u(x2) = -4 + 2 = -2, and a contradiction could then be obtained based on the contradictory facts 0 = l(b2) &#8804; b2 &#8804; -2. In other words, and as we formally prove in the extended version of this paper <ref type="bibr">[42]</ref>, any UNSAT answer returned by Simplex can be regarded as a case of conflicting lower and upper bounds.</p><p>Given Lemma 1, our goal is to instrument the Simplex procedure so that whenever it returns UNSAT, we are able to produce a proof which indicates that l(xi) &gt; u(xi) for some variable xi. To this end, we introduce the following adaptation of Farkas' Lemma <ref type="bibr">[67]</ref> to the Simplex setting, which states that a linear-sized proof of this fact exists. Lemma 2. Given the constraints A &#8226; V = 0 and l &#8804; V &#8804; u, where A &#8712; M_{m&#215;n}(R) and l, V, u &#8712; R^n, exactly one of these two options holds:</p><p>1) The SAT case: &#8707;V &#8712; R^n such that A &#8226; V = 0 and l &#8804; V &#8804; u.</p><p>2) The UNSAT case: &#8707;w &#8712; R^m such that w &#8226; A &#8226; V &lt; 0 for all l &#8804; V &#8804; u, whereas A &#8226; V = 0 implies w &#8226; A &#8226; V = 0. Thus, w is a proof of the constraints' unsatisfiability. Moreover, these vectors can be constructed during the run of the Simplex algorithm. This Lemma is actually a corollary of Theorem 3, which we introduce later. For a complete proof, see the extended version of this paper <ref type="bibr">[42]</ref>.</p><p>In our previous, UNSAT example, one possible vector is w = (-2, 1, 0, -2, 0). Indeed, w &#8226; A &#8226; V = 0 gives us the equation -2x1 + 2x2 - b2 = 0. 
Given the lower and upper bounds for the participating variables, the largest value that the left-hand side of the equation can obtain is -2 &#8226; l(x1) + 2 &#8226; u(x2) - l(b2) = -4 + 2 - 0 = -2. Therefore, no variable assignment within the stated bounds can satisfy the equation, indicating that the constraints are UNSAT.</p><p>Given Lemma 2, all that remains is to instrument the Simplex solver in order to produce the proof vector w on the fly, whenever a contradiction is detected. In case a trivial contradiction l(xi) &gt; u(xi) is given as part of the input query for some variable xi, we simply return "xi" as the proof (we later discuss also how to handle this case in the presence of dynamic bound tightenings). Otherwise, a nontrivial contradiction is detected as a result of an equation e &#8801; xi = &#8721; cj &#8226; xj, which contradicts one of the input bounds of xi. In this case, no assignment can satisfy the equivalent equation &#8721; cj &#8226; xj - xi = 0. Since the Simplex algorithm applies only linear operations to the input tableau, e is given by a linear combination of the original tableau rows.</p><p>Let coef(e) denote the Farkas vector of the equation e, i.e., the column vector such that coef(e) &#8226; A = e, and which proves unsatisfiability in this case. Our framework simply keeps track, for each row of the tableau, of its coefficient vector; and if that row leads to a contradiction, the vector is returned.</p></div>
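The row tracking just described amounts to carrying, next to every tableau row, the linear combination of original rows that produced it. A minimal sketch (the `TrackedRow` class is our own, for illustration), replaying the derivation of b2 = -2x1 + 2x2 from the example:

```python
class TrackedRow:
    """A tableau row that records how it was derived from the original rows.

    `expr` maps variable to coefficient, representing sum(c_j * x_j) = 0;
    `coef` is the Farkas vector over the m original rows, maintained so that
    coef(e) combined with A always reproduces this row.
    """
    def __init__(self, expr, coef):
        self.expr = dict(expr)
        self.coef = list(coef)

    def add_multiple(self, scalar, other):
        """row += scalar * other, mirrored on the Farkas vector."""
        for v, c in other.expr.items():
            self.expr[v] = self.expr.get(v, 0.0) + scalar * c
        self.coef = [a + scalar * b for a, b in zip(self.coef, other.coef)]

# The rows of the example (as sum(c_j * x_j) = 0), with unit Farkas vectors:
e1 = TrackedRow({"x1": 1, "x2": -1, "b1": -1}, [1, 0, 0, 0, 0])  # b1 = x1 - x2
e2 = TrackedRow({"f1": -2, "b2": -1}, [0, 1, 0, 0, 0])           # b2 = -2*f1
e4 = TrackedRow({"b1": 1, "f1": -1}, [0, 0, 0, 1, 0])            # f1 = b1
row = TrackedRow({}, [0, 0, 0, 0, 0])
row.add_multiple(-2, e1)
row.add_multiple(1, e2)
row.add_multiple(-2, e4)
assert row.coef == [-2, 1, 0, -2, 0]      # exactly the proof vector w from the text
assert row.expr["x1"] == -2.0 and row.expr["b2"] == -1.0
```

When the combined row turns out to contradict a variable's bounds, `row.coef` is precisely the certificate that is handed to the checker.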
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Supporting dynamic bound tightening</head><p>So far, we have only considered Simplex executions that do not perform any bound tightening steps; i.e., that derive UNSAT by finding a contradiction to the original bounds. However, in practice, modern DNN solvers perform a great deal of dynamic bound tightening, and so this needs to be reflected in the proof.</p><p>We use the term ground bounds to refer to variable bounds that are part of the LP being solved, whether they were introduced by the original input, or by successive case splits, as we will explain in Sec. V. This is opposed to dynamic bounds, which are bounds introduced on the fly, via bound tightening. The ground bounds, denoted l, u &#8712; R^n, are used in explaining dynamic bounds, denoted l&#8242;, u&#8242; &#8712; R^n, via Farkas vectors.</p><p>For simplicity, we consider here a simple and popular version of bound tightening, called interval propagation <ref type="bibr">[25]</ref>, <ref type="bibr">[48]</ref>. Given an equation xi = &#8721; cj &#8226; xj and current bounds l(xj) and u(xj) for each of the variables (whether these are the ground bounds or dynamically tightened bounds themselves), a new upper bound for xi can be derived: u&#8242;(xi) = &#8721;_{cj &gt; 0} cj &#8226; u(xj) + &#8721;_{cj &lt; 0} cj &#8226; l(xj) (Eq. 1), provided that the new bound is tighter, i.e., smaller, than the current upper bound for xi. A symmetrical version exists for discovering lower bounds.</p><p>A naive approach for handling bound tightening is to store, each time a new bound is discovered, a separate proof that justifies it; for example, a Farkas vector for deriving the equation that was used in the bound tightening. However, a Simplex execution can include many thousands of bound tightenings, and so doing this would strain resources. 
Even worse, many of the intermediate bound tightenings might not even participate in deriving the final contradiction, and so storing them would be a waste.</p><p>In order to circumvent this issue, we propose a scheme in which we store, for each variable in the query, a single column vector that justifies its current lower bound, and another for its current upper bound. Whenever a tighter bound is dynamically discovered, the corresponding vector is updated; and even if other, previously discovered dynamic bounds were used in the derivation, the vector that we store indicates how the same bound can be derived using the ground bounds. Thus, the proof of the tightened bounds remains compact, regardless of the number of derived bounds; specifically, it requires only O(n &#8226; m) space overall. Formally, we have the following result: Theorem 3. Let A &#8226; V = 0 with l &#8804; V &#8804; u be an LP instance, where A &#8712; M_{m&#215;n}(R) and l, V, u &#8712; R^n. Let u&#8242;, l&#8242; &#8712; R^n represent dynamically tightened bounds of V. Then for all i &#8712; [n] there exist fu(xi), fl(xi) &#8712; R^m such that fu(xi) &#8226; A and fl(xi) &#8226; A can be used to efficiently compute u&#8242;(xi) and l&#8242;(xi) from l and u. Moreover, the vectors fu(xi) and fl(xi) can be constructed during the run of the Simplex algorithm.</p><p>When a Simplex procedure with bound tightening reaches an UNSAT answer, it has discovered a variable xi with l&#8242;(xi) &gt; u&#8242;(xi). The theorem guarantees that in this case we have two column vectors, fu(xi) and fl(xi), which explain how u&#8242;(xi) and l&#8242;(xi) were discovered. We refer to these vectors as the Farkas vectors of the upper and lower bounds of xi, respectively. Because u&#8242;(xi) - l&#8242;(xi) is negative, the column vector w = fu(xi) - fl(xi) creates a tableau row which is always negative, making w &#8712; R^m a proof of unsatisfiability. 
The formal, constructive proof of the theorem appears in the extended version of this paper <ref type="bibr">[42]</ref>.</p><p>In order to maintain fu(xi) and fl(xi) during the execution of Simplex, whenever a tighter upper bound is derived using Eq. 1, we update the matching Farkas vector: fu(xi) := &#8721;_{cj &gt; 0} cj &#8226; fu(xj) + &#8721;_{cj &lt; 0} cj &#8226; fl(xj) + coef(e), where e is the linear equation used for tightening, and coef(e) is the column vector such that coef(e) &#8226; A = e. The lower bound case is symmetrical. To demonstrate the procedure, consider again the verification query from Fig. <ref type="figure">2</ref>. Assume the phases of v1, v2 have both been set to active, and that consequently two new equations have been added: e4: f1 = b1, e5: f2 = b2. In this example, we have five linear equations, so we initialize a zero vector of size five for each of the variable bounds. Now, suppose Simplex tightens the lower bound of b1 using the first equation e1: l&#8242;(b1) := l(x1) - u(x2) = 2 - 1 = 1, and thus we update fl(b1) := fl(x1) - fu(x2) + coef(e1) = (0, 0, 0, 0, 0) - (0, 0, 0, 0, 0) + (1, 0, 0, 0, 0) = (1, 0, 0, 0, 0), since all fl and fu vectors have been initialized to 0 and coef(e1) = (1, 0, 0, 0, 0), which indicates that e1 is simply the first row of the tableau.</p><p>We can now tighten bounds again, using the fourth row f1 = b1, and get l&#8242;(f1) := l&#8242;(b1) = 1. We update fl(f1): fl(f1) := fl(b1) + coef(e4) = (1, 0, 0, 0, 0) + (0, 0, 0, 1, 0) = (1, 0, 0, 1, 0).</p><p>To see that the Farkas vector can indeed explain the dynamically tightened bound, observe that the combination (1, 0, 0, 1, 0) of tableau rows gives the equation f1 = x1 - x2. We can then tighten the lower bound of f1 using the ground bounds: l&#8242;(f1) := l(x1) - u(x2) = 2 - 1 = 1. This bound matches the one that we had discovered dynamically, though we derived it using ground bounds only.</p></div>
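The update rule and the worked example above can be replayed in a few lines. The function below is an illustrative sketch in our own layout (lower-bound case only), reproducing the updates of fl(b1) and fl(f1):

```python
def vec_add(u, v, scale):
    return [a + scale * b for a, b in zip(u, v)]

def tighten_lower(var, row, coef_e, lower, upper, f_lower, f_upper):
    """Interval-propagate a lower bound for `var` from the row var = sum(c_j * x_j),
    and update its Farkas vector: fl(var) becomes the sum, over positive c_j, of
    c_j * fl(x_j), plus the sum, over negative c_j, of c_j * fu(x_j), plus coef(e)."""
    bound, vec = 0.0, list(coef_e)           # start from coef(e)
    for xj, c in row.items():
        if c > 0:
            bound += c * lower[xj]
            vec = vec_add(vec, f_lower[xj], c)
        else:
            bound += c * upper[xj]
            vec = vec_add(vec, f_upper[xj], c)
    if bound > lower[var]:                   # only keep strictly tighter bounds
        lower[var], f_lower[var] = bound, vec

# Replaying the example: both ReLUs active, five equations, all vectors zero.
names = ("x1", "x2", "b1", "f1", "b2", "f2", "y")
f_lower = {v: [0, 0, 0, 0, 0] for v in names}
f_upper = {v: [0, 0, 0, 0, 0] for v in names}
lower = {"x1": 2, "x2": -1, "b1": -0.5, "f1": 0}
upper = {"x1": 3, "x2": 1, "b1": 0.5, "f1": 0.5}
# e1: b1 = x1 - x2, whose coef vector picks out the first tableau row:
tighten_lower("b1", {"x1": 1, "x2": -1}, [1, 0, 0, 0, 0], lower, upper, f_lower, f_upper)
assert lower["b1"] == 1 and f_lower["b1"] == [1, 0, 0, 0, 0]
# e4: f1 = b1, reusing the vector just stored for b1:
tighten_lower("f1", {"b1": 1}, [0, 0, 0, 1, 0], lower, upper, f_lower, f_upper)
assert lower["f1"] == 1 and f_lower["f1"] == [1, 0, 0, 1, 0]
```

Note that the second update folds in the stored vector for b1 rather than a separate proof of its dynamic bound, which is exactly what keeps the scheme within O(n &#8226; m) space.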
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. DNN V E R I FI C AT I O N W I T H PROOFS</head><p>A. Producing a proof-tree</p><p>We now discuss how to leverage the results of Sec. I V in order to produce the entire proof tree for an UNSAT DNN verification query. Recall that the main challenge lies in accounting for the piecewise-linear constraints, which affect the solving process by introducing case-splits.</p><p>Each case split performed by the solver introduces a branching in the proof tree -with a new child node for each of the linear phases of the constraint being split on -and introduces new equations and bounds. In the case of ReLU, one child node represents the active branch, through the equation f = b and bound b &#8805; 0; and another represents the inactive branch, with b &#8804; 0 and 0 &#8804; f &#8804; 0. These new bounds become the ground bounds for this node: their Farkas vectors are reset to zero, and all subsequent Farkas vectors refer to these new bounds (as opposed to the ground bounds of the parent node). A new node inherits any previously-discovered dynamic bounds, as well as the Farkas vectors that explain them, from its parent; these vectors remain valid, as ground bounds only become tighter as a result of splitting (see the extended version of this paper <ref type="bibr">[42]</ref>).</p><p>For example, let us return to the query from Fig. <ref type="figure">2</ref> and the proof tree from Fig. <ref type="figure">3</ref>. Initially, the solver decides to split on v1. This adds two new children to the proof tree. In the first child, representing the inactive case, we update the ground bounds u(b1) : = 0, u(f1 ) : = 0, and reset the corresponding Farkas vectors fu (b1 ) and f u ( f 1 ) to 0. Now, Simplex can tighten the lower bound of b1 using the first equation e 1 : l&#8242; (b1 ) : = l(x1 ) -u(x2 ) = 2 -1 = 1 resulting in the the updated fl (b1 ) = 1 0 0 , as shown in Sec. 
IV, where we use vectors of size three since in this search state we have three equations. Observe that this bound contradicts the upper ground bound of b1 , represented by the zero vector. We can then use the vector fu (b1 ) - fl (b1 ) = 0 - [1, 0, 0]&#8868; = [-1, 0, 0]&#8868; as a proof of contradiction. Indeed, the matrix A&#8242;, which is obtained using the first three rows and columns of A as defined in Sec. III, corresponds to the tableau before adding any new equations. Observe that [-1, 0, 0] &#8226; A&#8242; &#8226; V = 0 gives the equation -x1 + x2 + b1 = 0. Given the current ground bounds, the largest value of the left-hand side is -l(x1 ) + u(x2 ) + u(b1 ) = -2 + 1 + 0 = -1, which is negative, meaning that no variable assignment within these bounds can satisfy the equation. This indicates that the proof node representing v1's inactive phase is UNSAT.</p><p>In the second child, representing v1's active case, we update the ground bound l(b1 ) := 0 and the Farkas vector fl (b1 ) := 0. We also add the equation e4 : f1 = b1. Next, the solver performs another split on v2, adding two new children to the tree. In the first one (representing the inactive case) we update the ground bounds u(b2 ) := 0, u(f2 ) := 0, and reset the corresponding Farkas vectors fu (b2 ) and fu (f2 ) to 0. In this node, we have a contradiction already in the ground bounds, since u(f2 ) := 0 but l(f2 ) := 0.25. The contradiction in this case is thus expressed directly in terms of the ground bounds of f2 .</p><p>We are left with proving UNSAT for the last child, representing the case where both ReLU nodes v1, v2 are active. For this node of the proof tree, we update the ground bound l(b2 ) := 0 and Farkas vector fl (b2 ) := 0, and add the equation e5 : f2 = b2. Recall that previously, we learned the tighter bound l&#8242;(f1 ) = 1. With the same procedure as described in Sec. IV, we can update fl (f1 ) = [1, 0, 0, 1, 0]&#8868;. 
Now, we can use e2 : b2 = -2f1 to tighten u&#8242;(b2 ) := -2l&#8242;(f1 ) = -2, and consequently update the Farkas vector fu (b2 ) := -2fl (f1 ) + coef (e2 ) = -2 &#8226; [1, 0, 0, 1, 0]&#8868; + [0, 1, 0, 0, 0]&#8868; = [-2, 1, 0, -2, 0]&#8868;.</p><p>The bound u&#8242;(b2 ) = -2, explained by [-2, 1, 0, -2, 0]&#8868;, contradicts the ground bound l(b2 ) = 0, explained by the zero vector. Therefore, we get the vector w = fu (b2 ) - fl (b2 ) = [-2, 1, 0, -2, 0]&#8868; as the proof of contradiction for this node.</p></div>
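To illustrate how such a leaf certificate is validated, the following Python sketch replays the v1-inactive leaf from the example above. The tableau encoding and variable ordering are assumptions made for illustration, and the LP is collapsed to the single equation e1 that the Farkas vector actually selects; a real checker would apply the vector to the full tableau:

```python
import numpy as np

# Assumed encoding of e1 (b1 = x1 - x2) as a row of A' V = 0: x1 - x2 - b1 = 0.
vars_ = ["x1", "x2", "b1"]
A = np.array([[1.0, -1.0, -1.0]])
w = np.array([-1.0])               # Farkas vector for this one-equation slice

lower = {"x1": 2.0}                # ground bounds used by the contradiction
upper = {"x2": 1.0, "b1": 0.0}     # u(b1) = 0 after fixing v1 inactive

row = w @ A                        # combined row: -x1 + x2 + b1 = 0

# Maximize the left-hand side over the box: positive coefficients take
# their upper bounds, negative coefficients take their lower bounds.
max_lhs = sum(
    c * (upper[v] if c > 0 else lower[v])
    for c, v in zip(row, vars_) if c != 0
)
print(max_lhs)      # -1.0
print(0 > max_lhs)  # True: the row can never reach 0, so the leaf is UNSAT
```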
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Bound tightenings from piecewise-linear constraints</head><p>Modern solvers often use sophisticated methods <ref type="bibr">[25]</ref>, <ref type="bibr">[50]</ref>, <ref type="bibr">[62]</ref> to tighten variable bounds using the piecewise-linear constraints. For example, if f = ReLU(b), then in particular b &#8804; f , and so u(b) &#8804; u(f ). Thus, if initially u(b) = u(f ) = 7 and it is later discovered that u &#8242; (f ) = 5, we can deduce that also u &#8242; (b) = 5. We show here how such tightening can be supported by our proof framework, focusing on some ReLU tightening rules as specified in the extended version of this paper <ref type="bibr">[42]</ref>. Supporting additional rules should be similar.</p><p>We distinguish between two kinds of ReLU bound tightenings. The first are tightenings that can be explained via a Farkas vector; these are handled the same way as bounds discovered using interval propagation. The second, more complex tightenings are those that cannot be explained using an equation (and thus a Farkas vector). Instead, we treat these bound tightenings as lemmas, which are added to the proof node along with their respective proofs; and the bounds that they tighten are introduced as ground bounds, to be used in constructing future Farkas vectors. The proof for a lemma consists of Farkas vectors explaining any current bounds that were used in deducing it; as well as an indication of the tightening rule that was used. The list of allowed tightening rules must be agreed upon beforehand and provided to the checker; in the extended version of this paper <ref type="bibr">[42]</ref>, we present the tightening rules for ReLUs that we currently support.</p><p>For example, if f = ReLU(b) and u &#8242; (f ) = 5 causes a bound tightening u &#8242; (b) = 5, then this new bound u &#8242; (b) = 5 is stored as a lemma. 
Its proof consists of the Farkas vector f u ( f ) which explains why u &#8242; (f ) = 5, and an indication of the deduction rule that was used (in this case, u &#8242; (b) &#8804; u &#8242; (f )).</p></div>
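A lemma record of this kind is cheap to validate. The sketch below is purely illustrative: the record layout, rule identifier, and helper function are hypothetical and do not reflect Marabou's actual artifact format. It shows the shape of a check for the rule used in the example, where the new upper bound on b must follow from the justified upper bound on f:

```python
# Hypothetical lemma record and checker for the ReLU rule discussed above.
def check_relu_upper_lemma(lemma, justified_upper):
    # A real checker would first re-derive u'(f) from the stored Farkas
    # vector (elided here), yielding justified_upper[f]; the lemma is then
    # valid if the new bound on b follows from the rule that u'(b) never
    # exceeds u'(f).
    if lemma["rule"] != "relu-upper":
        return False
    f = lemma["f"]
    return lemma["new_upper_b"] == justified_upper[f]

lemma = {"rule": "relu-upper", "f": "f2", "b": "b2", "new_upper_b": 5.0}
print(check_relu_upper_lemma(lemma, {"f2": 5.0}))  # True
```

Because the new bound then becomes a ground bound of the node, later Farkas vectors can build on it without ever referencing the piecewise-linear constraint directly.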
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V I . PROOF C H E C K I N G A N D N U M E R I C A L S T A B I L I T Y</head><p>Checking the validity of a proof tree is straightforward. First, the checker must read the initial query and confirm that it is consistent with the LP and piecewise-linear constraints stored at the root of the tree. Next, the checker begins a depth-first traversal of the proof tree. Whenever it reaches a new inner node, it must confirm that that node's children correspond to the linear phases of a piecewise-linear constraint present in the query. Further, the checker must maintain a list of current equations and lower and upper bounds, and whenever a new node is visited -update these lists (i.e., add equations and tighten bounds as needed), to reflect the LP stored in that node. Additionally, the checker must confirm the validity of lemmas that appear in the node -specifically, to confirm that they adhere to one of the permitted derivation rules. Finally, when a leaf node is visited, the checker must confirm that the Farkas vector stored therein does indeed lead to a contradiction when applied to the current LP -by ensuring that the linear combination of rows created by the Farkas vector leads to a matrix row cj &#8226; x j = 0, such that for any assignment of the variables, the left-hand side will have a negative value.</p><p>The process of checking a proof certificate is thus much simpler than verifying a DNN using modern approaches, as it consists primarily of traversing a tree and computing linear combinations of the tableau's columns. Furthermore, the proof checking process does not require using division for its arithmetic computations, thus making the checking program more stable arithmetically <ref type="bibr">[44]</ref>. Consequently, we propose to treat the checker as a trusted code-base, as is commonly done <ref type="bibr">[15]</ref>, <ref type="bibr">[49]</ref>.</p><p>Complexity and Proof Size. 
Proving that a DNN verification query is SAT (by providing a satisfying assignment) is significantly easier than discovering an UNSAT witness using our technique. This is not surprising: recall that the DNN verification problem is NP-complete, and that yes-instances of NP problems have polynomial-size witnesses (i.e., polynomial-size proofs). Discovering a way to similarly produce polynomial proofs for no-instances of DNN verification is equivalent to proving that NP = coNP, which is a major open problem <ref type="bibr">[8]</ref> and might, of course, be impossible. Numerical Stability. Recall that enhancing DNN verifiers with proof production is needed in part because they might produce incorrect UNSAT results due to numerical instability. When this happens, proof checking will fail at some proof leaf, and the user will receive a warning. There are, however, cases where the query is UNSAT, but only the proof produced by the verifier is flawed. To recover from these cases, the failing leaf can be dispatched to an SMT solver: SMT solvers typically use sound arithmetic (as opposed to DNN verifiers), and so their conclusions are generally more reliable. Further, if a proof-producing SMT solver is used, the proof that it produces could be plugged into the larger proof tree, in place of the incorrect proof previously discovered. Although using SMT solvers to directly verify DNNs has been shown to be highly ineffective <ref type="bibr">[48]</ref>, <ref type="bibr">[59]</ref>, in our evaluation we observed that leaves typically represented problems that were significantly simpler than the original query, and could be solved efficiently by the SMT solver.</p></div>
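Putting the pieces together, the checker's traversal can be summarized by the following Python skeleton. This is a structural sketch only: the node layout and the two helper predicates are hypothetical stand-ins for the lemma check of Sec. V-B and the leaf check described above:

```python
# Hypothetical skeleton of the depth-first proof-tree checker.
def check_lemma(lemma, equations, lower, upper):
    # Stand-in: dispatch on the lemma's tightening rule (Sec. V-B).
    return lemma.get("ok", True)

def leaf_contradiction(farkas, equations, lower, upper):
    # Stand-in: combine tableau rows by `farkas` and confirm the resulting
    # row's left-hand side is always negative (Sec. VI).
    return farkas is not None

def check_node(node, equations=(), lower=None, upper=None):
    # Extend the current LP with this node's case-split facts.
    equations = list(equations) + node.get("new_equations", [])
    lower = {**(lower or {}), **node.get("new_lower", {})}
    upper = {**(upper or {}), **node.get("new_upper", {})}

    # Re-check every lemma recorded at this node.
    for lemma in node.get("lemmas", []):
        if not check_lemma(lemma, equations, lower, upper):
            return False

    children = node.get("children", [])
    if not children:
        # Leaf: the stored Farkas vector must yield a contradiction.
        return leaf_contradiction(node.get("farkas"), equations, lower, upper)
    # Inner node: every child (one per linear phase) must be refuted.
    return all(check_node(c, equations, lower, upper) for c in children)

tree = {"children": [{"farkas": [-1.0, 0.0, 0.0]},
                     {"children": [{"farkas": [0.0, 2.0]}, {"farkas": None}]}]}
print(check_node(tree))  # False: one leaf carries no contradiction
```

Note that the traversal only ever appends to the current LP on the way down, mirroring the fact that ground bounds become tighter with each split.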
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V I I . I M P L E M E N TAT I O N A N D E VA L U AT I O N</head><p>Implementation. For evaluation purposes, we instrumented the Marabou DNN verifier <ref type="bibr">[50]</ref>, <ref type="bibr">[69]</ref> with proof production capabilities. Marabou is a state-of-the-art DNN verifier, which uses a native Simplex solver, and combines it with other modern techniques -such as abstraction and abstract interpretation <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>, <ref type="bibr">[57]</ref>, <ref type="bibr">[62]</ref>, <ref type="bibr">[68]</ref>, <ref type="bibr">[71]</ref>, advanced splitting heuristics <ref type="bibr">[70]</ref>, DNN optimization <ref type="bibr">[63]</ref>, and support for varied activation functions <ref type="bibr">[6]</ref>. Additionally, Marabou has been applied to a variety of verification-based tasks, such as verifying recurrent networks <ref type="bibr">[43]</ref> and DRL-based systems <ref type="bibr">[3]</ref>, <ref type="bibr">[5]</ref>, <ref type="bibr">[28]</ref>, <ref type="bibr">[51]</ref>, network repair <ref type="bibr">[34]</ref>, <ref type="bibr">[60]</ref>, network simplification <ref type="bibr">[33]</ref>, <ref type="bibr">[52]</ref>, and ensemble selection <ref type="bibr">[4]</ref>.</p><p>As part of our enhancements to Marabou's Simplex core, we added a mechanism that stores, for each variable, the current Farkas vectors that explain its bounds. These vectors are updated with each Simplex iteration in which the tableau is altered. Additionally, we instrumented some of Marabou's Simplex bound propagation mechanisms -specifically, those that perform interval-based bound tightening on individual rows <ref type="bibr">[25]</ref>, to record for each tighter bound the Farkas vector that justifies it. 
Thus, whenever the Simplex core declares UNSAT as a result of conflicting bounds, the proof infrastructure is able to collect all relevant components for creating the certificate for that particular leaf in the proof tree. Due to time restrictions, we were not able to instrument all of Marabou's many bound propagation components; this is ongoing work, and our experiments described below were run with the yet-unsupported components turned off. The only exception is Marabou's preprocessing component, which is not supported, but is run before proof production starts.</p><p>In order to keep track of Marabou's tree-like search, we instrumented Marabou's SmtCore class, which is in charge of case splitting and backtracking <ref type="bibr">[50]</ref>. Whenever a case split was performed, the corresponding equations and bounds were added to the proof tree as ground truths; and whenever a previous split was popped, our data structures would backtrack as well, returning to the previous ground bounds.</p><p>In addition to the instrumentation of Marabou, we also wrote a simple proof checker that receives a query and a proof artifact, and then checks, based on this artifact, that the query is indeed UNSAT. The checker also interfaces with the cvc5 SMT solver <ref type="bibr">[14]</ref> for attempting recovery from numerical instability errors.</p><p>Evaluation. We used our proof-producing version of Marabou to solve queries on the ACAS Xu family of benchmarks for airborne collision avoidance <ref type="bibr">[45]</ref>. We argue that the safety-critical nature of this system makes it a prime candidate for proof production. Our set of benchmarks was thus comprised of 45 networks and 4 properties to test on each, producing a total of 180 verification queries. Marabou returned an UNSAT result on 113 of these queries, and so we focus on them. 
In the future, we intend to evaluate our proof-production mechanism on other benchmarks as well.</p><p>We set out to evaluate our proof-production mechanism along three axes: (i) correctness: how often was the checker able to verify the proof artifact, and how often did Marabou (probably due to numerical instability issues) produce incorrect proofs?; (ii) overhead: by how much did Marabou's runtime increase due to the added overhead of proof production?; and (iii) checking time: how long did it take to check the produced proofs? Below we address each of these questions.</p><p>Correctness. Over 1.46 million proof-tree leaves were created and checked as part of our experiments. Of these, proof checking failed for only 77 leaves, meaning that the Farkas vector written in the proof-tree leaf did not allow the proof checker to deduce a contradiction. Out of the 113 queries checked, 97 had all their proof-tree leaves checked successfully. As for the rest, typically only a tiny number of leaves would fail per query, but we did identify a single query where a significant number of proofs failed to check (see Fig. <ref type="figure">4</ref>). We speculate that this query had some intrinsic numerical issues encoded into it (e.g., equations with very small coefficients <ref type="bibr">[20]</ref>). Next, when we encoded each of the 77 failing leaves as a query to the cvc5 SMT solver <ref type="bibr">[14]</ref>, it was able to show that all of them were indeed UNSAT, in under 20 seconds per query. From this we learn that although some of the proof certificates produced by Marabou were incorrect, the ultimate UNSAT results were correct. Further, it is interesting to note how quickly each of these queries could be solved. This gives rise to an interesting verification strategy: use modern DNN verifiers to do the "heavy lifting", and then use more precise SMT solvers specifically on the small components of the query that proved difficult to solve accurately. Overhead and Checking Time. In Fig. 
<ref type="figure">5</ref>, we compare the running time of vanilla Marabou, the overhead incurred by our proof-production extension to Marabou, and the checking time of the resulting proof certificates. We can see that the overhead of proof production time is relatively small for all queries (an average overhead of 5.7%), while the certification time is non-negligible, but shorter than the time it takes to solve the queries by a factor of 66.5% on average.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V I I I . R E L AT E D WO R K</head><p>The importance of proof production in verifiers has been repeatedly recognized, for example by the SAT, SMT, and model-checking communities (e.g., <ref type="bibr">[15]</ref>, <ref type="bibr">[21]</ref>, <ref type="bibr">[38]</ref>). Although the risks posed by numerical imprecision within DNN verifiers have been raised repeatedly <ref type="bibr">[12]</ref>, <ref type="bibr">[44]</ref>, <ref type="bibr">[48]</ref>, <ref type="bibr">[47]</ref>, we are unaware of any existing proof-producing DNN verifiers.</p><p>Proof production for various Simplex variants has been studied previously <ref type="bibr">[56]</ref>. In <ref type="bibr">[24]</ref>, Dutertre and de Moura study a Simplex variant similar to ours, but without explicit support for dynamic bound tightening. Techniques for producing Farkas vectors have also been studied <ref type="bibr">[10]</ref>, but again without support for dynamic bound tightening, which is crucial in DNN verification. Other uses of Farkas vectors, specifically in the context of interpolants, have also been explored <ref type="bibr">[9]</ref>, <ref type="bibr">[18]</ref>.</p><p>Other frameworks for proof production for machine learning have also been proposed <ref type="bibr">[7]</ref>, <ref type="bibr">[35]</ref>; but these frameworks are interactive, unlike the automated mechanism we present here.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I X . C O N C L U S I O N A N D F U T U R E W O R K</head><p>We presented a novel framework for producing proofs of unsatisfiability for Simplex-based DNN verifiers. Our framework constructs a proof tree that contains lemma proofs in internal nodes and unsatisfiability proofs in each leaf. The certificates of unsatisfiability that we provide can increase the reliability of DNN verification, particularly when floating-point arithmetic (which is susceptible to numerical instability) is used.</p><p>We plan to continue this work along two orthogonal paths: (i) extend our mechanism to support additional steps performed in modern verifiers, such as preprocessing and additional abstract interpretation steps <ref type="bibr">[53]</ref>, <ref type="bibr">[62]</ref>; and (ii) use our infrastructure to allow learning succinct conflict clauses. During search, the Farkas vectors produced by our approach could be used to generate conflict clauses on-the-fly. Intuitively, conflict clauses guide the verification algorithm to avoid any future search for a satisfying assignment within subspaces of the search space already proven to be UNSAT. Such clauses are a key component in modern S AT and SMT solvers, and are the main component of C D C L algorithms <ref type="bibr">[74]</ref> -and could significantly curtail the search space traversed by DNN verifiers and improve their scalability.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The approach is easily generalizable to any piecewise-linear constraint, by splitting the query according to the different linear pieces of the activation function.</p></note>
		</body>
		</text>
</TEI>
