<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Verified Correctness, Accuracy, and Convergence of a Stationary Iterative Linear Solver: Jacobi Method</title></titleStmt>
			<publicationStmt>
				<publisher>Springer</publisher>
				<date>09/05/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10532846</idno>
					<idno type="doi"></idno>
					
					<author>Mohit Tekriwal</author><author>Andrew W Appel</author><author>Ariel E Kellison</author><author>David Bindel</author><author>Jean-Baptiste Jeannin</author><author>Catherine Dubois</author><author>Manfred Kerber</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Solving a sparse linear system of the form Ax = b is a common engineering task, e.g., as a step in approximating solutions of differential equations. Inverting a large matrix A is often too expensive, and instead engineers rely on iterative methods, which progressively approximate the solution x of the linear system in several iterations, where each iteration is a much less expensive (sparse) matrix-vector multiplication. We present a formal proof in the Coq proof assistant of the correctness, accuracy, and convergence of one prominent iterative method, the Jacobi iteration. The accuracy and convergence properties of Jacobi iteration are well-studied, but most past analyses were performed in real arithmetic; instead, we study those properties, and prove our results, in floating-point arithmetic. We then show that our results are properly reflected in a concrete implementation in the C language. Finally, we show that the iteration will not overflow, under assumptions that we make explicit. Notably, our proofs are faithful to the details of the implementation, including C program semantics and floating-point arithmetic.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many scientific and engineering computations require the solution x of large sparse linear systems Ax = b, given an n×n matrix A and a vector b. There are many algorithms for doing this; Gaussian elimination is rarely used when n is large, since it takes O(n³) time. The widely used stationary iterative methods have an average time complexity of O(nsk), where the sparseness s is the number of nonzeros per row (often s ≪ n) and k is the number of iterations (often small). Even where iterative methods are not the principal algorithms for solving Ax = b, they are often used in transformations of the problem (preconditioning) before using other workhorses such as Krylov subspace methods <ref type="bibr">[27]</ref>.</p><p>When using a stationary iterative method, one starts with an initial vector x_0 and uses A and b to derive successive vectors x_1, x_2, . . . that, one hopes, will converge to a value x_k such that the residual Ax_k − b is small and x_k is close to the true solution. Because these methods are often used as subroutines deep within larger computational libraries and solvers, it is quite inconvenient to the end user if some such subroutine reports that it failed to converge; often, the user has no idea which subproblem A, b has failed. Thus it is useful to be able to prove theorems of the following form: "Given inputs A and b with certain properties, the algorithm will converge to tolerance τ within k iterations."</p><p>Since these methods are so important, their convergence properties have been analyzed in detail. 
However, most of these analyses assume real-number arithmetic <ref type="bibr">[27]</ref>, whereas the implementations use floating-point; or the analysis uses a simplified floating-point model that omits subnormal numbers <ref type="bibr">[14]</ref>; or the analysis is for a model of an algorithm <ref type="bibr">[19]</ref> but not the actual software. And when one reaches correctness and accuracy proofs of actual software, it is useful to have machine-checked proofs that connect in a machine-checkable way to the actual program that is executed, for programs can be complex, and as programs evolve one must ensure that their correctness theorems evolve with them.</p><p>We focus on Jacobi iteration applied to strictly diagonally dominant matrices, i.e., those in which, in each row, the magnitude of the diagonal element exceeds the sum of the magnitudes of the off-diagonal elements. Strict diagonal dominance is a simple test for invertibility and guarantees convergence of Jacobi iteration in exact arithmetic <ref type="bibr">[27]</ref>. 
Strictly diagonally dominant matrices arise in cubic spline interpolation <ref type="bibr">[1]</ref>, analysis of Katz centrality in social networks <ref type="bibr">[18]</ref>, Markov chains associated with PageRank and related network analysis methods <ref type="bibr">[12]</ref>, and market equilibria in economic theory <ref type="bibr">[24]</ref>, among other domains.</p><p>We present both a Coq functional model of floating-point Jacobi iteration (at any desired floating-point precision) and a C program (in double-precision floating-point), with Coq proofs that:</p><p>-the C program (which uses efficient sparse-matrix algorithms) correctly implements the functional model (which uses a simpler matrix representation); -for any inputs A, b and desired accuracy τ that satisfy the Jacobi preconditions for a given natural number k, the functional model (and the C program) will converge within k iterations to a vector x_k such that ‖Ax_k − b‖_2 &lt; τ; -this computation will not overflow into floating-point "infinity" values; -and the Jacobi preconditions are natural properties of A, b, τ, k that (1) are easily tested and (2) for many natural engineering problems of interest (mentioned above) are guaranteed to be satisfied.</p><p>Software packages not written in C can still be related to our functional model and make use of our floating-point convergence and accuracy theorems. And even for inputs that do not satisfy the Jacobi preconditions, we have proved that our C program correctly and robustly detects overflow. Together, these theorems guarantee that a Jacobi solver deep within some larger library will not be the cause of a mysterious "failed to converge" message, and that when it does believe it has converged, it will have a correct answer.</p><p>Contributions. 
First convergence proof of Jacobi that takes into account floating-point underflow or overflow; first machine-checked proof of a stationary iterative method; first machine-checked connection to a real program. Our Coq formalization is available at: <ref type="url">https://github.com/VeriNum/iterative_methods/tree/v0.1.0</ref> </p></div>
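<p><![CDATA[Strict diagonal dominance, the key precondition above, is inexpensive to test. A minimal sketch in C for a dense row-major matrix (the function name and dense layout are ours for illustration; the verified development works with a sparse representation):

```c
#include <math.h>
#include <stddef.h>

/* Returns 1 if the dense n-by-n row-major matrix A is strictly
   diagonally dominant by rows: |A[i][i]| > sum_{j != i} |A[i][j]|. */
int strictly_diagonally_dominant(const double *A, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double offdiag = 0.0;
        for (size_t j = 0; j < n; j++)
            if (j != i) offdiag += fabs(A[i*n + j]);
        if (!(fabs(A[i*n + i]) > offdiag)) return 0;
    }
    return 1;
}
```

Note the negated comparison `!(... > ...)`: it also rejects rows containing NaN, for which the dominance comparison is false.]]></p>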
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Overview of Iterative Methods and Our Proof Structure</head><p>Stationary iterative methods <ref type="bibr">[27]</ref> are among the oldest and simplest methods for solving linear systems of the form Ax = b, for A ∈ R^{n×n}, b ∈ R^n. The nonsingular matrix A and vector b in such systems typically arise, for example, in the solution of a partial differential equation. In stationary methods, the matrix A is decomposed as A = M + N, where M is chosen such that it is easily invertible; for Jacobi it is simply the diagonal of A, and we will often call it D. Rather than solving the system Ax = b exactly, one can approximate the solution vector x using stationary iterations of the form</p><p>M x_m = b − N x_{m−1}, (1)</p><p>where the vector x_m is an approximation to the solution vector x obtained after m iterations; we typically start with x_0 = 0. The unknown x_m is therefore</p><p>x_m = M^{−1}(b − N x_{m−1}). (2)</p><p>This iterates until x_k satisfies ‖Ax_k − b‖_2 &lt; τ, or until the program detects failure: overflow in computing x_k, or the maximum number of iterations exceeded. Throughout this paper, we let ‖·‖ denote the infinity vector norm and its induced matrix norm, and we let ‖·‖_2 denote the ℓ_2 norm on vectors.</p><p>For our model problem, the steps are as follows.</p><p>1. Write a C program that implements (2) by Jacobi iterations (and also implements an appropriate stopping condition).</p><p>2. Write a floating-point functional model in Coq (a recursive functional program that operates on floating-point values) that models Jacobi iterations of the form (2). This model must perform almost exactly (see Sect. 7.1) the same floating-point operations as the C program. (As we will explain, we have two statements of this model and we prove the two models equivalent.)</p><p>3. Prove that the program written in Step 1 implements the floating-point functional model of Step 2, using a program logic for C.</p><p>4. 
Write a real functional model in Coq that performs Jacobi iteration x_m = D^{−1}(b − N x_{m−1}) in the exact reals. Of course, it is impractical to compute with this model, but it is useful for proofs.</p><p>5. Prove a relation between x_k (the k-th iteration of the floating-point model) and the real solution x of the real functional model: the Jacobi forward error bound. If one could run the Jacobi method purely in the reals, the iteration is obviously contractive: ‖x_{k+1} − x‖ &lt; ρ ‖x_k − x‖, where ρ &lt; 1 is the spectral radius of D^{−1}N. But in the floats, there is an extra term caused by roundoff error.</p><p>6. Prove floating-point convergence: under certain conditions (the Jacobi preconditions), this extra term does not blow up, and within a specified k iterations the residual ‖Ax_k − b‖_2 is less than the tolerance τ.</p><p>7. Compose these to prove the main theorem: the C program converges to an accurate result.</p><p>Figure <ref type="figure">1</ref> shows our correctness and accuracy theorem as a modular composition of reusable models and lemmas. We have two float-valued models: for proving the relation of the float-valued model to the real solution we use the MathComp library (in Coq). For proving that the C program implements the float-valued model, we use the Verified Software Toolchain (in Coq). But MathComp <ref type="bibr">[22]</ref> and VST <ref type="bibr">[10]</ref> prefer different notations and constructions for specifying data structures such as matrices and vectors; so we write the same functional model in each of the two styles, and prove their equivalence. 
We used Coq for our formalization because its expressive general-purpose logic, together with powerful libraries such as MathComp, Flocq <ref type="bibr">[8]</ref>, VCFloat <ref type="bibr">[3]</ref>, LAProof <ref type="bibr">[21]</ref>, Coquelicot <ref type="bibr">[7]</ref>, and VST, supports both high-level mathematical reasoning and low-level program verification.</p><p>The float-valued model will experience round-off errors compared to the real-valued model; we prove bounds on these errors in a forward error bound lemma (Sect. 4). Under certain "Jacobi preconditions" the float-valued model is guaranteed to converge to a result of specified accuracy without ever overflowing; this is the Jacobi iteration bound lemma (Sect. 5).</p></div>
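<p><![CDATA[As a concrete illustration of iteration (2) with M = D, one Jacobi sweep over a dense matrix can be sketched as follows (dense layout and naming are ours; the verified C program uses a sparse representation and its own stopping test):

```c
#include <stddef.h>

/* One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} A[i][j]*x[j]) / A[i][i],
   i.e., x_new = D^{-1}(b - N x) for the splitting A = D + N. */
void jacobi_sweep(const double *A, const double *b,
                  const double *x, double *x_new, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double s = b[i];
        for (size_t j = 0; j < n; j++)
            if (j != i) s -= A[i*n + j] * x[j];
        x_new[i] = s / A[i*n + i];
    }
}
```

Repeating the sweep, alternating the roles of the two vectors, gives the iteration x_0, x_1, x_2, . . . described above.]]></p>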
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Parametric Models and Proofs; Important Constants</head><p>We have proved accuracy bounds for any floating-point precision. That is, our floating-point functional models, and the proofs about them, are parameterized by a floating-point type, expressed in Coq as type:Type, with operations as in VCFloat <ref type="bibr">[3]</ref>. So for t:type, we have x:ftype(t), meaning that x is a floating-point number in format t. We will write p for fprec(t) and e_max for femax(t). The maximum representable finite value is v_max = (1 − 2^{−p}) 2^{e_max}. Given a floating-point format t and matrix dimension n, the following functions will be useful in reasoning:</p><p>For example, suppose t is double-precision floating-point (p = 53, e_max = 1024), n = 10^6 (Jacobi iteration on a million-by-million matrix), and s = 5 (the million-element sparse matrix has 5 nonzeros per row). Then some relevant quantities are:</p><p>Our error analyses will often feature formulas with δ, ε, g_δ, and g_ε; remember that in double precision these are small quantities (in single or half precision, not so small). Henceforth we will write g_δ, g_ε for g_δ(n), g_ε(n).</p></div>
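<p><![CDATA[The formula v_max = (1 − 2^{−p}) 2^{e_max} can be checked against the C library's limits; this helper is illustrative only and is not part of the verified development:

```c
#include <float.h>
#include <math.h>

/* Largest finite value of a binary format with precision p and
   maximum exponent emax: (1 - 2^-p) * 2^emax. */
double max_finite(int p, int emax) {
    return ldexp(1.0 - ldexp(1.0, -p), emax);
}
```

For (p, e_max) = (53, 1024) this is exactly DBL_MAX, and for (24, 128) exactly FLT_MAX.]]></p>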
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Forward Error Bound for Dot Product</head><p>In separate work, Kellison et al. <ref type="bibr">[21]</ref> prove (in Coq) the correctness and accuracy of floating-point dot-product and sparse matrix-vector multiply, as Coq functional models and as C programs. Define the dot-product ⟨u, v⟩ of two real vectors u and v as Σ_{0≤i&lt;n} u_i v_i. A matrix-vector multiplication Av can be seen as the dot-product of each row of A with the vector v. Forward error bounds for a matrix-vector multiplication are therefore based on forward error bounds for the dot-product.</p><p>Our implementation and functional model of the dot-product use fused multiply-add (FMA), which computes a floating-point multiplication and addition (i.e., a·b + c) with a single rounding error rather than two.</p><p>The parameters to the dotprod functional model are the floating-point format t and two lists of floating-point numbers. The algorithm zips the two lists into a list of pairs (using List.combine) and then adds them from left to right, starting with a floating-point 0 in format t.</p><p>We denote floating-point summation by ⨁, so the floating-point dot product is ⨁_{0≤i&lt;n} u_i v_i; real-valued summation is denoted Σ_{0≤i&lt;n} u_i v_i. The notation finite(z) signifies that the floating-point number z is within the range of the floating-point format (not an infinity or NaN).</p><p>Theorem 1 (forward error + no overflow). Let u, v be lists of length n of floats in format t = (p, e_max), in which every element is ≤ v_max, and no more than s elements of u are nonzero. The absolute forward error of the resulting dot product is bounded in terms of g_δ(s) and g_ε(s).</p><p>Proof. See Kellison et al. <ref type="bibr">[21]</ref>.</p><p>Subnormal Numbers. When some of the vector elements are of order of magnitude 1, the term g_ε(s) is negligible. 
But if the u and v vectors are composed of subnormal numbers, then neglecting the underflow-error term would be unsound.</p><p>Most previous proofs about dot-product error (and about Jacobi iteration), and all previous machine-checked proofs to our knowledge, omit reasoning about underflow.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Jacobi Forward Error</head><p>We prove an explicit bound on the distance between the approximate solution x_k at step k and the exact solution x of the problem. Such bounds are commonly studied in computational science, but rarely take into account the details of floating-point arithmetic (including underflow and overflow). They also usually have paper proofs, while we provide a machine-checked proof.</p><p>Theorem 2 (jacobi_forward_error_bound). After k Jacobi iterations, if no iteration resulted in an overflow, then the distance between the approximate (floating-point) solution x_k and the exact (real) solution x is bounded:</p><p>f_error(k) ≤ ρ^k ‖x − x_0‖ + ((1 − ρ^k)/(1 − ρ)) d_mag,</p><p>where ρ is a bound on the spectral radius (largest eigenvalue magnitude) of D^{−1}N, adjusted for the floating-point roundoff errors that arise in one iteration of the Jacobi method, f_error(k) is ‖x − x_k‖, and the conditions on A, x_0, and b, for n ≥ 0, are characterized as follows:</p><p>Definition forward_error_cond {ty} {n:nat} (A: 'M[ftype ty]_n.+1) (x0 b: 'cV[ftype ty]_n.+1) :=</p><p>size_constraint n is a constraint on the dimension n of the matrix A (in double precision, about 6·10^9). The predicate input_bound provides conditions on the bounds for the inputs A, b, x_0; it is implied by the Jacobi preconditions (Definition 1) defined in the next section.</p><p>Proof. Assuming no overflow, the floating-point iteration satisfies x_{k+1} = D^{−1} ⊗ (b ⊖ (N ⊗ x_k)), where the operators ⊗, ⊖ represent the floating-point multiplication and subtraction operations respectively, and D^{−1} is the floating-point (not exact) inverse of the diagonal elements. The forward error at step k + 1 is defined as f_{k+1} = ‖x − x_{k+1}‖. Here, we write the true solution as x = D^{−1}(b − Nx), which can be derived from Ax = b. To move forward, we will be using the following auxiliary error bounds on matrix-vector operations.</p><p>-Matrix-vector multiplication: ‖N ⊗ x − Nx‖ ≤ ‖N‖ ‖x‖ g_δ + g_ε. 
This bound is derived from the dot-product error bound that we stated in Sect. 4. In the ‖·‖ norm, the dot-product errors directly give the error bound for matrix-vector multiplication.</p><p>-Vector subtraction: here, we use the fact that elementwise floating-point subtraction satisfies |(x ⊖ y) − (x − y)| ≤ |x − y| δ + ε.</p><p>-Diagonal inversion: here, we make use of the fact that inverting each element d of the diagonal D satisfies |(1 ⊘ d) − 1/d| ≤ |1/d| δ + ε.</p><p>We use these norm-wise errors to expand the error definition f_{k+1}. We then collect the coefficients of ‖x_k‖ and expand using the error relation for f_k as ‖x_k‖ ≤ f_k + ‖x‖ to get the error recurrence relation f_{k+1} ≤ ρ f_k + d_mag, where d_mag is a constant independent of k, depending only on ‖x‖, δ, ε, g_δ, g_ε. We can then expand this error recurrence relation to get f_k ≤ ρ^k f_0 + (1 + ρ + · · · + ρ^{k−1}) d_mag. This geometric series converges if ρ &lt; 1, with closed form f_k ≤ ρ^k f_0 + ((1 − ρ^k)/(1 − ρ)) d_mag.</p><p>Note that if the above iterative process were done in the reals, then we would only require ρ_r := ‖D^{−1}N‖ to be less than 1. Thus, the presence of rounding errors forces us to choose a more conservative convergence radius ρ.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Convergence Guarantee: Absence of Overflow</head><p>Theorem 2 had the premise, "if no iteration resulted in an overflow." Most previous convergence theorems for Jacobi iteration <ref type="bibr">[27]</ref> have been stated in the real numbers, where overflow is not an issue: multiplying two finite numbers cannot "overflow." Higham and Knight <ref type="bibr">[14]</ref> proved convergence (but not absence of overflow), on paper, in a simplified model of floating-point (without underflows). Let us now state a theorem, for an accurate model of floating-point, not conditioned on absence of overflow.</p><p>Theorem 3 (jacobi_iteration_bound_lowlevel). 
If the inputs A, b, the desired tolerance τ, and the projected number of iterations k satisfy the (efficiently testable) Jacobi preconditions, then the floating-point functional model of Jacobi iteration converges, within j ≤ k iterations, to a solution x_j such that ‖Ax_j − b‖_2 &lt; τ, without overflowing.</p><p>Proof. The proof of this theorem proceeds by the following cases:</p><p>-If A is diagonal, then N is a zero matrix. Therefore, the solution vector at each iteration is given by the constant vector x_k = D^{−1} ⊗ b. Hence, the Jacobi iteration has already converged after the first step, assuming certain bounds on b and A implied by the Jacobi preconditions.</p><p>-If A is not diagonal but the vector b is small enough, Jacobi iteration has already converged, without even running the iteration.</p><p>-Suppose A is not diagonal and the vector b is not too small. Then (a) the residual does not overflow for any iteration ≤ j; this follows from the Jacobi preconditions and Theorem 2. (b) We can calculate k_min such that the residual is &lt; τ within k_min iterations.</p><p>The Jacobi preconditions (Definition 1) are efficiently computable: a straightforward arithmetic computation with computational complexity linear in the number of nonzero matrix elements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (jacobi_preconditions_Rcompute). A, b, τ, k satisfy the Jacobi preconditions when:</head><p>-All elements of A, b, and D^{−1} are finite (representable in floating-point); -A is strictly diagonally row-dominant, that is, ∀i, |A_{ii}| &gt; Σ_{j≠i} |A_{ij}|.</p><p>For the iteration process to converge in the presence of rounding, we want ρ &lt; 1. d_mag is a bound on the additive error in computing the residual ‖Ax_k − b‖, the difference between computing the residual in the reals versus in floating-point. τ^2 = τ ⊗ τ is the floating-point square of τ. The minimum k for which we guarantee convergence, k_min, is calculated from these quantities.</p><p>Indeed these conditions are quite tedious; one might have difficulty trusting them without a machine-checked proof. But they are all easy to compute in linear time. And, although we state them here (mostly) in terms of operations on the reals, they are all straightforwardly boundable by floating-point operations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Remark 1 (not proved in Coq)</head><p>The Jacobi preconditions can be computed in time linear in the number of nonzero entries of A.</p><p>Proof. Let S be the number of nonzeros. Then n &lt; S, since the diagonal elements are nonzero. The inverse diagonal D^{−1} is computed in linear time. An infinity norm (‖N‖, ‖D‖, ‖d‖, ‖b‖) is simply the largest absolute value of any row sum (for a matrix) or element (for a vector), which can be found in O(S) time. Then the values x_bound, ρ, d_mag, v_max, k_min can all be computed in constant time. Then each of the tests in Definition 1 can be done in O(S) time.</p></div>
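<p><![CDATA[The O(S) infinity-norm computation in the proof of Remark 1 can be sketched for a CRS matrix as follows (the val/row_ptr array names follow the common CRS convention, not necessarily the verified code):

```c
#include <math.h>
#include <stddef.h>

/* Infinity norm of a CRS matrix: the largest absolute row sum.
   val holds the nonzeros row by row; row_ptr[i]..row_ptr[i+1] delimits
   row i. One pass over the S nonzeros. */
double crs_inf_norm(const double *val, const size_t *row_ptr, size_t nrows) {
    double norm = 0.0;
    for (size_t i = 0; i < nrows; i++) {
        double rowsum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i+1]; k++)
            rowsum += fabs(val[k]);
        if (rowsum > norm) norm = rowsum;
    }
    return norm;
}
```

Vector infinity norms are the analogous single pass over the elements.]]></p>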
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">An Efficient and Correct C Program</head><p>Our C program uses standard numerical methods to achieve high performance and accuracy: sparse matrix methods, fused multiply-add, efficient testing for overflow, and so on. What's not so standard is that we have proved it correct, with a foundational machine-checked proof that composes in Coq with the numerical accuracy (and convergence) proof of our functional model.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Sparse Matrix-Vector Multiply</head><p>Many implementations of stationary iterative methods, including Jacobi, operate on large sparse matrices. A naive dense matrix-vector multiply would take O(n^2) time per iteration, while sparse representations permit O(sn) time per iteration, where s is the number of nonzeros per row. Our program uses Compressed Row Storage (CRS), a standard sparse representation [4, §4.3.1]. Kellison et al. [21] describe the Coq floating-point functional model, an implementation in C, and the Coq/VST proof that the C dot-product program correctly implements the model. VST (Verified Software Toolchain) <ref type="bibr">[10]</ref> is a tool embedded in Coq for proving C programs correct. Building on that, here we prove that the Jacobi program implements its model.</p><p>For sweep-form Jacobi iteration it is useful to have a function that computes just one row of a sparse matrix-vector multiply, which in CRS form is implemented as: double crs_row_vector_multiply(struct crs_matrix *m, double *v, unsigned i); /* compute dot-product of row i of matrix m with vector v */</p><p>Separation of Concerns. The floating-point accuracy proof should be kept completely separate from the sparse-matrix data-structure-and-algorithm proof. This function is proved to calculate (almost) exactly the same floating-point computation as the naive dense matrix-multiply algorithm. Almost exactly, because where A_{ij} = 0, the dense algorithm computes A_{ij} · x_j + s where the sparse algorithm just uses s. 
In floating-point it is not the case that ∀y. 0 · y + s = s, for example when y is ∞ or NaN. Even when y and s are finite, it is not always true that y · 0 + s is the same floating-point value as s: it could be that one is +0 and the other is −0. And finally, even when the matrix A and vector x are all finite, we cannot assume that intermediate results s are finite; there may be overflow.</p><p>So we reason modulo equivalence relations (using Coq's Parametric Morphism system for rewriting with partial equivalence relations). We define feq x y to mean that either both x and y are finite and equal (with +0 = −0), or neither is finite. Our function will have a precondition that A and x are all finite, and a postcondition that the computed result is feq to the result that a dense matrix-multiply algorithm would compute. The C program in Listing 1.1 loops over the rows i of the matrix, which are also the elements i of the vectors b and x. For each i it computes a new element y_i of the result vector, as well as an element r_i of the residual vector. It returns s, the sum of the squares of the r_i. By carefully computing r_i from y_i, and not vice versa, we can prove (in Coq, of course) that all overflows are detected: if s is finite, then all the y_i must be finite.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Jacobi Iteration</head><p>The program in Listing 1.2 runs until convergence (s &lt; τ^2), giving up early if there is overflow (tested by s*0.0 == 0.0, since if s overflows then s*0.0 is NaN) or if maxiter iterations are reached. This program starts with x^(0) in x, computes x^(1) into y, then x^(2) back into x, and so on. It mallocs y for that purpose and frees it at the end. Depending on whether the number of iterations is odd or even, it may need to copy from y to x at the end.</p><p>This program is conventional and straightforward. Our proof tools allow the numerical analyst to use standard methods and idioms, and nontrivial data structures, and still get an end-to-end correctness proof. For each of these functions we prove in VST that the function exactly implements the functional model, modulo equivalence relations on floating-point numbers. At this level there are no accuracy proofs: the programs exactly implement the functional models, except that one might have −0 where the other has +0, and one might have different NaNs than the other, if any arise.</p><p>Correctness Theorem. The jacobi2 function is specified and proved with a VST function-spec that we will not show here, but in English it says:</p><p>Theorem 4 (body_jacobi2). Let A be a matrix, let b and x^(0) be vectors, let A1p be the address of a 1-dimensional array holding the diagonal of A, let A2p be the address of a CRS sparse-matrix representation of A without its diagonal, let bp and xp be the addresses of arrays holding b and x^(0), let τ be the desired residual accuracy, and let maxiter be an integer. 
Suppose these preconditions hold: the dimension of A is n × n; b and x^(0) have length n; 0 &lt; n &lt; 2^32; 0 &lt; maxiter &lt; 2^32; all the elements of A, b, x, acc<ref type="foot">foot_1</ref> (as well as the inverses of A's diagonal) are finite double-precision floating-point numbers; and the data structures A1p, A2p, bp have read permission while xp has read/write permission. Suppose one calls jacobi2(A1p,A2p,bp,xp,acc,maxiter); then afterward the following postcondition is satisfied: the function returns some s, and the array at xp contains some x^(k), such that (s, x^(k)) ≈ jacobi A b x^(0) τ^2 maxiter, where ≈ is the floating-point equivalence relation and jacobi is our functional model in Coq of Jacobi iteration; and the data structures at A1p, A2p, bp still contain their original values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">The Main Theorems, Residuals, and Stopping Conditions</head><p>The C program jacobi2() (and its functional model jacobi) satisfies either of two different specifications:</p><p>Theorem 4 (above): if A, b, x satisfy the basic preconditions,<ref type="foot">foot_0</ref> then perhaps Jacobi iteration will return after maxiter iterations, having failed to converge, or might overflow to floating-point infinities and stop early. But even so, the result (s, y) will be such that the (squared) residual s = ‖Ay − b‖_2^2 accurately characterizes the result vector y: if y contains an ∞ then s = ∞, but if √s &lt; τ then y is indeed a "correct" answer. That is because the functional model preserves infinities in this way, and the C program correctly implements the model.</p><p>Theorem 5: if A, b, x, maxiter satisfy the Jacobi preconditions, then the result (s, y) will be such that s = ‖Ay − b‖_2^2 and √s &lt; τ, and indeed y is a "correct" finite answer. In fact this is our main result:</p><p>Theorem 5 (main_jacobi). If the inputs satisfy the Jacobi preconditions, then the C program will converge within k iterations to an accurate result.</p><p>Proof. Using Theorems 3 and 4, with some additional reasoning about the stopping condition in the functional model of the C program.</p><p>Jacobi Iteration on Inputs Not Known to Satisfy the Jacobi Preconditions. Theorem 4 is useful on its own, since there are many useful applications of stationary iterative methods where one has not proved the convergence conditions (e.g., the Jacobi preconditions) in advance; one just runs the program and tests the residual. For such inputs we must take care to stop correctly on overflow.</p><p>The induction hypothesis, for 0 &lt; k ≤ maxiter iterations, requires that x_k has not yet overflowed; otherwise our sparse-matrix reasoning cannot be carried out (see Sect. 7.1). 
Therefore the program must check for floating-point overflow in x_k after each iteration. To do that efficiently, the program tests s ⊗ 0 = 0 (a portable and efficient way of testing that s is finite); if the test succeeds, then x_{k+1} is all finite.</p></div>
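<p><![CDATA[The s ⊗ 0 = 0 test relies on IEEE-754 semantics: a finite s times 0.0 yields a signed zero, while an infinite or NaN s yields NaN, which compares unequal to everything. A sketch (the helper name is ours; compile without fast-math options, which would break the IEEE semantics the test depends on):

```c
#include <math.h>

/* Portable finiteness test: s*0.0 is +/-0.0 exactly when s is finite;
   inf*0.0 is NaN, NaN*0.0 stays NaN, and NaN == 0.0 is false. */
int is_finite_by_mul(double s) {
    return s * 0.0 == 0.0;
}
```

Unlike isfinite(), this idiom needs no classification macro and matches what the verified C code can state directly about the semantics of * and ==.]]></p>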
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">Related Work</head><p>Convergence of Jacobi Iteration. The standard error analysis for Jacobi iteration in exact arithmetic is well described in standard books on iterative solvers <ref type="bibr">[27]</ref>. A floating-point error analysis of Jacobi and related iterations was carried out by Higham and Knight in the early 1990s <ref type="bibr">[14]</ref>, and is summarized along with references to related work in <ref type="bibr">[13,</ref><ref type="bibr">Ch. 17]</ref>. The style of analysis is similar to what we present in this paper. However, earlier analyses implicitly assumed that all intermediate results remain in the normalized floating-point range, and did not consider the possibility of underflow or overflow.</p><p>Formalization of numerical analysis has been facilitated by advances in automatic and interactive theorem proving <ref type="bibr">[7,</ref><ref type="bibr">11,</ref><ref type="bibr">23,</ref><ref type="bibr">25]</ref>. Some notable works in the formalization of numerical analysis are the formalization of the Kantorovich theorem <ref type="bibr">[26]</ref>, matrix canonical forms by Cano et al. <ref type="bibr">[9]</ref>, the Perron-Frobenius theorem in Isabelle/HOL <ref type="bibr">[30]</ref>, the Lax equivalence theorem for finite difference schemes <ref type="bibr">[28]</ref>, the consistency, stability, and convergence of a second-order centered scheme for the wave equation <ref type="bibr">[5,</ref><ref type="bibr">6]</ref>, and formalized flows, Poincaré maps of dynamical systems, and verified rigorous bounds on numerical algorithms in Isabelle/HOL <ref type="bibr">[15]</ref><ref type="bibr">[16]</ref><ref type="bibr">[17]</ref>. However, these works do not study the problem of iterative convergence formally. 
Even though iterative convergence has been formalized in exact arithmetic <ref type="bibr">[29]</ref>, the effect of rounding error on iterative convergence had not been formalized before.</p><p>End-to-End Machine-Checked Proofs. There are few truly end-to-end (C code to high-level accuracy) machine-checked formal proofs of numerical programs in the literature. Our approach has been to prove that a C program exactly implements a functional model (using VST), prove how accurately the functional model approximates a real-valued model, prove the accuracy of the real-valued model, and compose these results together. Something similar has been done for scalar Newton's method <ref type="bibr">[2]</ref> and for ordinary differential equations <ref type="bibr">[20]</ref>, but those works did not leverage the power of the Mathematical Components and MathComp Analysis libraries for the upper-level proofs.</p><p>Other previous work <ref type="bibr">[6]</ref> verified a C program implementing a second-order finite difference scheme for solving the one-dimensional acoustic wave equation, with a total error theorem in Coq that composes global round-off and discretization error bounds; this was connected (outside of Coq) to a Frama-C correctness proof of the C program. The inexpressiveness of Frama-C's assertion language was a challenge in that verification effort.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10">Conclusion and Future Work</head><p>In this paper, we have presented a formal proof in Coq of the correctness, accuracy, and convergence of Jacobi iteration in floating-point arithmetic. The same style of analysis should generalize to many other iterative methods, for both linear and nonlinear problems. Even within the scope of stationary iterations for linear problems, there are several avenues for future work.</p><p>We have not fully taken advantage of sparsity in our error bound: many of our g&#948;(n) and g(n) terms could instead be g&#948;(s) and g(s), functions of the number s of nonzero elements per row. It would be useful to tighten these bounds.</p><p>Jacobi iteration is one of the simplest stationary iterative methods. More sophisticated stationary methods involve splittings A = M + N, where M is not a diagonal matrix. Solving linear systems with such an M is more involved than diagonal scaling, and requires algorithms such as forward and backward substitution that may need their own error analysis. Formalizing the floating-point error analysis of these solvers is an important next step.</p><p>Strict diagonal dominance is not a necessary condition for the convergence of Jacobi iteration. We would like to formalize other sufficient conditions for convergence of Jacobi and related iterations. For example, irreducible diagonal dominance is sufficient for convergence of Jacobi in general, while positive definiteness is sufficient for convergence of Gauss-Seidel. However, while these conditions guarantee convergence, they do not yield an easy-to-compute rate of convergence the way strict diagonal dominance does, and hence we expect the analysis to be more subtle.</p><p>Finally, we would like to extend our analysis to mixed-precision methods, in which computations with M are done in one precision, while the residual is computed in a higher precision. 
Such methods are often used to obtain the accuracy associated with a higher floating-point precision while paying the cost associated with a lower one. However, the use of multiple precisions opens the door to a host of potential problems related to overflow and underflow, and we see formal verification as a particularly useful tool in this setting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Efforts and Challenges</head><p>The formalization effort in this work comprises about 1,826 lines of Coq proof script for the C program verification and about 14,000 lines of Coq proof script for the convergence and accuracy proofs. A total of about 60 lines of C code were verified: 12 lines of header files, 5 lines for surely_malloc, 14 lines for crs_row_vector_multiply, and about 31 lines for jacobi2 and jacobi2_oneiter. The formalization of the accuracy and convergence of the functional model took us about 5 person-months. The most time-consuming part was the proof of absence of overflow, which involved deriving and formalizing bounds on the input conditions. The proof of correctness of the C program with respect to the functional model took us about two weeks; this included working out what the termination condition should be, determining the best way to compute residuals so that the program properly detects overflow, developing the functional models, and proving termination properties of the functional model.</p><p>The main challenge in an end-to-end verification is that each layer's theorem must be strong enough to compose with the next layer. The published theorems about Jacobi convergence were insufficient (no treatment of underflow, error bounds relating the wrong quantities, no handling of -0, inadequate treatment of overflow), and new methods, which we develop in this work, were required.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>A is an n &#215; n matrix; b and x have dimension n; 0 &lt; n &lt; 2<hi rend="superscript">32</hi>; A, b, x are all finite; and A, b, x are stored in memory in the right places, but nothing else is assumed about the values of A, b, x.</p></note>
		</body>
		</text>
</TEI>
