<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Reduced Order Model Hessian Approximations in Newton Methods for Optimal Control</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2022 Summer</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10345793</idno>
					<idno type="doi">10.1007/978-3-030-95157-3_18</idno>
					<title level='j'>Realization and Model Reduction of Dynamical Systems -  A Festschrift in Honor of the 70th Birthday of Thanos Antoulas</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Matthias Heinkenschloss</author><author>Caleb Magruder</author><author>C.A. Beattie</author><author>P. Benner</author><author>M. Embree</author><author>S. Gugercin</author><author>S. Lefteriu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper introduces reduced order model (ROM) based Hessian approximations for use in inexact Newton methods for the solution of optimization problems implicitly constrained by a large-scale system, typically a discretization of a partial differential equation (PDE). The direct application of an inexact Newton method to this problem requires the solution of many PDEs per optimization iteration. To reduce the computational complexity, a ROM Hessian approximation is proposed. Since only the Hessian is approximated, but the original objective function and its gradient is used, the resulting inexact Newton method maintains the first-order global convergence property, under suitable assumptions. Thus even computationally inexpensive lower fidelity ROMs can be used, which is different from ROM approaches that replace the original optimization problem by a sequence of ROM optimization problem and typically need to accurately approximate function and gradient information of the original problem. In the proposed approach, the quality of the ROM Hessian approximation determines the rate of convergence, but not whether the method converges. The projection based ROM is constructed from state and adjoint snapshots, and is relatively inexpensive to compute. Numerical examples on semilinear parabolic optimal control problems demonstrate that the proposed approach can lead to substantial savings in terms of overall PDE solves required.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>We introduce reduced order model (ROM) based Hessian approximations for use in inexact Newton methods for the solution of large-scale smooth optimization problems min u&#8712;R m J (u) = J (y(u), u), <ref type="bibr">(1)</ref> where for given u &#8712; R m the vector y(u) &#8712; R n is the solution of c(y(u), u) = 0. (2)</p><p>The optimization problem <ref type="bibr">(1,</ref><ref type="bibr">2)</ref> often comes from a discretization of optimal control problems, and in this case u &#8712; R m is the discretized control, y &#8712; R n is the discretized state, and (2) is the discretized state equation. Our approach can easily be extended to a setting where the control space U and the state space Y are Hilbert spaces, but because of space restrictions we limit ourselves to the finite dimensional case. In our applications, the state Eq. ( <ref type="formula">2</ref>) is a discretized parabolic partial differential equation (PDE). In this case, each evaluation of the objective function <ref type="bibr">(1)</ref> requires the solution of a discretized PDE, the evaluation of the objective function gradient requires the solution of the (linear) adjoint PDE, and the evaluation of a Hessian-times-vector product requires the additional solution of two (linear) PDEs. This is computationally expensive. In addition, the additional PDEs that have to be solved for gradient and Hessian-times-vector computations depend on the solution y(u) &#8712; R n of the state equation, which for time dependent PDEs is also memory intensive. ROMs can be used to reduce the computation time and memory requirements.</p><p>Previous approaches to using ROMs in optimization have approximated the mapping u &#8594; y(u) &#8712; R n using a ROM applied to the state Eq. <ref type="bibr">(2)</ref>. Typically, the state Eq. 
( <ref type="formula">2</ref>) is approximated by a projection based ROM</p><p>where V &#8712; R n&#215;r is a matrix with rank r &#8810; n. The ROM state solution is then used to approximate <ref type="bibr">(1)</ref> by min u&#8712;R m J (V y(u), u). (4)</p><p>For many problems, including the example problems in Sect. 4 of this paper, the ROM approximation <ref type="bibr">(4,</ref><ref type="bibr">3)</ref> of the states implies via the optimality conditions that the controls u are also contained in a low dimensional subspace related to V.</p><p>In special cases, one can extend ROM approaches for dynamical systems <ref type="bibr">[5]</ref> to find one ROM such that the solution of (4, 3) is a good approximation of the solution of <ref type="bibr">(1,</ref><ref type="bibr">2)</ref>. See, e.g., <ref type="bibr">[2,</ref><ref type="bibr">3]</ref>, and the survey <ref type="bibr">[7]</ref>. For general problems, however, the ROM is only valid in a potentially small neighborhood of the current control u c and corresponding state y c = y(u c ). In this case trust-region based model management approaches have been proposed. See <ref type="bibr">[1,</ref><ref type="bibr">10,</ref><ref type="bibr">13,</ref><ref type="bibr">16,</ref><ref type="bibr">19,</ref><ref type="bibr">20]</ref>, and the surveys <ref type="bibr">[7,</ref><ref type="bibr">17]</ref>.</p><p>While, in principle, one can use trust-region based model management approaches that generate a new ROM at each new iterate, in practice their use is limited by the computational cost of ROM generation relative to the computational savings resulting from that ROM.</p><p>Therefore, instead of generating ROMs for the optimization problem, we generate ROM approximations of the Hessians. These ROMs are computationally cheaper than a ROM that also has to approximate the objective function <ref type="bibr">(1)</ref> and its gradient over a range of controls. 
Ideally the ROM Hessians still generate optimization steps that are close to the Newton steps, and therefore lead to fast local convergence of the optimization algorithm at a fraction of the cost of a corresponding Newton-type method applied to the full order model (FOM) problem <ref type="bibr">(1,</ref><ref type="bibr">2)</ref>.</p><p>In Sect. 2 we outline our new approach, and in Sect. 3 we specify it further for a discretized parabolic optimal control problem. We demonstrate our approach on semilinear parabolic optimal control problems in Sect. 4.</p></div>
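To make the implicitly constrained setting (1, 2) concrete, the following is a minimal NumPy sketch, not taken from the paper: a linear "state equation" c(y, u) = A y - B u - b = 0 stands in for the discretized PDE, with a quadratic tracking objective. The operators A, B, the data b, y_d, and the penalty alpha are illustrative assumptions.

```python
import numpy as np

# Toy instance of (1)-(2): linear state equation c(y, u) = A y - B u - b = 0
# and tracking objective J(y, u) = 0.5||y - y_d||^2 + 0.5*alpha*||u||^2.
# All concrete data (A, B, b, y_d, alpha) are illustrative, not from the paper.
rng = np.random.default_rng(0)
n, m, alpha = 8, 3, 1e-2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible state operator
B = rng.standard_normal((n, m))
b = rng.standard_normal(n)
y_d = rng.standard_normal(n)

def state(u):
    """Solve c(y, u) = 0 for y(u): one 'PDE solve' per objective evaluation."""
    return np.linalg.solve(A, B @ u + b)

def J(u):
    y = state(u)
    return 0.5 * np.sum((y - y_d)**2) + 0.5 * alpha * np.sum(u**2)

def grad_J(u):
    """Adjoint-based gradient: one additional (linear) adjoint solve."""
    y = state(u)
    lam = np.linalg.solve(A.T, -(y - y_d))  # adjoint equation for lambda(u)
    return alpha * u - B.T @ lam            # grad_u J + c_u^T lambda, c_u = -B
```

Each objective evaluation costs one state solve and each gradient one extra adjoint solve; the ROM Hessian approach described in the paper aims to keep exactly this FOM cost pattern for function and gradient while making Hessian-times-vector products cheap.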
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Inexact Newton Methods and ROM Hessian Approximations</head><p>This section provides a general outline of our proposed approach. We begin with a review of gradient and Hessian computation and the line-search Newton-Conjugate-Gradient (Newton-CG) method. Then we will present the general structure of our ROM Hessian approximation, and a heuristic for choosing the ROM subspace within this approximation. Inexact Newton Method and ROM Hessian Approximation. To make the definition of the objective function J and its gradient and Hessian calculation rigorous, we make the following assumptions throughout. Here we use &#8711; y J (y, u) &#8712; R n , &#8711; u J (y, u) &#8712; R m to denote the partial gradients and c y (y, u) &#8712; R n&#215;n , c u (y, u) &#8712; R n&#215;m to denote the partial Jacobians. Similar notation is used for second derivatives.</p><p>The gradient of the objective J in (1, 2) can be computed using the so-called adjoint equation approach. See, e.g., <ref type="bibr">[12,</ref><ref type="bibr">Sect. 1.6]</ref>. Define the Lagrangian L(y, u, &#955;) := J (y, u) + &#955; T c(y, u) corresponding to <ref type="bibr">(1,</ref><ref type="bibr">2)</ref>. Given a control u and corresponding state y(u) the gradient of the objective J is computed by first solving the adjoint equation</p><p>for &#955; = &#955;(u) and then setting</p><p>Given a control u, the corresponding state y(u), and the corresponding adjoint &#955;(u), the Hessian-times-vector product &#8711; 2 J (u) d is computed by solving the linearized state equation c y (y(u), u) w = c u (y(u), u) d (6a)</p><p>for w, then the second order adjoint equation</p><p>for p, and then computing</p><p>For optimal control problems, the computation of y(u) requires the solution of a nonlinear discretized PDE. The gradient computation requires the solution of a linear discretized adjoint PDE to compute &#955;(u). 
Each Hessian-times-vector operation &#8711; 2 J (u) d requires the solution of two linear discretized PDEs to compute w and p, respectively.</p><p>The problem (1) is solved using a line-search Newton-CG Method. However, a trust-region method could be used as well, and the proposed ROM Hessian approximation provides the same computational benefits in a trust-region setting.</p><p>Given a current iterate u c , the line-search Newton-CG Method [15, Algorithm 7.1] applies the truncated CG method to approximately solve the Newton subproblem</p><p>Then it computes a suitable step-size &#945; c &#8712; (0, 1] such that the sufficient decrease condition</p><p>is satisfied. The new iterate is given by u</p><p>The ROM Hessian approximation is computed by applying a projection ROM to Eqs. (6a, b). Let V &#8712; R n&#215;r , r &#8810; n, with linearly independent columns such that V T c y (y(u), u)V is invertible.</p><p>The ROM Hessian-times-vector product is computed by first solving the projected linearized state equation</p><p>for w, then solving the projected second order adjoint equation</p><p>for p, and then computing</p><p>The ROM Hessian approximation ( <ref type="formula">9</ref>) is used to compute the direction in the inexact Newton method. Instead of solving <ref type="bibr">(7)</ref>, given a current iterate u c , the direction d is computed by using the truncated CG method to approximately solve</p><p>The step size &#945; c is computed as before using ( <ref type="formula">8</ref>) with d replaced by this direction. The new iterate is given by u</p><p>The global convergence of the original line-search Newton method and the line-search Newton method with ROM Hessian approximation can be proven using standard arguments. See, e.g., <ref type="bibr">[15,</ref><ref type="bibr">Theorem 3.2]</ref>. 
While first-order global convergence can be ensured under standard conditions if the ROM Hessian approximation is used, the speed with which the inexact Newton method converges in this case depends on the quality of the ROM Hessian approximation, and in particular on V. Next we motivate our choice of V.</p><p>Basic Properties of the ROM Hessian. Using (6) it can be seen that the Hessian of J is given by</p><p>where we have set y = y(u) and &#955; = &#955;(u) to simplify notation. We will use this simplified notation frequently during the remainder of this section.</p><p>Applying the Implicit Function Theorem to the state Eq. ( <ref type="formula">2</ref>) shows that the sensitivity of u &#8594; y(u) is given by</p><p>Similarly, applying the Implicit Function Theorem to the adjoint Eq. (5a) shows that the derivative of u &#8594; &#955;(u) is given by</p><p>Using these derivatives, the Hessian can be written as</p><p>Using ( <ref type="formula">9</ref>) it can be seen that the ROM Hessian approximation is given by</p><p>If we define</p><p>and</p><p>then the ROM Hessian approximation can be written as</p><p>The next result ensures the positive definiteness of the Hessian and its ROM approximation in situations that are often encountered in classes of applications, such as those in Sect. <ref type="bibr">4</ref>.</p><p>Proposition 1 If &#8711; uu L(y, u, &#955;) is positive definite and &#8711; yu L(y, u, &#955;) = 0, then &#8711; 2 J (u) and its ROM approximation are positive definite with smallest eigenvalue bounded from below by the smallest eigenvalue of &#8711; uu L(y, u, &#955;).</p></div>
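The line-search Newton-CG step described above can be sketched as follows. This is an illustrative NumPy implementation under standard assumptions, not the paper's code: truncated CG is applied to whichever Hessian operator is supplied (FOM or ROM), and Armijo backtracking enforces the sufficient decrease condition (8) on the true objective. Function names and default tolerances are assumptions.

```python
import numpy as np

# One step of a line-search Newton-CG method: truncated CG on the supplied
# (FOM or ROM) Hessian operator, then Armijo backtracking on the objective.
def truncated_cg(grad, hessvec, tol=1e-8, maxiter=200):
    """Approximately solve H d = -grad; stop on negative curvature."""
    d = np.zeros_like(grad)
    r = grad.copy()                      # residual of H d + grad
    p = -r
    for _ in range(maxiter):
        Hp = hessvec(p)
        curv = p @ Hp
        if curv <= 0:                    # negative curvature detected
            return -grad if np.all(d == 0) else d
        step = (r @ r) / curv
        d = d + step * p
        r_new = r + step * Hp
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(grad):
            return d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

def armijo_step(J, u, d, grad, c1=1e-4, max_backtracks=30):
    """Backtrack until the sufficient decrease condition (8) holds."""
    alpha, J0, slope = 1.0, J(u), c1 * (grad @ d)
    for _ in range(max_backtracks):
        if J(u + alpha * d) <= J0 + alpha * slope:
            break
        alpha *= 0.5
    return alpha
```

On a convex quadratic, truncated CG recovers the Newton step to the tolerance and the unit step is accepted; plugging a ROM operator in as `hessvec` changes only the direction, while the decrease test still uses the original objective, which is the key to retaining first-order global convergence.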
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof</head><p>The statement follows immediately from <ref type="bibr">(11)</ref> and <ref type="bibr">(15)</ref>.</p><p>Comparing <ref type="bibr">(14)</ref> with <ref type="bibr">(18)</ref> shows that the ROM Hessian approximates &#8711; 2 J (u) if the reduced sensitivities approximate y u (u) and &#955; u (u). The latter two approximation conditions motivate our choice for the ROM matrix V. Proposition 2 Let V &#8712; R n&#215;r have rank r &lt; n and be such that V T c y (y(u), u)V is invertible. If y(u + d) &#8712; R(V) for all sufficiently small d, then</p><p>If, in addition, &#955;(u + d) &#8712; R(V) for all sufficiently small d, then (20) holds.</p><p>Applying the Implicit Function Theorem to this equation shows that</p><p>and ( <ref type="formula">16</ref>) can be written as y u (u) = V y u (u). Since y(u + d) = V y(u + d) for all sufficiently small d,</p><p>which is <ref type="bibr">(19)</ref>. If, in addition, &#955;(u + d) &#8712; R(V) for all sufficiently small d, then we can use a similar argument to show the first equality in <ref type="bibr">(20)</ref>.</p><p>The second identity in <ref type="bibr">(20)</ref> follows immediately from ( <ref type="formula">14</ref>), ( <ref type="formula">18</ref>), <ref type="bibr">(19)</ref>, and the first identity in <ref type="bibr">(20)</ref>.</p><p>In general there is no matrix V &#8712; R n&#215;r with small rank r &#8810; n such that the conditions in Proposition 2 hold. 
However, if there exists V &#8712; R n&#215;r , r &#8810; n, with orthonormal columns such that the projections of y(u + d) and &#955;(u + d) onto R(V) are close to y(u + d) and &#955;(u + d), respectively, or equivalently such that</p><p>for a small tolerance tol &#8810; 1, then we expect the reduced sensitivities to be close to y u (u) and &#955; u (u), and consequently the ROM Hessian to be a good approximation of the original FOM Hessian &#8711; 2 J (u).</p><p>However, finding V &#8712; R n&#215;r such that (21) is guaranteed to hold for all sufficiently small d would be computationally expensive. In practice we compute V &#8712; R n&#215;r using proper orthogonal decomposition (POD) applied to the state y(u) and the adjoint &#955;(u) at the current control.</p><p>ROM Hessians versus ROM Optimization. Proposition 2 and even (21) describe optimistic scenarios. If we were able to find a V such that (21) holds at the current control u = u c , then it could be used to approximate the original problem (1, 2) in a trust-region based model management framework and approximately solve (4, 3) over all u in a neighborhood (trust-region) around the current control u c . Rather than using the ROM model u &#8594; J (V y(u), u) in a neighborhood of the current control u c , our approach uses the approximate quadratic Taylor expansion d &#8594;</p><p>The true function J (u) and this Taylor model have the same function and gradient value at u c , no matter how good V is. If the ROM Hessian well approximates &#8711; 2 J (u c ), we will essentially compute a Newton step. If the resulting V is less accurate, our approach can still generate a descent direction that is much better than a simple gradient step. Computationally inexpensive, but possibly less accurate ROMs can still be used to accelerate the overall optimization.</p></div>
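A POD basis with the tolerance-based truncation in the spirit of (21) can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation: the basis keeps the smallest number of left singular vectors of the snapshot matrix so that the discarded relative energy stays below tol squared.

```python
import numpy as np

# POD basis selection sketch: truncate the SVD of the snapshot matrix so
# the discarded relative energy is below tol^2, in the spirit of (21).
def pod_basis(snapshots, tol=1e-6):
    """Return V with orthonormal columns capturing the snapshots to tol."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)      # captured energy fraction
    r = int(np.searchsorted(frac, 1.0 - tol**2)) + 1
    return U[:, :r]
```

As Algorithm 3 later prescribes, this would be applied separately to state and adjoint snapshots, since the two typically have very different scales.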
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">ROM Hessian Approximations for Discrete Time Optimal Control Problems</head><p>In this section we apply the ROM Hessian approach to a minimization problem that arises, e.g., from a discretized parabolic optimal control problem. Model Problem. Given functions &#8467; : R n y &#8594; R, &#963; : R n u &#8594; R, F : R n y &#8594; R n y , and G : R n u &#8594; R n y , and given y 0 &#8712; R n y consider the following minimization problem in the variables u = (u T 1 , . . . , u T n t ) T &#8712; R n u n t , y = (y T 1 , . . . , y T n t ) T &#8712; R n y n t .</p><p>Minimize</p><p>where y and u satisfy an implicit constraint,</p><p>Our ROM Hessian approach easily extends to generalizations of ( <ref type="formula">22</ref>), e.g., replacing &#8467;(y k ) and G(u k ) by &#8467;(y k , u k ) and G(y k , u k ). We consider (22) to simplify notation.</p><p>Throughout this section we make the following assumptions, which are adaptations of the assumptions (A1)-(A3) to the problem (22). (A1') There exists an open set D u &#8834; R n u n t such that for every u = (u T 1 , . . . , u T n t ) T &#8712; D u the state Eq. (22b) has a unique solution y(u) = (y 1 (u) T , . . . , y n t (u) T ) T . (A2') There exists an open set D y &#8834; R n y n t such that {y(u) : u &#8712; D u } &#8834; D y , and the functions &#8467; : D y &#8594; R, &#963; : D u &#8594; R, F : D y &#8594; R n y , and G : D u &#8594; R n y are twice continuously differentiable. (A3') For every y = (y T 1 , . . . , y T n t ) T &#8712; {y(u) : u &#8712; D u } and every r 1 , . . . , r n t &#8712; R n y there exists a unique solution w 1 , . . . , w n t &#8712; R n y of the equations</p><p>Under the assumptions (A1')-(A3') we can define the function J : D u &#8594; R, and the problem ( <ref type="formula">22</ref>) is a special case of (1, 2). Gradient and Hessian Computation. Under the assumptions (A1')-(A3') the function J : D u &#8594; R is twice continuously differentiable. 
Its gradient can be computed using the adjoint equation approach in Algorithm 1. Hessian-times-vector operations are computed using Algorithm 2.</p></div>
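The forward/backward structure of the adjoint gradient for a problem of the shape (22) can be sketched as follows. This is an illustrative NumPy toy, not the paper's discretization: linear dynamics y_k = A y_{k-1} + B u_k stand in for the time-stepped PDE, with l(y) = 0.5||y||^2 and sigma(u) = 0.5*alpha*||u||^2; A, B, and all data are assumptions.

```python
import numpy as np

# Adjoint gradient for a discrete-time toy problem shaped like (22):
# state y_k = A y_{k-1} + B u_k, objective sum_k l(y_k) + sigma(u_k).
# The linear dynamics (A, B) stand in for a time-stepped PDE; all data
# are illustrative.
rng = np.random.default_rng(1)
ny, nu, nt, alpha = 5, 2, 4, 1e-2
A = 0.5 * rng.standard_normal((ny, ny))
B = rng.standard_normal((ny, nu))
y0 = rng.standard_normal(ny)

def solve_state(u):                      # u has shape (nt, nu)
    y, ys = y0, []
    for k in range(nt):
        y = A @ y + B @ u[k]
        ys.append(y)
    return np.array(ys)

def objective(u):
    ys = solve_state(u)
    return 0.5 * np.sum(ys**2) + 0.5 * alpha * np.sum(u**2)

def gradient(u):
    """Algorithm-1-style gradient: forward state sweep, backward adjoint sweep."""
    ys = solve_state(u)
    g = np.empty_like(u)
    lam = np.zeros(ny)                   # adjoint beyond the final step is zero
    for k in range(nt - 1, -1, -1):      # adjoint recursion runs backward in time
        lam = ys[k] + A.T @ lam          # lambda_k = grad l(y_k) + A^T lambda_{k+1}
        g[k] = alpha * u[k] + B.T @ lam
    return g
```

The one forward sweep plus one backward sweep mirrors steps 1 and 2 of Algorithm 1; a finite-difference comparison is the standard way to validate such an adjoint implementation.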
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1: Gradient Computation</head><p>1: Given u = (u T 1 , . . . , u T nt ) T let y = (y T 1 , . . . , y T nt ) T solve the state Eq. (22b). 2: Solve the adjoint equations for &#955; nt , . . . , &#955; 1 :</p><p>Note that since the objective function and the implicit constraints in the model problem ( <ref type="formula">22</ref>) are separable, &#8711; uy L(y, u, &#955;) = 0.</p><p>Hessian Approximation by Model Order Reduction. As mentioned towards the end of Sect. 2, we compute a ROM from state y and adjoint &#955; information. In this example the state y and the adjoint &#955; are vectors of vectors y k &#8712; R n y and &#955; k &#8712; R n y , respectively. We compute a matrix V &#8712; R n y &#215;r such that the components y k and &#955; k are approximately contained in the range of V (this V is a component of the projection matrix in Sect. 2). The projection matrix for y and &#955; is the n t n y &#215; n t r block diagonal matrix with identical diagonal blocks given by V. Algorithm 2: Hessian-Times-Vector Computation -&#8711; 2 J (u)v 1: Given u = (u T 1 , . . . , u T nt ) T let y = (y T 1 , . . . , y T nt ) T solve the state Eq. ( <ref type="formula">22b</ref>) and let &#955; = (&#955; T 1 , . . . , &#955; T nt ) T solve the adjoint Eq. ( <ref type="formula">23</ref>). 2: Set w 0 = 0 and solve the linearized state equation for w 1 , . . . , w nt :</p><p>3: Solve the second order adjoint equations for p nt , . . . , p 1 .</p><p>4: Compute the action of the Hessian</p><p>We apply POD to the computed states and adjoints to compute V &#8712; R n y &#215;r . Since states and adjoints typically have very different scales, we do not apply POD to the combined snapshots, but individually, as stated in Algorithm 3. 2: Construct by truncated SVD matrices V y &#8712; R ny &#215;ry and V &#955; &#8712; R ny &#215;r&#955; so that</p><p>The ROM Hessian-times-vector computation is specified in Algorithm 4. 
The subspace matrix V &#8712; R n y &#215;r used in steps 2 and 3 is computed via Algorithm 3.</p><p>Computational Efficiency of ROM Hessian. While matrices like V T F y (y k )V in the ROM Hessian computation are smaller than F y (y k ), the dependence of F y on y k , which changes with the time step k, makes the computation of V T F y (y k )V expensive. This well-known issue <ref type="bibr">[4,</ref><ref type="bibr">6,</ref><ref type="bibr">9,</ref><ref type="bibr">11,</ref><ref type="bibr">18]</ref> is addressed via hyperreduction. Specifically, we use a so-called unassembled form of the Discrete Empirical Interpolation Method (DEIM), which originates from the (D)EIM <ref type="bibr">[6,</ref><ref type="bibr">9]</ref> and is described in more detail in <ref type="bibr">[4,</ref><ref type="bibr">18]</ref>. Algorithm 4: ROM Hessian-Times-Vector Computation-&#8711; 2 J (u)v 1: Given u = (u T 1 , . . . , u T nt ) T let y = (y T 1 , . . . , y T nt ) T be the solution of the state Eq. (22b) and let &#955; = (&#955; T 1 , . . . , &#955; T nt ) T be the solution of the adjoint Eq. ( <ref type="formula">23</ref>). 2: Set w 0 = 0 and solve the POD linearized state equation for w k ,</p><p>3: Solve the POD second order adjoint equation for p k ,</p><p>4: Compute the approximation to the action of the Hessian,</p><p>This leads to a further approximation of the ROM Hessian computed by Algorithm 4. The error in this additional approximation is controlled by the tolerance tol DEIM used in DEIM. We refer to [14, Sects. 5.4, 6.2] for further details. We will use &#8711; 2 J (u) to denote the final ROM Hessian approximation. To reduce storage requirements, y k can be replaced by its reduced coordinates V T y k &#8712; R r , and &#955; k by V T &#955; k &#8712; R r .</p></div>
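The contrast between the FOM and ROM Hessian-times-vector products can be sketched on a steady toy problem. This is an illustrative NumPy example under assumed data, not the paper's time-dependent implementation: the state equation c(y, u) = A y - B u - b = 0 is linear and the objective quadratic, so the two solves per product in (6) reduce to two linear solves, and the ROM version replaces them with r-dimensional solves.

```python
import numpy as np

# FOM vs ROM Hessian-times-vector product, in the spirit of Algorithm 4,
# for a toy linear-quadratic instance of (1, 2) with c(y, u) = A y - B u - b
# and J = 0.5||y - y_d||^2 + 0.5*alpha*||u||^2. A, B, and the subspace
# construction are illustrative assumptions.
rng = np.random.default_rng(2)
n, m, alpha = 40, 4, 1e-2
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

def fom_hessvec(d):
    """FOM product: two full n-dimensional linear solves, cf. (6)."""
    w = np.linalg.solve(A, B @ d)        # linearized state equation
    p = np.linalg.solve(A.T, -w)         # second order adjoint equation
    return alpha * d - B.T @ p           # = (alpha I + B^T A^-T A^-1 B) d

def rom_hessvec(d, V):
    """ROM product: the same two solves, but on the reduced space."""
    Ar = V.T @ A @ V                     # reduced operator (precomputable)
    w = np.linalg.solve(Ar, V.T @ (B @ d))
    p = np.linalg.solve(Ar.T, -w)
    return alpha * d - B.T @ (V @ p)

# If R(V) contains the relevant state and adjoint directions, the ROM
# product reproduces the FOM product, mirroring Proposition 2.
S = np.linalg.solve(A, B)                # linearized-state directions
V, _ = np.linalg.qr(np.hstack([S, np.linalg.solve(A.T, S)]))
```

In the time-dependent setting of Algorithm 4 the same pattern holds per time step, which is where the savings reported in Sect. 4 come from; with a POD basis that only approximately captures these directions, the products agree only approximately.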
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Numerical Results</head><p>We apply the ROM Hessian approximation to semilinear parabolic optimal control problems of the form min u&#8712;L 2 (&#937;&#215;(0,1))</p><p>where for given function u &#8712; L 2 (&#937; &#215; (0, 1)) the function y(&#8226;, &#8226;; u) is the solution of (25b), and &#937; = (0, 1) 2 . The problem is discretized in space using piecewise linear finite elements on a triangulation obtained by dividing &#937; = (0, 1) 2 into 40 &#215; 40 squares and then dividing each square into two triangles. This results in a semi-discretization with n u = 1681 and n y = 1521 degrees of freedom in the control and state, respectively. The resulting semi-discretization is then discretized in time using the backward Euler method with n t = 100 time steps. This results in a problem of the type (22) with G(u k ) = M u u k and &#963; (u k ) = (&#945;/2)u T k M u u k , where M u is the mass matrix, and</p><p>where M is the mass matrix that also appears in (22b) (M u and M differ in size because of boundary conditions for y) and y k,d comes from a discretization of the desired state y d in (25a).</p><p>All optimization runs are initialized at u = 0. The truncated CG is stopped when negative curvature is detected or when the CG residual satisfies</p><p>depending on whether a FOM or ROM Hessian is used. Example 1: Cubic Reaction. In the first example the nonlinearity in (25b) is cubic, f (y) = y 3 . The desired state is y d (x, t) = 2e t + 2x 1 (x 1 -1) + 2x 2 (x 2 -1), the initial state is y 0 (x) = sin(2&#960; x 1 ), and the control penalty is &#945; = 10 -4 .</p><p>Both optimization with FOM Hessians and with ROM Hessians converged to virtually the same solution. The computed optimal control u at t = 0.1, 0.5, 1 is shown in Fig. <ref type="figure">2</ref>. 
Table <ref type="table">1</ref> shows that the optimization histories for the two approaches are nearly the same.</p><p>While the optimization histories for the two approaches are nearly the same, the ROM Hessian approach is much faster, as shown in Table <ref type="table">2</ref>. Fig. <ref type="figure">1</ref> shows the convergence history for Newton-CG with and without POD+DEIM approximations in the Hessian-vector computation. The timing reported in Table <ref type="table">2</ref> is for computations using MATLAB running on a MacBook Pro (13-inch, Retina, Mid 2014). The difference in computing time is due to the cost of Hessian-times-vector multiplications. In this example, using the FOM Hessian requires 98 Hessian-times-vector multiplications and therefore 196 linear discretized PDE solves. In contrast, using the ROM Hessian requires 102 Hessian-times-vector multiplications and therefore 204 linear ROM PDE solves. Figure <ref type="figure">1</ref> is another presentation of the convergence results in Tables <ref type="table">1</ref> and <ref type="table">2</ref>. The left plot shows that both approaches have nearly the same iteration history. However, when the convergence history is plotted against the computational work performed, measured in terms of PDE solves, the ROM Hessian approach is much faster. Note that here we do not differentiate between the nonlinear state PDE solve and the linear PDE solves needed for gradient, adjoint, and FOM Hessian-times-vector computations. While the discretized state Eq. ( <ref type="formula">22b</ref>) is nonlinear and y k+1 is computed via Newton's method, only 1-2 Newton iterations per time step in the state computation are needed in this example. Therefore the difference between a nonlinear state PDE solve and a linear PDE solve is small. Example 2: Solid Fuel Ignition Model. This example is modeled after <ref type="bibr">[8]</ref>. The nonlinearity in (25b) is f (y) = -&#948;e y with &#948; = 5. 
The desired state is y d (x, t) = &#960; -2 sin(&#960; x 1 ) sin(&#960; x 2 ), the initial state is y 0 &#8801; 0, and the penalty is &#945; = 5 &#8226; 10 -3 .</p><p>The numerical results for this example mirror those of the previous example. Optimization with FOM Hessians and with ROM Hessians converged to virtually the same solution, and the optimization histories for the two approaches are nearly  the same, as shown in Table <ref type="table">3</ref>. The computed optimal control u at t = 0.1, 0.5, 1 is shown in Fig. <ref type="figure">4</ref>.</p><p>The ROM Hessian approach is much faster as shown in Table <ref type="table">4</ref>. Again, the timing reported in Table <ref type="table">4</ref> is for computations using MATLAB running on a MacBook Pro (13-inch, Retina, Mid 2014) and the reason for the difference in computing time is the cost of Hessian-times-vector multiplications. The ROM Hessian requires 96 ROM PDE solves for ROM Hessian-vector operations in contrast to the 96 FOM PDE solves in the FOM Hessian approach. Figure <ref type="figure">3</ref> shows that when convergence history is plotted not against iteration count, but against computational work performed, measured in terms of PDE solves, the ROM Hessian approach is much faster.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>We have introduced ROM Hessian approximations for use in inexact Newton methods for large-scale smooth optimization problems obtained from discretizations of optimal control problems governed by parabolic PDEs. In contrast to other ROM approaches, which approximate the optimization problem, our approach retains the original FOM objective function and gradient. The Hessian ROM is computed by applying POD to state and adjoint snapshots that have to be computed anyway, so the ROM computation is relatively inexpensive. Since the original objective functions and gradients are used, the qualitative global convergence properties of the original line-search Newton method and the line-search Newton method with ROM Hessian approximation are the same. However, since the Hessian approximations are significantly cheaper, the ROM Hessian approach can lead to substantial savings. This is confirmed by numerical experiments with two semilinear parabolic optimal control problems. The two optimization approaches had essentially the same convergence behavior, but the ROM Hessian approach achieved a factor 5-8 speedup because of the computational savings due to the Hessian approximations. However, it is possible to construct examples where the ROM Hessian approach does not lead to computational savings. When the ROM Hessian approximates the true Hessian too poorly, the ROM Hessian approach can require substantially more optimization iterations, which erase the savings achieved within each optimization iteration. Additional analysis and numerical tests are part of future work.</p></div></body>
		</text>
</TEI>
