<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Efficient Upwind Finite-Difference Schemes for Wave Equations on Overset Grids</title></titleStmt>
			<publicationStmt>
				<publisher>SIAM</publisher>
				<date>10/31/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10512256</idno>
					<idno type="doi">10.1137/22M1516178</idno>
					<title level='j'>SIAM Journal on Scientific Computing</title>
<idno>1064-8275</idno>
<biblScope unit="volume">45</biblScope>
<biblScope unit="issue">5</biblScope>					

					<author>J B Angel</author><author>J W Banks</author><author>A M Carson</author><author>W D Henshaw</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[\mathrm{ \mathrm{ \mathrm{\mathrm{ \mathrm{ . \mathrm{ \mathrm{\mathrm{ . \mathrm{\mathrm{\mathrm{\mathrm{ \mathrm{\mathrm{.© 2023 \mathrm{ \mathrm{\mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{\mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{\mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{\mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ \mathrm{ . 45, \mathrm{\mathrm{ . 5, \mathrm{ \mathrm{ . \mathrm{2703--\mathrm{2724]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Upwind dissipation Insufficient dissipation</head><p>Fig. <ref type="figure">1</ref>.1. Computation of a electromagnetic plane wave impinging on a perfectly conducting body using overset grids. The zoomed image on the left shows an example of the numerical instability that can arise when using thin boundary grids and insufficient dissipation. On the right is the same case using the the new upwind scheme described in this article, which is stable.</p><p>of thin boundary grids, for which nondissipative schemes are generally unstable, as shown in the right plot of Figure <ref type="figure">1</ref>.1. These new upwind schemes thus provide a robust and accurate approach for the solution of wave propagation problems on complex geometry using overset grids.</p><p>Despite their promise, the upwind formulations in <ref type="bibr">[6,</ref><ref type="bibr">1]</ref> have at least two notable drawbacks that have limited their application: Their implementation is rather difficult, and their computational cost is significantly higher than standard centered schemes with artificial dissipation. The primary focus of the present manuscript is therefore to address these two drawbacks by reformulating the upwind scheme. It will be seen that the reformulated scheme retains the robustness and stability properties of the original upwind approach, yet it is much faster, easier to implement, and more easily extensible to more complex systems. Furthermore, the analyses in <ref type="bibr">[19,</ref><ref type="bibr">7,</ref><ref type="bibr">2,</ref><ref type="bibr">5]</ref> show that the upwind formulation is necessary even without overlapping grids since perturbations from low-order terms and/or boundary closures can otherwise be destabilizing. The new reformulated upwind scheme has been found to play a key role in stabilizing a variety of solvers and therefore enables robust simulation of new physical regimes. For example, Figure <ref type="figure">1</ref>.2 shows sample simulations using the new reformulated upwind scheme on overset grids for nonlinear dispersive Maxwell's equations <ref type="bibr">[19]</ref> and for the equations of incompressible elasticity <ref type="bibr">[7]</ref>. For the latter problem, use of the new upwind dissipation was key to developing a stable scheme for traction-free boundaries, which are notoriously difficult for finite-difference schemes. Note also that the manuscripts <ref type="bibr">[19,</ref><ref type="bibr">7]</ref> contain extensive verification and accuracy studies.</p><p>2. Formulation of optimized upwind schemes. The new upwind predictorcorrector (UPC) scheme will be derived from a modified equation (ME) analysis of the original upwind schemes from <ref type="bibr">[6]</ref>. For a given discrete scheme, ME analysis identifies a related continuous PDE that approximates well-resolved solutions to the discrete scheme. This new PDE will consist of the original wave equation together with higherorder terms that contribute dispersion and dissipation. This ME analysis will expose the leading-order dissipation term for the original upwind (UW) formulation, which will take the form of a hyperviscosity. Discrete versions of this hyperviscosity will then be appended to the standard nondissipative discrete scheme for the wave equation to yield the optimized UPC scheme.</p><p>Consider solving the initial boundary value problem for the wave equation on a domain \Omega , in n d space dimensions, for u = u(x, t),</p><p>x \in \Omega , t &gt; 0, <ref type="bibr">(2.1)</ref> where t is time, c &gt; 0 is the wave speed, \Delta = \sum n d d=1 \partial 2 x d is the Laplacian in n d space dimensions, x = [x 1 , x 2 , . . . , x n d ] T \in R n d is the vector of spatial coordinates, and \partial m \alpha indicates the mth partial derivative in the \alpha -direction. Equation (2.1) is augmented with appropriate initial conditions and boundary conditions. To clarify the discussion, consider \Omega to be a square domain in n d space dimensions, where the equations are discretized using finite-difference approximations of even order p = 2q, q = 1, 2, . . .. Let x i = (i 1 h 1 , . . . , i n d h n d ) denote grid points on a Cartesian grid with grid-spacings h d , where i = (i 1 , . . . , i n d ) \in Z n d is a multi-index. The grid spacings may alternatively be denoted as h x , h y , and h z . Let U n i \approx u(x i , t n ) denote the discrete solution, where t n = n\Delta t, and \Delta t &gt; 0 is the time step. Introduce the standard divided difference operators in space, <ref type="bibr">(2.2b)</ref> where e d \in R n d is the unit vector in direction d. Also note that the undivided difference operators are defined as \Delta \pm x d = h d D \pm x d and \Delta 0x d = h d D 0x d . Divided and undivided differences in time are defined similarly.</p><p>In order to ground the discussion about the new optimized upwind scheme, it is useful to first provide some general context with respect to numerical methods for wave propagation on overlapping grids. The present manuscript focuses on wave equations formulated in second-order spatial form. Motivation to treat the secondorder form directly, as opposed to conversion to a first-order system, includes fewer degrees of freedom, lack of potentially challenging compatibility constraints, larger time steps, and smaller errors; for a detailed discussion, see <ref type="bibr">[6]</ref>. In the context of second-order wave equations with overlapping grids, there are essentially two options: adding ad hoc dissipation <ref type="bibr">[16]</ref> or using an upwind scheme <ref type="bibr">[6,</ref><ref type="bibr">1]</ref>. The upwind formulation was found to be robustly stable, even on overlapping grids, without the need for any user-defined parameters. However, it was found to be quite complex and rather more computationally expensive than the centered (dissipation-free) scheme even on a Cartesian grid. To illustrate this, first note that the upwind scheme developed previously in <ref type="bibr">[6]</ref> was constructed for the equations written as a first-order system in time by introducing the velocity, v = \partialu/\partialt, as in</p><p>As an example, the fourth-order accurate upwind scheme in two Cartesian dimensions (denoted by UW4-2D) is reproduced here as</p><p>\biggr) , (2.4a)</p><p>where</p><p>Note that L 4h is a fourth-order accurate discretization of the operator \scrL , while L 2h is a second-order accurate approximation. For comparison, the same scheme without upwinding simplifies to</p><p>while the usual fourth-order accurate scheme that discretizes the wave equation directly is even simpler:</p><p>Note that the UW4-2D in (2.4) has a stencil width of p + 3 in each direction, which is two points wider than the stencil width p + 1 of the schemes (2.5) and (2.6). The upwind scheme includes numerous approximations of higher derivatives, and these expressions become even more complicated in three dimensions and on curvilinear grids.</p><p>The fundamental idea pursued in this work is to identify the key leading terms that provide the essential upwind dissipation needed for the desired stability properties. <ref type="foot">1</ref>To identify these terms, an ME analysis can be used. Let \v u = \v u(x, t) denote the continuous function that exactly satisfies the discrete upwind approximation (it is important to note that \v u does not satisfy the wave equation). The ME is found by expanding the equation for \v u in a Taylor series and recursively substituting for derivatives of \v u. For upwind schemes of order p \geq 2, the MEs take the form</p><p>where h = max d h d is a measure of the spatial grid spacing, h p \scrE p \v u is the leading-order dispersion term, and \scrM p \partial t \v u is the leading-order dissipation term with</p><p>where \nu p is a positive (dimensionless) coefficient that determines the strength of the dissipation and will be specified later. In (2.7), the term \scrM p \partial t \v u represents the hyperdiffusion operator, which is introduced by the upwind formulation. The form of the leading-order dissipation term is important both because of its high-derivative hyperdissipation (which necessitates widening the stencil as would be expected for an upwind scheme for the wave equation with waves traveling in both directions) and because of the large dissipation coefficient \nu p c/h d , which scales as O(1/h d ) as the mesh is refined. This is in contrast to a more typical artificial dissipation term which would have a constant dissipation coefficient and a lower-order differential operator that does not necessitate a wider stencil <ref type="bibr">[16]</ref>. See the comments below for further discussion on this point.</p><p>Given the form for \scrM p in (2.8), we then propose the following simplified scheme that consists of a centered approximation together with the upwind dissipation:</p><p>Here L p defines the spatial part of a high-order accurate nondissipative scheme (see <ref type="bibr">(3.9)</ref> in section 3 for more details), M p is a discrete approximation to (1/\nu p ) \scrM p ,</p><p>p/2+1 , (2.10) and D t u n i denotes some approximation to the time derivative of u; the form of this approximation is discussed next. One important aspect of the semidiscrete scheme (2.9) is that for smooth modes, the dissipation operator scales as O(h p+1 ), which implies that the term M p (D t u n i ) can be approximated to low-order accuracy without affecting the overall \scrO (h p ) error for the scheme. Obvious choices for the time derivative D t are to use a centered, D t = D 0t , or backward difference, D t = D - t , in time. The former choice leads to an implicit scheme, and since our focus is on explicit schemes, this is not considered any further. The latter choice leads to an explicit scheme, but a von Neumann analysis reveals that the resulting scheme requires a smaller time step for stability than the scheme without dissipation. To overcome this time-step restriction, we instead propose a predictor-corrector scheme,</p><p>which simplifies to the form of an operator split scheme:</p><p>Equation (2.12) defines the UPC scheme. <ref type="foot">2</ref> As we shall see in section 3, this approach yields a scheme that inherits the standard time-step restriction from the centered scheme. In fact, the modular nature of the scheme is itself advantageous and enables the upwind dissipation term to be added with a simple and computationally efficient implementation. It is worth noting that the upwind dissipation in (2.12) has a stencil width of p + 3, which is two points wider than the original centered scheme, which has a stencil width of p + 1. This is similar to classical upwind schemes for the advection equation which involve additional grid points in the upwind direction for a specific level of accuracy. For the wave equation, which has information propagating in both directions, the upwind scheme is centered and is wider in both directions.</p><p>It is useful to compare the UPC scheme to a more traditional artificial dissipation scheme, e.g., as used in <ref type="bibr">[16]</ref>, which is here called the FDA scheme, for finite difference with artificial dissipation. At fourth-order accuracy, the FDA scheme is composed of the ME scheme (2.6) together with a standard fourth-order artificial dissipation:</p><p>Note that the artificial dissipation (AD) term in (2.13) uses fourth-order undivided differences and includes a user-defined adjustable parameter \alpha , which has dimensions of inverse time or, equivalently, dimensions of c over length (recall that \nu p was dimensionless). The original FDA scheme itself lacked any formulaic mechanism to select \alpha , and the fact that \alpha is dimensional complicates this selection even more. That said, in practice, when the problem has been nondimensionalized, weak instabilities can be stabilized using an O(1) value for \alpha , and in this case, the scheme remains fourth-order accurate. However, for strong instabilities, such as those occurring for overlapping grid computations with thin boundary grids (i.e., a fixed number of grid points between the physical and interpolation boundaries), large values for \alpha must be chosen via trial and error and in fact \alpha must scale as \sim c/h. This scaling ultimately leads to a reduction in the order of accuracy of the scheme in addition to a significant reduction in the maximal stable time step.</p><p>It was the development of the upwind scheme for wave equations in <ref type="bibr">[6]</ref> that provided a path forward to remedy the deficiencies with the FDA scheme. As previously discussed, for a given order of accuracy, the upwind scheme naturally uses a higher-derivative hyperdiffusion that requires a wider stencil (e.g., (\Delta +x d \Delta - x d ) 3 at fourth-order accuracy). With the higher-derivative hyperdiffusion, the large 1/h dissipation coefficient implied by the upwind construction can be used without changing the order of accuracy of the scheme. Overall, the new UPC scheme thus has several significant advantages over the FDA scheme:</p><p>1. dimensionless dissipation coefficient and ultimately no adjustable parameters; 2. no reduction in the order of accuracy of the baseline scheme even with more powerful upwind stabilization; 3. predictor-corrector formulation avoids any time-step reductions with respect to the baseline scheme.</p><p>3. Von Neumann stability and the dissipation coefficient. The stability of the proposed predictor-corrector upwind scheme (2.12) is now studied using von Neumann analysis for the pure initial value problem. One might think that the ME analysis discussed in section 2 would provide the appropriate value for the coefficient, \nu p , of the upwind dissipation. However, this approach to choosing \nu p is problematic, as different values of \nu p can be found, depending on the particular variation of the original upwind scheme (which involves nonunique choices for the stencils chosen for various terms). Von Neumann stability analysis, however, provides a rational way to choose the upwind coefficient \nu p .</p><p>The von Neumann analysis considers the initial value problem on an infinite or periodic domain in space and expands the discrete solution in a Fourier series. We thus look for solutions of the form U n j = a n e ik\cdot x \bfj , where a \in C is the time-stepping amplification factor and k = [k 1 , k 2 , . . . k n d ] T \in R n d is the wave number vector. Difference operators in Fourier space become multiplication by the corresponding Fourier symbol of the operator, for example,</p><p>where \xi d = k d h d \in [ - \pi , \pi ] is the normalized wave number.</p><p>Lemma 3.1. The scheme (2.9) with no upwind dissipation (M p = 0) is stable (no growing modes in time) on an infinite (or periodic) spatial domain provided the Fourier symbol, L p \widehat, is real and</p><p>Proof. Substituting the anstaz U n j = a n e ik\cdot x \bfj into (2.9) and clearing common factors of e ik\cdot x \bfj and a n - 1 leads to the quadratic equation</p><p>Using the theory of von Neumann polynomials <ref type="bibr">[18]</ref>, the roots of <ref type="bibr">(3.3)</ref>  This completes the proof.</p><p>Moving now to the proposed upwind discretization, the stability of the predictorcorrector upwind scheme (2.12) is summarized by the following theorem. Theorem 3.2. Given any L p with L p \widehat \in R, the maximum stable time step for the predictor-corrector upwind scheme (2.12) on an infinite (or periodic) spatial domain is given by the time-step constraint implied by Lemma 3.1 along with the following restriction on the dissipation parameter \nu p :</p><p>where \lambda x d are the componentwise CFL parameters:</p><p>Proof. To perform the analysis, it is convenient to algebraically eliminate the predicted state from (2.12) so that the full time-step update is expressed:</p><p>Seeking separable solutions of the form U n j = a n e ik\cdot x \bfj and clearing common factors of e ik\cdot x \bfj and a n - 1 leads to the quadratic equation</p><p>where M p \widehat is the Fourier symbol of M p in (2.10) given by</p><p>For \nu p M p \widehat = 0, Lemma 3.1 gives conditions for stability. For \nu p M p \widehat \not = 0, stability requires | a| \leq 1, and the theory of von Neumann polynomials yields the following necessary and sufficient conditions: The traditional space-time schemes used here are based on a Taylor time-stepping (Lax Wendroff or ME) approach <ref type="bibr">[16]</ref>. In particular, the pth-order accurate space-time discrete approximation is</p><p>where \Delta m h,r is an rth-order accurate centered approximation to \Delta m . For example, the fourth-order accurate space-time scheme was given in <ref type="bibr">(2.6)</ref>. Note that the operators \Delta m h,r are not uniquely defined, and here the approximation with minimal stencil size is used. These minimal-stencil schemes all appear to have a CFL-one time-step restriction:</p><p>This result has been proved in one dimension in <ref type="bibr">[3]</ref> and in more dimensions for fourthorder and six-order schemes in <ref type="bibr">[14]</ref>. Here the result is conjectured to be true in general. Assuming <ref type="bibr">(3.10)</ref>  where \lambda x d = c\Delta t/h d is the CFL parameter and s v &lt; 1 is a safety factor. Definition 3.4 (constant dissipation parameter). The constant dissipation parameter is defined to be</p><p>where s c is a safety factor. Typical simulations are run with a time step less than the maximum allowed (e.g., CFL= 0.9), in which case s c = 1 is the suggested choice.</p><p>Note that for curvilinear or overset grids, \nu p,v could vary across the grid, whereas \nu p,c does not depend on \lambda x d and so will be constant. Also note that the safety factors are included to accommodate the strict bounds in (3.4) and <ref type="bibr">(3.11)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4.</head><p>Comparison of the UW and UPC schemes. This section presents a comparison of the discrete solutions from the original UW schemes to those from the reformulated UPC schemes. The purpose in this comparison is primarily to demonstrate that critical aspects of the upwind approximations from <ref type="bibr">[6]</ref> are inherited by the new UPC scheme. These results indicate that although the UPC scheme employs significant algorithmic simplification, the actual quality of the numerical approximations remains undiminished. In fact, for the tests presented here, the approximation errors are actually slightly smaller for the UPC schemes.</p><p>Consider two problems motivated by the presentation in <ref type="bibr">[6]</ref>, one with a smooth solution and one with a discontinuous solution. In both cases, we solve the wave equation (2.1) for x \in ( - 1, 1), c = 1, and periodic boundary conditions. For the smooth solution, we choose initial conditions to match a traveling wave exact solution, u s (x, t) = sin(5\pi (x -ct)). For the discontinuous solution, the initial conditions take the form of a top hat,</p><p>and the resulting d'Alembert solution, u T H (x, t), consists of two top hats, one traveling to the left and one to the right. Note that the top-hat solution is a very weak solution since \partial t u(x, t) contains propagating delta functions, and this is therefore a challenging numerical test.</p><p>Numerical results are generated using the original UW schemes and the new UPC schemes with \nu p,c = sc 2 p+1 and s c = 1. In both cases, \lambda x = .9/ \surd 3 (the factor of \surd 3 is included to mock a simulation in three space dimensions).</p><p>Figure <ref type="figure">4</ref>.1 shows results for the second-, fourth-, and sixth-order UPC and UW schemes. Max-norm errors are shown for the smooth solution at t = 2. Discrete L 1 -norm errors are presented for the top-hat solution at t = 0.5, where the discrete U P C , is also noted in the plots. In all cases, the convergence rates are close to the expected asymptotic rates. The UPC schemes are generally seen to yield smaller errors.</p><p>To get a sense of the qualitative nature of the approximations for the top-hat problem, Figure <ref type="figure">4</ref>.2 shows the computed solutions for u T H with N = 1965 grid points at t = 0.5. As expected, both the UW and UPC solutions have oscillatory behavior near the jump, although those for the UW scheme are somewhat smaller; this could be due to the UPC scheme having somewhat more dissipation.</p><p>Taken together, Figures 4.1 and 4.2 indicate that the UPC and UW schemes produce similar approximate solutions for similar grid resolutions despite the simplifications employed in the development of the UPC scheme. U P C , is also noted. In all cases, the dissipation parameter is \nu p,c = sc 2 p+1 , the safety factor is sc = 1, and the time step is selected using \lambda x = .9/ \surd 3 (the factor of \surd 3 is included to mimic a problem in three dimensions). 5. The upwind scheme on curvilinear and overset grids. In this section, we discuss properties of the new UPC scheme on overset grids. Overset grids (also known as overlapping or Chimera grids) are a powerful technique, enabling the efficient simulation of solutions to PDEs on complex, perhaps moving, geometry using finitedifference and finite-volume methods. Section 5.1 gives a brief overview of overset grids and describes the form of the upwind dissipation for curvilinear grids. The stability of the UPC scheme for a one-dimensional overset grid is studied in section 5.2, where it is shown that the upwinding stabilizes otherwise unstable grid configurations. Section 5.3 presents some long-time simulations of typical two-and three-dimensional overset grids to give further evidence of the robustness of the UPC scheme.</p><p>5.1. Curvilinear and overset grids and upwind dissipation. As illustrated in Figure <ref type="figure">5</ref>.4, an overset grid, denoted as \scrG , consists of a set of component grids \{ G g \} , g = 1, . . . , \scrN g , that cover the entire domain \Omega . Each component grid, G g , is a logically rectangular, curvilinear grid defined by a smooth mapping from a unit cube parameter space r to physical space x:</p><p>All grid points in \scrG are classified as discretization, interpolation, or unused points <ref type="bibr">[8]</ref>.</p><p>The overlapping grid generator Ogen <ref type="bibr">[15]</ref> from the Overture framework is used to construct the overlapping grid information. A typical overset grid consists of one or more background Cartesian grids together with narrow boundary fitted grids. Solution values at interpolation points are obtained using tensor-product Lagrange interpolation from a donor grid. For the wave equation, the interpolation stencil is chosen to have a width of p + 1 at order of accuracy p <ref type="bibr">[8]</ref>. The interpolation may be explicit (no donor points are themselves interpolation points) or implicit.</p><p>The wave equation can be discretized on a curvilinear grid using conservative or nonconservative approximations; see <ref type="bibr">[16]</ref> for details. Nonconservative approximations can be formed using the mapping method. Given a mapping x = G g (r) and its metric derivatives, \partialr j /\partialx i , derivatives of a function u(x) = U (r) are first written in parameter space using the chain rule, for example, \partialu \partialxi = \sum n d j=1 \partialrj \partialxi \partialU \partialrj . On mapping from physical space x to parameter space r, the spatial derivative portion of the wave equation takes the form of a general variable coefficient second-order differential operator. Given the mapped equations, derivatives of U with respect to r j are then approximated with standard finite differences.</p><p>Recalling the modified-equation argument used to develop (2.10) for Cartesian grids, one can similarly derive the ME for the upwind wave operator on a mapped domain. Doing so reveals that the leading-order terms in the dissipation for a curvilinear domain take a similar form to (2.8). As a result, for a curvilinear grid, we take the dissipation \scrM p , to be \scrM p = - where \Delta +,r d and \Delta - ,r d are the usual forward and backward undivided difference operators in parameter space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Stability on a one-dimensional overlapping grid.</head><p>In this section, we study the stability of the UPC scheme on a one-dimensional overlapping grid consisting of a background grid and a smaller boundary fitted grid as shown in Figure <ref type="figure">5</ref>.1. A collection of different overlapping configurations is generated by varying the grid spacings on the two grids. A matrix stability analysis is used to find any unstable modes.  Algorithm 5.1 UPC scheme for one-dimensional overlapping grid, even order p.</p><p>while t \leq T f inal do \triang Begin time-stepping loop 5:  The nondissipative scheme is found to have many unstable grid configurations. As the coefficient of the upwinding term is turned on, however, the number of instabilities decreases and eventually becomes zero; the UPC scheme is found to be stable on all grid configurations tested.</p><p>We solve the initial boundary value for the wave equation on the interval x \in ( - 1, 1), with c = 1, and with homogeneous Dirichlet boundary conditions. The problem is discretized using the overlapping grid illustrated in Figure <ref type="figure">5</ref>.1. Let U L,j and U R,j denote the discrete solution values on the left and right grids. Let x L,j and x R,j denote the grid points with grid spacings h L and h R , respectively. The UPC scheme is given in Algorithm 5.1. The solutions at interior points are advanced with the ME scheme. Homogeneous Dirichlet boundary conditions are imposed at x L,0 = - 1 and x R,N R = 1. Compatibility boundary conditions are used to obtain ghost point values (not shown in Figure <ref type="figure">5</ref>.1), leading to odd symmetry conditions. For a pth-order accurate scheme, there will be p/2 + 1 interpolation points on either grid, and these are obtained from the opposite grid using Lagrange interpolation with a stencil width of p + 1. The interpolation weights are collected into the vectors w L,k and w R,k , k = 0, 1, . . . , p/2, where k denotes the interpolation point.</p><p>A matrix stability analysis will be used to study the stability of each grid configuration. After eliminating constraint equations (i.e., boundary and interpolation conditions), the scheme in Algorithm 5.1 can be expressed as</p><p>where (5.4) \bigl(</p><p>For our purposes here, solutions to (5.4) with eigenvalues | a| &gt; 1 will be called unstable, as they indicate growth in time. Figure <ref type="figure">5</ref>.2 shows the computed eigenvalues for the fourth-order accurate nondissipative scheme (\gamma = 0) and UPC scheme (\gamma = 1) with variable dissipation \nu p,v and a safety of s v = .9 for the grid defined by \delta = 0.8, where \gamma is an artificial parameter introduced as a switch to slowly turn on the upwind dissipation. Here \gamma = 0 corresponds to the standard nondissipative centered scheme, while \gamma = 1 corresponds to the full upwind dissipation. The nondissipative scheme is seen to be unstable since there are eigenvalues whose magnitude exceeds unity. On the other hand, the UPC scheme supplies damping to these modes, thus bringing them inside the unit circle and eliminating any growth.</p><p>Algorithm 5.2 gives the parameter search used for checking the stability of the scheme. <ref type="foot">3</ref> Different overlapping grid configurations will be generated. Since the focus is on the difficult case of a thin boundary grid, the number of grid points on the right grid will be fixed at N R = 15. The ratio of the grid spacings, denoted by \delta = h L /h R , will be varied. We fix h R = 0.05 and vary \delta to define h L . Given h L , the overlapping grid is determined by requiring a minimal overlap subject to the interpolation being implicit or explicit. The search over grid spacing ratios is carried out over a range \delta \in [\delta min , \delta max ] discretized using N \delta + 1 points with spacing \Delta \delta = (\delta max -\delta min )/N \delta so that \delta j = \delta min + j\Delta \delta , j = 0, 1, . . . , N \delta . Similiarly, the search over the dissipation parameter \gamma is discretized using N \gamma + 1 values with \gamma k = k\Delta \gamma , k = 0, 1, . . . , N \gamma .</p><p>Figure <ref type="figure">5</ref>.3 shows results of the parameter study for the fourth-order accurate UPC scheme with both variable and constant dissipation and both explicit and implicit interpolation. For the study, \delta min = 1 2 , \delta max = 2, N \delta = 501, N \gamma = 201, and c\Delta t = .6 min(h L , h R ). Constant dissipation (3.13) uses a safety factor s c = 1, while variable dissipation (3.12) uses a safety factor s v = 0.9. The total number of unstable modes and unstable grid configurations for selected values of \gamma is shown. As \gamma increases, both the number of unstable modes and the number of grid configurations Algorithm 5.2 Parameter search for unstable modes.  Total number of unstable modes (UM) and unstable grid (UG) configurations as a function of the upwind scaling factor \gamma for the fourth-order accurate UPC scheme on many onedimensional overset grids. The left plot shows the results for constant dissipation (3.13). The right plot shows results for variable dissipation (3.12). Results for explicit and implicit interpolation are shown. The UM marks are colored according to the value of | a| - 1 for the most unstable mode found. The UPC scheme is shown to provide effective stabilizing dissipation since there are no instabilities for \gamma \gtrappr 0.55.</p><p>decrease. The magnitude of the worst unstable mode also decreases as \gamma increases, as indicated by the colored markers. Implicit interpolation is seen to behave better than explicit interpolation, and variable dissipation is seen to behave better than the constant dissipation. In all cases, the parameter search fails to reveal any instabilities for \gamma \gtrappr 0.55.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Overset grids:</head><p>Accuracy and long-time simulations. Experience shows that most overset grids in two and three space dimensions will excite unstable modes for nondissipative schemes. Experience with the UW and UPC schemes when solving the scalar wave equation, Maxwell's equations, and also the equations of incompressible linear elasticity, on a wide variety of overset grids in two and three dimensions, shows that the upwind schemes remain stable. In this section, we provide a few examples of solving the wave equation for very long times in order to illustrate the robustness of the UPC scheme. We also present some sample grid convergence studies showing that the UPC schemes retain the same order of accuracy as the underlying base schemes.</p><p>In order to more easily observe any unstable modes within a moderate number of time steps, we use initial conditions where the grid function values are taken at random in [0, 1] in an attempt to uniformly seed all modes. The UPC scheme is then used to solve the scalar wave equation for very long times, and the solutions are monitored for any growth. Due to the upwind dissipation in the scheme, the magnitude of the computed solutions for a stable scheme is expected to decay to zero over time. The initial decay in the solution is expected to be rapid due to the fact that random initial conditions have significant energy in unresolved high-frequency modes (e.g., the plus-minus mode) which the upwind dissipation will quickly damp. However, for well-resolved modes, the UPC schemes introduce dissipation that scales as O(h p+1 ), i.e., one order higher than the leading truncation error (dispersion error), and so at later times, the damping for random initial conditions will flatten. This behavior is indeed observed in Figure <ref type="figure">5</ref>.5, as discussed further below.</p><p>Figure <ref type="figure">5</ref>.4 shows coarsened versions of overset grids for the four cases considered. The grid for the disk, for example, consists of a background Cartesian grid and an annular grid. The number of points in the radial direction on the annular grid is kept fixed as the grid is refined, resulting in a narrow boundary grid. This is a difficult case from the point of view of stability. The other overset grids are constructed in a similar manner with narrow boundary fitted grids as the grids are refined (see <ref type="bibr">[2]</ref> for details). Without sufficient dissipation, the solution to the wave equation is unstable in all these configurations. Figure <ref type="figure">5</ref>.5 shows max-norms of the computed solutions versus time for the four configurations. Note that the norms are only plotted at a subset of times for t = 100, 200, 300, etc., where a typical time step is \Delta t \approx 10 - 2 . The norms are seen to oscillate as energy is transferred between u and its time derivative, but on average, the norms are seen to slowly decay over time. At large times, only smooth modes remain, and these decay very slowly due to the small damping from the upwind dissipation.  For long times, the norms of the solutions to the second-order accurate scheme are generally smaller and decay more quickly in comparison to the corresponding solutions to the fourth-order accurate scheme. This behavior is expected since the dissipation of smooth modes for the fourth-order scheme is smaller than that for the second-order scheme. In all cases, large numbers of time steps were taken. For the case of the disk, for example, the computation proceeded until t = 10 5 , which required approximately 7.2 million time steps. Results are shown for the second-order accurate and fourthorder accurate UPC schemes using the constant dissipation parameter with s c = 1.</p><p>Figure <ref type="figure">5</ref>.6 shows grid convergence results for solving the wave equation on the circular disk and the solid sphere. For the disk, with radius R d = 1, the exact solution, in polar coordinates (r, \theta ), is taken as the eigenmode u n \theta ,nr (r, \theta , t) = J n \theta (\lambda n \theta ,nr r) cos(n \theta \theta ) cos(\lambda n \theta ,nr c t), <ref type="bibr">(5.5)</ref> where n \theta = 3, n r = 2, and \lambda n \theta ,nr \approx 14.796 is the n r th zero of the Bessel function J n \theta (z). For the sphere of radius R s = 1, a manufactured solution is used, and this solution is defined as a product of trigonometric functions in space and time given by u trig (x, y, z, t) = cos(f x x) cos(f y y) cos(f z z) cos(f t t), <ref type="bibr">(5.6)</ref> for f x = f y = f z = f t = 2\pi . Max-norm errors are plotted for grids of different resolutions for the second-order accurate and fourth-order accurate UPC schemes. The errors are seen to be converging at close to the expected asymptotic convergence rates. We note that the centered schemes without upwinding are unstable on both the disk and sphere grids. 6. Run-time performance comparison: Maxwell's equations. In this section, the CPU run-time performance of the new UPC scheme is compared to the original UW scheme when applied to solving Maxwell's equations. A baseline cost is established using the scheme with no dissipation. In addition, comparison to the traditional artificial dissipation scheme from <ref type="bibr">[16]</ref> is shown. The comparison will be made using the CgMx overset grid solver,<ref type="foot">foot_4</ref> which simulates Maxwell's equations written as a vector wave equation in second-order form for the electric field. Although CgMx can treat dispersive materials <ref type="bibr">[2,</ref><ref type="bibr">5]</ref> and nonlinear active materials <ref type="bibr">[19]</ref>, the comparison here will be restricted to the nondispersive case. Note that both the original and the optimized upwind schemes have undergone extensive accuracy and stability tests using CgMx (see <ref type="bibr">[2,</ref><ref type="bibr">5]</ref>), and therefore the present focus is exclusively on the run-time performance of the schemes. <ref type="foot">5</ref>To be specific, we compare the run-time performance of the following four schemes: 1. FDpg: the nondissipative finite-difference scheme <ref type="bibr">[16]</ref>; 2. FDApg: the finite-difference scheme plus artificial dissipation <ref type="bibr">[16]</ref>; 3. UWpg: the original UW scheme <ref type="bibr">[1]</ref> based on <ref type="bibr">[6]</ref>; 4. UPCpg: the UPC scheme. Here the suffix ``pg"" is used to denote the order of accuracy and grid type, as in FD4c or UPC2c, where p = 2 or p = 4 is the order of accuracy, and g equals ``r"" for a rectangular grid, while g equals ``c"" for a curvilinear grid.</p><p>The run-time performance is compared for both Cartesian and general curvilinear grids. CgMx has both Cartesian grid and curvilinear grid implementations, and the results will show the significant run-time benefit of using Cartesian grids. The Cartesian grid versions are also more memory efficient, but this aspect is not considered here. Note that even when a grid is Cartesian, the curvilinear grid implementation can be used and yields identical numerical results (up to roundoff errors) to the Carte-sian solver, albeit with greater cost. Finally, note that a complete performance study may also consider the use of GPUs and parallel computers, but that is left for future reports.</p><p>The performance of the schemes is evaluated using two grids: a grid for the unit square [0, 1] 2 in two dimensions and a grid for the unit box [0, 1] 3 in three dimensions. The grid for the square is a Cartesian grid with 1024 points in each direction, while the grid for the box is a Cartesian grid with 40 points in each direction. Comparisons involve actual CPU times (as opposed to CPU/step) since not all schemes use the same time step. The FDA scheme, for example, requires a reduced time step for stability as the coefficient of dissipation increases. The UW2 scheme, on the other hand, has a smaller time-step bound, while the UW4 has a larger time-step bound than the corresponding UPC schemes <ref type="bibr">[6]</ref>. Also note that the expected CPU cost on a general overset grid will depend on the relative proportion of grid points that lie on Cartesian and curvilinear grids. Figure <ref type="figure">6</ref>.1 compares the CPU times to solve for each scheme relative to the corresponding FD2r scheme in two or three dimensions. Thus, for example, the UPC4r scheme in two dimensions is 7.9 times more expensive than the FD2r scheme in two dimensions, while the UW2c scheme in three dimensions is 564.2 times more expensive than the FD2r scheme in three dimensions. There is significant information contained in the charts of Figure <ref type="figure">6</ref>.1, and important observations include the following:</p><p>1. The UPC scheme is much faster than the original UW scheme. More details are given below as to specific speedups. 2. For Cartesian grids, the UPC scheme is roughly two to three times more expensive than the finite-difference scheme of the same order. For Cartesian grids, the wider upwind stencil adds significant extra operations. 3. For curvilinear grids, the UPC scheme is at most around 50\% more expensive than the finite-difference scheme of the same order. In this case, the sparse upwind stencil does not add any significant extra work compared to the more complex curvilinear stencil for the interior scheme. To help in the interpretation of the data, Table <ref type="table">6</ref>.1 presents some of the information in Figure <ref type="figure">6</ref> Fig. <ref type="figure">6</ref>.1. Square and box: CPU times relative to the two-or three-dimensional version of the FD2r scheme based on total time to solve. The FD2r scheme is the second-order accurate nondissipative finite-difference scheme. The UPC4c scheme, for example, is 30.6 times slower than the FD2r scheme in two dimensions, while in three dimensions, it is 74.9 times slower than the threedimensional FD2r scheme. Rel-CPU, Square 1024 of Cartesian grid codes to curvilinear grid codes and also the relative costs of secondorder accurate to fourth-order accurate schemes. Important observations include the following:</p><p>1. For Cartesian grids, the cost increase from second order to fourth order is modest, a factor of 2.4 -4. This is consistent with the size of the sparse stencils for Cartesian grids. 2. The increased cost from curvilinear to Cartesian is more significant with factors between 4 and 29 at fourth order and three dimensions being much worse. This shows that there can be significant benefits to using Cartesian grids over the majority of the domain. Figure <ref type="figure">6</ref>.2 compares the CPU times for each scheme relative to the corresponding finite-difference scheme (e.g., UPC4c is compared to FD4c). This gives a sense of how the schemes with dissipation compare to the corresponding nondissipative versions. In this light, the UW scheme is seen to be very expensive. As might be expected, the curvilinear grid versions are more expensive than the corresponding rectangular grid versions. There is less of a cost increase for adding dissipation to curvilinear grids since the interior stencils are more complex, while the dissipation operator is nearly the same as in the Cartesian grid case.</p><p>Finally, Figure <ref type="figure">6</ref>.3 summarizes the CPU-time speedups of the new UPC scheme versus the old upwind scheme UW. Speedups range from 2.8 to 60.9 with speedups generally being greater for curvilinear grids. These numbers reflect the fact that the UW stencil becomes very complex on curvilinear grids, and this complexity increases with increasing order of accuracy. Note that the UW2 scheme has a smaller time-step bound than the UPC2 scheme, while the UW4 scheme has a larger time-step bound than the UPC4 scheme <ref type="bibr">[6]</ref>. This partially explains why the UW2r scheme performs relatively much worse than the UW4r scheme.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions.</head><p>We have presented a reformulation of the upwind scheme for the second-order wave equation originally derived in <ref type="bibr">[6]</ref> and <ref type="bibr">[1]</ref>. The new formulation has three major advantages. The first advantage is that it greatly simplifies the discrete approximation and implementation. The second advantage is that it can be applied directly to the wave equation in second-order form without the need to form a first-order system in time. The third advantage is that it can be implemented in a modular (predictor-corrector) fashion with the upwinding added as a separate stage following the predicted update of the standard centered scheme. This approach also leads to a scheme that has the same CFL-one time-step restriction as the nonupwind scheme. As a result, the new scheme is easier to implement and much faster.</p><p>The behavior of the new UPC scheme compared favorably to the original UW scheme when solving problems with smooth or discontinuous solutions. A stability analysis showed that the UPC scheme, like the UW scheme, was stable for a model problem with one-dimensional overset grids in the difficult case of a narrow boundary grid. The stability was further confirmed from long-time simulations for a variety of overset grids in two and three dimensions.</p><p>The run-time performance of the UPC scheme, when solving Maxwell's equations, was compared to the original UW scheme as well as to a nondissipative scheme and a scheme with traditional artificial dissipation. Second-order accurate and fourth-order accurate versions of the schemes were compared. Very significant speedups in CPU time were found for the UPC scheme compared to the UW scheme. These speedups ranged by factors of 3 to 60, depending on the order of accuracy and whether the grid was Cartesian or curvilinear.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Copyright &#169; by SIAM. Unauthorized reproduction of this article is prohibited. Downloaded 06/05/24 to 128.113.130.130 . Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/terms-privacy</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>In many ways, this follows the logic described by Christensen in<ref type="bibr">[9]</ref>, where the first-order formulation of the equations of inviscid compressible flow are discretized with the inclusion of an artificial dissipation derived to mimic the effect of upwinding in a Godunov-type scheme.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>Note that this operator splitting approach only retains high-order accuracy since the dissipation operator is O(h p+1 ) for smooth modes. For wave propagation problems with inherent physical dissipation and/or dispersion, the predictor step should accurately treat the physical dissipation and dispersion; the upwind dissipation can be then added in a corrector step.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>Algorithm 5.2 is written for implicit interpolation. For explicit, line 18 becomes x L,N L = x L,DP + hL(p/2 + 1).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4"><p>CgMx is built on and available with the open-source Overture framework, overtureFramework. org.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5"><p>The results reported here were computed using an Intel Xeon 2.7-GHz processor with the gcc and gfortran compiler and -O optimization flag.</p></note>
		</body>
		</text>
</TEI>
