<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Multivariate localization functions for strongly coupled data assimilation in the bivariate Lorenz 96 system</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10348295</idno>
					<idno type="doi">10.5194/npg-28-565-2021</idno>
					<title level='j'>Nonlinear Processes in Geophysics</title>
<idno>1607-7946</idno>
<biblScope unit="volume">28</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Zofia Stanley</author><author>Ian Grooms</author><author>William Kleiber</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Abstract. Localization is widely used in data assimilation schemes to mitigate the impact of sampling errors on ensemble-derived background error covariance matrices. Strongly coupled data assimilation allows observations in one component of a coupled model to directly impact another component through the inclusion of cross-domain terms in the background error covariance matrix.When different components have disparate dominant spatial scales, localization between model domains must properly account for the multiple length scales at play. In this work, we develop two new multivariate localization functions, one of which is a multivariate extension of the fifth-order piecewise rational Gaspari–Cohn localization function; the within-component localization functions are standard Gaspari–Cohn with different localization radii, while the cross-localization function is newly constructed. The functions produce positive semidefinite localization matrices which are suitable for use in both Kalman filters and variational data assimilation schemes. We compare the performance of our two new multivariate localization functions to two other multivariate localization functions and to the univariate and weakly coupled analogs of all four functions in a simple experiment with the bivariate Lorenz 96 system. In our experiments, the multivariate Gaspari–Cohn function leads to better performance than any of the other multivariate localization functions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>An essential part of any data assimilation (DA) method is the estimation of the background error covariance matrix P b . The background error covariance statistics stored in P b provide a structure function that determines how observed quantities affect the model state variables, which is of particular importance when the state space is not fully observed <ref type="bibr">(Bannister, 2008)</ref>. A poorly designed P b matrix may lead to an analysis estimate, after the assimilation of observations, that is worse than the prior state estimate <ref type="bibr">(Morss and Emanuel, 2002)</ref>. In ensemble DA schemes, the P b matrix is estimated through an ensemble average. Using an ensemble to estimate P b allows the estimates of the background error statistics to change with the model state, which is desirable in many geophysical systems <ref type="bibr">(Smith et al., 2017;</ref><ref type="bibr">Frolov et al., 2021)</ref>. However, this estimate of P b will always include noise due to sampling errors because the ensemble size is finite. In practice, ensemble size is limited by computational resources and, hence, sampling errors can be substantial. The standard practice to mitigate the impact of these errors is localization. A number of different localization methods exist in the DA literature (e.g., <ref type="bibr">Gaspari and Cohn, 1999;</ref><ref type="bibr">Houtekamer and Mitchell, 2001;</ref><ref type="bibr">Bishop and Hodyss, 2007;</ref><ref type="bibr">Anderson, 2012;</ref><ref type="bibr">M&#233;n&#233;trier et al., 2015)</ref>. In this study, we concentrate on distance-based localization. Distanced-based localization uses physical distance as a proxy for correlation strength and sets correlations to zero when the distance between the variables in question is sufficiently large. Localization is typically incorporated into the data assimilation in one of two ways -either through the P b matrix or through the observation error covariance R <ref type="bibr">(Greybush et al., 2011)</ref>. We are focusing on the Schur (or element-wise) product localization applied directly to the P b matrix. The Schur product theorem <ref type="bibr">(Horn and Johnson, 2012, theorem 7.5.3)</ref> guarantees that, if the localization matrix is positive semidefinite, then the localized estimate of P b is also positive semidefinite. Positive semidefiniteness of estimates of P b is essential for the convergence of variational schemes and interpretability of schemes, like the Kalman fil-Published by Copernicus Publications on behalf of the European Geosciences Union &amp; the American Geophysical Union.</p><p>Z. <ref type="bibr">Stanley et al.:</ref> Multivariate localization functions ter, which are intended for minimizing the statistical variance of the estimation error.</p><p>The localization functions presented in this work are suitable for use in coupled DA, where two or more interacting large-scale model components are assimilated in one unified framework. Coupled DA is widely recognized as a burgeoning and vital field of study. In Earth system modeling in particular, coupled DA shows improvements over singledomain analyses <ref type="bibr">(Sluka et al., 2016;</ref><ref type="bibr">Penny et al., 2019)</ref>. However, coupled DA systems face unique challenges as they involve simultaneous analysis of multiple domains spanning different spatiotemporal scales. Cross-domain error correlations in particular are found to be spatially inhomogeneous <ref type="bibr">(Smith et al., 2017;</ref><ref type="bibr">Frolov et al., 2021)</ref>. Schemes that include cross-domain error correlations in the P b matrix are broadly classified as strongly coupled, which is distinguished from weakly coupled schemes, where P b does not include any nonzero cross-domain error correlations. The inclusion of cross-domain correlations in P b offers advantages, particularly when one model domain is more densely observed than another <ref type="bibr">(Smith et al., 2020)</ref>. Strongly coupled DA requires careful treatment of cross-domain correlations, with special attention to the different correlation length scales of the different model components. Previous studies, discussed below, show that appropriate localization schemes are vital to the success of strongly coupled DA.</p><p>As in single domain DA, there is a broad suite of localization schemes proposed for use in strongly coupled DA. <ref type="bibr">Lu et al. (2015)</ref> artificially deflate cross-domain correlations with a tunable parameter. <ref type="bibr">Yoshida and Kalnay (2018)</ref> use an offline method, called correlation cutoff, to determine which observations to assimilate into which model variables and the associated localization weights. The distance-based multivariate localization functions developed in <ref type="bibr">Roh et al. (2015)</ref> allow for different localization functions for each component and are positive definite but require a single localization scale across all components. Other distance-based localization schemes allow for different localization length scales for each component but are not necessarily positive semidefinite <ref type="bibr">(Frolov et al., 2016;</ref><ref type="bibr">Smith et al., 2018;</ref><ref type="bibr">Shen et al., 2018)</ref>. <ref type="bibr">Frolov et al. (2016)</ref> report that their proposed localization matrix is experimentally positive semidefinite for some length scales. <ref type="bibr">Smith et al. (2018)</ref> use a similar method and find cases in which their localization matrix is not positive semidefinite.</p><p>In this work, we build on these methods and investigate distance-based, multivariate, positive semidefinite localization functions and their use in strongly coupled DA schemes. We introduce a new multivariate extension of the popular fifth-order piecewise rational localization function of <ref type="bibr">Gaspari and Cohn (1999, hereafter GC)</ref>. This function is positive semidefinite for all length scales and, hence, appropriate for ensemble variational (EnVar) schemes. We compare this to another newly developed multivariate localization function that extends <ref type="bibr">Bolin and Wallin (2016)</ref> and to two other func-tions from the literature <ref type="bibr">(Daley et al., 2015)</ref>. We investigate the behavior of these functions in a simple bivariate model proposed by <ref type="bibr">Lorenz (1996)</ref>. In particular, we look at the impact of variable localization on the cross-domain localization function. We show that these functions are compatible with variable localization schemes of <ref type="bibr">Lu et al. (2015)</ref> and <ref type="bibr">Yoshida and Kalnay (2018)</ref>. We find that, in some setups, artificially decreasing the magnitude of the cross-domain correlation hinders the assimilation of observations, while in other setups the best performance comes when there are no crossdomain updates. We compare all of the multivariate functions to their univariate and weakly coupled analogs and observe that the new multivariate extension of GC outperforms all multivariate competitors. This paper is organized as follows. In Sect.  XY is necessarily equal to the transpose of P b Y X , i.e., P b XY = P b Y X . Similar to P b , the localization matrix L may be decomposed into a 2&#215;2 block matrix so that the localized estimate of the background error covariance matrix is given by the following:</p><p>where &#8226; denotes a Schur product. In distance-based localization, the elements in the L matrix are evaluated through a localization function L with a specified localization radius, R, beyond which L is zero. For example, if P b ij is the sample covariance Cov(&#951; i , &#951; j ) where &#951; i = &#951;(s i ) denotes the background error in process X at spatial location s i &#8712; R n , then the associated localization weight is</p><p>Often different model components will have different optimal localization radii and, hence, one may wish to use a different localization function for each model component <ref type="bibr">(Ying et al., 2018)</ref>. That is, we may wish to use a different localization function to form each submatrix of L in Eq. ( <ref type="formula">1</ref>). Since we seek a symmetric L matrix, it suffices to construct L XX , L Y Y , and L XY . The remaining submatrix L Y X is determined through L Y X = L XY . Let L XX and the L Y Y be the localization functions associated with model components X and Y , respectively. A fundamental difficulty in localization for strongly coupled DA is how to propose a cross-localization function L XY to populate L XY and, hence, L Y X , such that whenever a block localization matrix L is formed through evaluation of {L XX , L Y Y , L XY }, then L is positive semidefinite. We call this collection of component functions a multivariate positive semidefinite function if it always produces a positive semidefinite L matrix <ref type="bibr">(Genton and Kleiber, 2015)</ref>. We refer to multivariate positive semidefinite functions as multivariate localization functions when they are used to localize background error covariance matrices. In this study, we compare four different multivariate localization functions, including one that extends GC.</p><p>Similar block localization matrices are used in a scaledependent localization, where X and Y are not components, but rather a decomposition of spectral wavebands from a single process. Scale-dependent localization aims to use a different localization radius for each waveband, which leads to the same question of how to construct the between-scale localization blocks. <ref type="bibr">Buehner and Shlyaeva (2015)</ref> constructed L XX and L Y Y through evaluation of localization functions with different radii. They then constructed the crosslocalization matrix through L XY = (L XX ) 1/2 (L Y Y ) T/2 , with L Y X defined analogously. This is appropriate for scaledependent localization where X and Y are defined on the same grid and, hence, L XX and L Y Y are of the same dimension. It is still an open question with regard to how to extend this construction to the strongly coupled application where different components are defined on different grids. The multivariate localization functions we construct below could also be used in scale-dependent localization.</p><p>In our comparison of multivariate localization functions, we investigate the impact of the shape parameters, crosslocalization radius, and cross-localization weight factor. The cross-localization radius, R XY , is the smallest distance such that, for all d &gt; R XY , we have L XY (d) = 0. For all of the functions in this study, the cross-localization radius is related to the within-component localization radii R XX and R Y Y . We define the cross-localization weight factor, &#946; &#8805; 0, as the value of the cross-localization function at distance d = 0, i.e., &#946; := L XY (0). The cross-localization weight factor &#946; is restricted to be less than or equal to 1 in order to ensure positive semidefiniteness <ref type="bibr">(Genton and Kleiber, 2015)</ref> and smaller values of &#946; lead to smaller analysis updates when updating the X model component using observations of Y , and vice versa. Each function we consider has a different maximum allowable cross-localization weight, which we denote &#946; max . Values of &#946; greater than &#946; max lead to functions that are not necessarily positive semidefinite, while values of &#946; less than &#946; max are allowable and may be useful in attenuating unde-sirable correlations between blocks of variables <ref type="bibr">(Lu et al., 2015)</ref>.</p><p>Note that, while this example shows model space localization for a coupled model with two model components, the localization functions we develop and investigate may also be used in observation space localization and can be extended to an arbitrary number of model components.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Kernel convolution</head><p>Localization functions created through kernel convolution, such as GC, may be extended to multivariate functions in the following straightforward manner. Suppose</p><p>* ) denotes convolution over R n , and k X and k Y are square integrable functions. For ease of notation, let the kernels k X and k Y be normalized such that</p><p>As a proof, we define two processes Z j , where j can represent either X or Y , as the convolution of the kernel k j with a white noise field W, as follows:</p><p>(2)</p><p>It is straightforward to show that the localization functions we have defined are exactly the covariance functions for these two processes, L ij (d) = Cov(Z i (s), Z j (t)), with i, j = X, Y , locations s, t &#8712; R n , and distance d = st . Thus, {L XX , L Y Y , L XY } is a multivariate covariance function and, hence, a multivariate, positive semidefinite function <ref type="bibr">(Genton and Kleiber, 2015)</ref>. For localization functions created through kernel convolution, the localization radii are related to the kernel radii. Suppose the kernels k X and k Y have radii c X and c Y , i.e., k j (d) = 0 for all d &gt; c j . The convolution of two kernels is zero at distances greater than the sum of the kernel radii. Thus, the implied within-component localization radii are R jj = 2c j , for processes j = X, Y . Furthermore, the implied cross-localization radius is the sum of the two kernel radii R XY = c X +c Y . Equivalently, the cross-localization radius is the average of the two within-component localization radii,</p><p>, which is how we will write it going forward. Interestingly, this is exactly the cross-localization radius used in <ref type="bibr">Frolov et al. (2016)</ref> and <ref type="bibr">Smith et al. (2018)</ref>.</p><p>Unlike within-component localization functions, crosslocalization functions created through kernel convolution will always have L XY (0) &lt; 1 whenever k X &#8801; k Y . The maximum allowable cross-localization weight factor (&#946; := L XY (0)) is exactly the value produced through the convolution, i.e., &#946; max = [k X * k Y ](0). Smaller crosslocalization weight factors also lead to positive semidefinite <ref type="bibr">(Roh et al., 2015)</ref>. To aid in comparisons to other cross-localization functions, we redefine kernel-based crosslocalization functions as follows:</p><p>with &#946; &#8804; &#946; max . In this way</p><p>which is consistent with our definition of the crosslocalization weight factor in the previous section. We will experiment with the impact of varying &#946; but must always ensure &#946; &#8804; &#946; max to maintain positive semidefiniteness.</p><p>For most kernels, closed-form analytic expressions for the convolutions above are not available. In the following two sections, we present two cases where a closed form is available. The kernels used in these two cases are the tent function (GC) and the indicator function (Bolin-Wallin).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Multivariate Gaspari-Cohn</head><p>The standard univariate GC localization function is constructed through convolution over R 3 of the kernel, k(r) &#8733; 1 -r c + with itself. Here we define r = r with r &#8712; R 3 and z + = max{z, 0}. The kernel has radius c and is normalized so that</p><p>As discussed in the previous section, the localization radius, R, is related to the kernel radius through R = 2c. We develop a multivariate extension of this function through convolutions with two kernels. The kernels are as follows:</p><p>The resulting within-component localization functions</p><p>are exactly equal to GC; see Eq. (4.10) in <ref type="bibr">Gaspari and Cohn (1999)</ref>. The formula for the cross-localization function</p><p>Recalling from the previous section that the maximum cross-localization weight factor is</p><p>, where, for convenience, we define</p><p>as a ratio of the within-component localization radii. As with all localization functions created through kernel convolution, the crosslocalization radius is the average of the within-component localization radii,</p><p>). An example multivariate GC function with R XX = 45, R Y Y = 15, and &#946; = &#946; max is shown in Fig. <ref type="figure">1</ref>. The multivariate GC localization function for three or more model components is discussed in Appendix A3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Multivariate Bolin-Wallin</head><p>We derive our second multivariate localization function through convolution of normalized indicator functions over a sphere in R 3 . As in the previous section, the kernels are supported on spheres of radii c X and c Y as follows:</p><p>where I c j (r) is an indicator function, which is 1 if r &#8804; c j and 0 otherwise. The resulting within-component localization function with localization radius R jj = 2c j is as follows:</p><p>This is commonly referred to as the spherical covariance function. The label BW references Bolin and Wallin, who performed the convolutions necessary to create the associated cross-localization function in a work aimed at a different application of covariance functions (Bolin and Wallin, 2016). While Bolin and Wallin never developed multivariate covariance (or, in our case, localization) functions, the algebra is the same. We present only the localization functions that result from the convolution over R 3 , though similar functions for R 2 and R n are available in <ref type="bibr">Bolin and Wallin (2016)</ref>. Note that there is a typo in Bolin and Wallin ( <ref type="formula">2016</ref>), which has been corrected below. Let c X &gt; c Y be kernel radii, and then the resulting crosslocalization function is as follows:</p><p>Here V 3 (r, x) denotes the volume of the spherical cap with triangular height x of a sphere with radius r, which is given by the following:</p><p>As with multivariate GC, it is convenient to define a ratio of within-component localization radii by</p><p>Then we can write the maximum cross-localization weight factor as</p><p>because it is created through kernel convolution. An example multivariate BW function with R XX = 45, R Y Y = 15, and &#946; = &#946; max is shown in Fig. <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Wendland-Gneiting functions</head><p>We compare the two functions of the preceding sections to the Wendland-Gneiting family of multivariate, compactly supported, positive semidefinite functions. This family is  </p><p>not generated through kernel convolution but rather through Mont&#233;e and Descente operators <ref type="bibr">(Gneiting, 2002)</ref>. The simplest univariate function in this family is the Askey function, which is given by the following:</p><p>with the shape parameter &#957; and localization radius R. Other functions in this family are called Wendland functions. Several examples of univariate Wendland functions are displayed in Table <ref type="table">1</ref>. <ref type="bibr">Porcu et al. (2013)</ref> developed a multivariate version of the Askey function, where the exponent in Eq. ( <ref type="formula">9</ref>) can be different for each process while the localization radius R is constant across all processes. <ref type="bibr">Roh et al. (2015)</ref> found that this multivariate localization function outperforms common univariate localization methods when assimilating observations into the bivariate Lorenz 96 model. <ref type="bibr">Daley et al. (2015)</ref> extended the work of <ref type="bibr">Porcu et al. (2013)</ref> and constructed a multivariate version of general Wendland-Gneiting functions that allow for different localization radii for different processes. The multivariate Askey function from <ref type="bibr">Daley et al. (2015)</ref> has the following form:</p><p>The general form for multivariate Wendland functions is as follows:</p><p>where &#968; is defined as follows:</p><p>with B as the beta function, B(x, y) = 1 0 t x-1 (1 -t) y-1 dt. The parameters &#957; and {&#947; ij } are related to the shape of the localization functions and are necessary to guarantee positive semidefiniteness in a given dimension. The parameter k determines the differentiability of the Wendland functions at lag zero <ref type="bibr">(Gneiting, 2002)</ref>. Note that the Askey function in Eq. ( <ref type="formula">10</ref>) is a special case of the Wendland function (Eq. 11), which corresponds to the case k = 0. <ref type="bibr">Daley et al. (2015)</ref> gave sufficient conditions on the parameters &#957;, k, R ij , &#947; ij , and &#946; to guarantee that Eq. ( <ref type="formula">11</ref>) and, hence, Eq. ( <ref type="formula">10</ref>) is positive semidefinite. In particular, for the two processes of X and Y , Eq. ( <ref type="formula">11</ref>) is positive semidefinite on </p><p>Going forward, we consider the multivariate Askey function (Eq. 10) and the multivariate Wendland function with k = 1 in Eq. ( <ref type="formula">11</ref>). Note that, with both of these functions, the crosslocalization radius depends only on the smallest localization radius. In fact, we choose R XY = min{R XX , R Y Y }, although smaller values for R XY also produce positive semidefinite functions. Thus, for given R XX and R Y Y , the crosslocalization radius for Askey and Wendland functions is always smaller than the cross-localization radius for GC and BW. With the choice R XY = min{R XX , R Y Y }, we see that  <ref type="table">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>In this section, we investigate the performance of a data assimilation scheme using each of the four multivariate localization functions presented in Sect. 2. We choose a setup which isolates the impact of the cross-localization functions and relate the filter performance to important crosslocalization shape parameters. As a baseline for comparison, we also test two simple approaches to localization for coupled DA. The first method uses a single localization function and radius to localize all within-and crosscomponent blocks of the background error covariance matrix, i.e., L XX &#8801; L Y Y &#8801; L XY . We call this approach univariate localization. In systems with very different optimal localization radii, this type of univariate localization is likely to perform poorly; however, it does provide a useful comparison point. The second approach uses different localization functions for each process and then zeroes out all cross-correlations between processes, i.e., L XX = L Y Y , and L XY &#8801; 0. We call this approach weakly coupled localization as it leads to a weakly coupled data assimilation scheme. All of the experiments are run with the bivariate Lorenz 96 model, which is described below <ref type="bibr">(Lorenz, 1996)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Bivariate Lorenz model</head><p>The bivariate Lorenz 96 model is a conceptual model of atmospheric processes and is comprised of two coupled processes with distinct temporal and spatial scales. The "small" process can be thought of as rapidly varying small-scale convective fluctuations, while the "large" process can be thought of as smooth large-scale waves. The model is periodic in the spatial domain, as a process on a fixed latitude band would be.</p><p>The large process, X, has K distinct variables, X k for k = 1, . . ., K. The small process, Y , is divided into K sectors, with each sector corresponding to one large variable X k . There are J small process variables in each sector, for a total of J K distinct Y variables, Y j,k for j = 1, . . ., J, k = 1, . . ., K. The Y variables, arranged in order, are Y 1,1 , Y 2,1 , . . ., Y J,1 , Y 1,2 , Y 2,2 , . . ., Y J,K . Succinctly, Y j -J,k = Y j,k-1 and Y j +J,k = Y j,k+1 , with periodicity conditions Y j,k+K = Y j,k-K = Y j,k for all j, k. The X process is also periodic with X k+K = X k-K = X k for all k.</p><p>We represent the variables on a circle where the arc length between neighboring Y variables is 1. Equivalently, the radius of the circle is r = J K 2&#960; . Variable Y j,k is located at (r cos(&#952; j,k ), r sin(&#952; j,k )), where &#952; j,k = 2&#960; J K (J (k -1) + j ). We choose to place the variable X k , located at (r cos(&#966; k ), r sin(&#966; k )), in the middle of the sector whose variables are coupled to it, e.g., if J = 10 then X k is halfway between Y 5,k and Y 6,k and &#966; k = 2&#960; 10K (10(k -1) + 5.5). The placement of these variables is illustrated in Fig. <ref type="figure">2</ref>. The chord distance between any two variables is 2r sin &#952; 2 , where &#952; is the angle increment, e.g., the angle increment between</p><p>The governing equations are as follows:</p><p>We follow <ref type="bibr">Lorenz (1996)</ref> and let K = 36 and J = 10, so there are 36 sectors and 10 times more small variables than large variables. We let a = 10 and b = 10, indicating that convective scales fluctuate 10 times faster than the larger scales, while their amplitude is around one-tenth as large.</p><p>For the forcing, we choose F = 10, which <ref type="bibr">Lorenz (1996)</ref> found to be sufficient to make both scales behave chaotically. All simulations are performed using an adaptive fourth-order Runge-Kutta method with relative error tolerance 10 -3 and absolute error tolerance 10 -6 <ref type="bibr">(Dormand and Prince, 1980;</ref><ref type="bibr">Shampine and Reichelt, 1997)</ref>. The solutions are output in each assimilation cycle. Unless otherwise specified, the assimilation cycles are separated by a time interval of 0.005 model time units (MTUs), which <ref type="bibr">Lorenz (1996)</ref> found to be similar to 36 min in more realistic settings. This timescale is 10 times shorter than the timescale typically used in the univariate Lorenz 96 model. The factor of 10 is consistent with the understanding that the small process evolves 10 times faster than the large process, where the large process is akin  to the univariate Lorenz 96 model. In choosing the coupling strength, we follow <ref type="bibr">Roh et al. (2015)</ref> and set h = 2, which is twice as strong as the coupling used by Lorenz. Varying the coupling strength h across values { 1 2 , 1, 4} changes the magnitude of the analysis errors but does not change the relative performance of different localization functions in our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Assimilation scheme</head><p>In our experiments, we use the stochastic ensemble Kalman filter (EnKF; <ref type="bibr">Evensen, 1994;</ref><ref type="bibr">Houtekamer and Mitchell, 1998;</ref><ref type="bibr">Burgers et al., 1998)</ref>. The EnKF update formula for a single ensemble member is as follows:</p><p>where x a is the analysis vector, x b is the background state vector, y is the observation, each element of &#951; is a random draw from the probability distribution of observation errors, and H is the linear observation operator. The Kalman gain matrix K is as follows:</p><p>where P b is the background error covariance matrix, and R is the observation error covariance matrix. The background covariance matrix is approximated by a sample covariance matrix from an ensemble, i.e., x i for i = 1, . . ., N e , where N e is the ensemble size. In this experiment, we use the adaptive inflation scheme of El Gharamti (2018) and inflate each prior ensemble member through the following:</p><p>where is a diagonal matrix with each element on the diagonal containing the inflation factor for one variable, and x b is the background ensemble mean. We then use x b &#955; in place of x b in Eq. ( <ref type="formula">16</ref>) and in the estimation of P b . Note that estimating P b with the inflated ensemble is equivalent to estimating it with the original ensemble and then multiplying by 1/2 on the left and right, 1/2 P b 1/2 . The Bayesian approach to adaptive inflation in El Gharamti (2018) uses observations to update the inflation distribution associated with each state variable. The inflation prior has an inverse gamma distribution, with shape and rate parameters determined from the mode and prior inflation variance. In this study, we initialize the inflation factors with = 1.1I, where I is the identity matrix. The localization matrix L is incorporated into the estimate of the background covariance matrix through a Schur product, as in Eq. ( <ref type="formula">1</ref>).</p><p>We use N e = 20 ensemble members, except where otherwise noted. The small ensemble size is chosen to accentuate the spurious correlations and, hence, the need for effective localization functions. We run each DA scheme for 3000 analysis cycles, discarding the first 1000 cycles and reporting statistics from the remaining 2000 cycles. Each experiment is repeated 50 times with independent reference states, which serve as the "truth" in our experiments. We generate "observations" by adding independent Gaussian noise to the reference state.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Experimental design</head><p>The experiments described in this section compare the performance of each of the four multivariate localization functions presented in Sect. 2. Performance is measured through the root mean square (RMS) distance between the analysis mean and the true state, which we refer to as the root mean square error (RMSE). We present scaled analysis errors to aid in comparison between the large and small components. RMS errors are divided by the climatological standard deviation for each process. To standardize the comparison of the different shapes, we use the same withincomponent localization radii for all multivariate functions. We also investigate the performance of univariate and weakly coupled localization functions. The univariate localization functions are chosen to be equal to the within-component localization function for Y , i.e., L &#8801; L Y Y . The withincomponent weakly coupled localization functions are equal to the within-component multivariate localization functions. However, the weakly coupled cross-localization functions are identically zero. The free localization function parameters are chosen to balance performance of the univariate, multivariate, and weakly coupled forms of each localization function. We estimate these parameters through a process which we describe in Appendix B.</p><p>We test the performance of multivariate localization functions using three different observation operators. First, we observe all small variables and none of the large variables. In this setup, we isolate the impact the of the cross-localization function, which allows us to make conjectures about important cross-localization shape parameters. Next, we flip the setup and observe all large variables and none of the small variables. We compare and contrast our findings with those from the previous case. Finally, we observe both processes and observe behavior reminiscent of both of the previous cases. The experimental setups are grouped by observation operator below. The source code for all experiments is publicly available (see the code availability section).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1">Observe only the small process</head><p>To isolate the impact of the cross-localization functions, we fully observe the small process and do not observe the large process at all. In this configuration, analysis increments of the large process can be fully attributed to the cross-domain assimilation of observations of the small process. The treatment of cross-domain background error covariances plays a crucial role in the analysis of the large process, so we expect that changes in the cross-localization function will lead to changes in the large process analyses. All observations are assimilated every 0.005 MTU. We use an observation error variance of &#963; 2 Y = 0.005 both in the generation of synthetic observations from the reference state and in the assimilation scheme. The observation error variance is chosen to be about 5 % of the climatological variance of the Y process. We also run the experiment with &#963; 2 Y = 0.02, or about 20 % of the climatological variance, and find that the analyses are degraded, but the relative performance of the different localization functions is the same.</p><p>The localization parameters we use in this experiment are given in Table <ref type="table">3</ref>. For all functions, we use the maximum allowable cross-localization weight factor, &#946; = &#946; max . In estimating the optimal cross-localization weight factor, we find that, since the only updates to X are through observations of Y , smaller cross-localization weight factors lead to degraded performance (Appendix B).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2">Observe only the large process</head><p>Next, we investigate the impact of the different localization functions when we fully observe the large process and do not observe the small process at all. The large process fluctuates about 10 times more slowly than the small process, so we use an assimilation cycle length that is 10 times longer than the one in the previous configuration. All observations are assimilated every 0.05 MTU. We use an observation error variance of &#963; 2 X = 0.28 both in the generation of synthetic observations from the reference state and in the assimilation scheme. The observation error variance is chosen to be about 5 % of the climatological variance of the X process. We also run the experiment with &#963; 2 X = 1.1, or about 20 % of climatological variance, and find that the X analyses are degraded, but the relative performance of the different localization functions is the same. The localization parameters we use in this experiment are given in Table <ref type="table">B1</ref>. We find that the analysis errors are similar with all values of &#946;. For consistency with the previous experiment, we use the maximum allowable crosslocalization weight factor, &#946; = &#946; max .</p><p>Table <ref type="table">3</ref>. Localization parameters for the experiment observing only the small process. Note that weakly coupled parameters are not shown in this table because they are exactly equal to the multivariate parameters, except with &#946; = 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Function name Univariate parameters Multivariate parameters</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3">Observe both processes</head><p>Finally, we observe both processes and note the impact of the different localization functions. In this configuration, we observe 75 % of the variables in each process, with the observation locations chosen randomly for each trial. All observations are assimilated every 0.005 MTU, in line with the analysis cycle length for the observation of the small process only. We use observation error variances of &#963; 2 Y = 0.01 and &#963; 2 X = 0.57 in the generation of observations and in the assimilation scheme. The observation error variance is chosen to be about 10 % of the climatological variance of each process. We also run the experiment with &#963; 2 Y = 0.02 and &#963; 2 X = 1.1, or about 20 % of climatological variance, and find that the performance is similar. The localization parameters we use in this experiment are given in Table <ref type="table">B2</ref>. We find that the analysis errors grow with increasing &#946;. Nonetheless, to distinguish between multivariate localization, which allows for cross-domain information transfer, and weakly coupled localization, which does not, we use &#946; = &#946; max for all multivariate functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.1">Observe only the small process</head><p>Figure <ref type="figure">3</ref> shows the distribution of analysis errors for the configuration described in Sect. 3.3.1. With weakly coupled localization functions, no information is shared in the update step between the observed Y process and the unobserved X process. This leads to no updates of the X variables and, eventually, to catastrophic filter divergence. In principle, the system might be able to synchronize the unobserved (large) process through dynamical couplings with the observed (small) process, but in our setup, this does not happen. Hence, weakly coupled localization functions are not included in the figure. The analysis error distributions for the observed Y process are similar for all functions except multivariate Wendland. For the unobserved X process, the analysis errors are comparable across all of the univariate localization functions. This is consistent with the fact that all of the univariate localization functions have similar shapes, as seen in Fig. <ref type="figure">1b</ref>. The multivariate localization functions, on the other hand, show great diversity of performance. The Wendland function leads to significantly worse performance with the multivariate function when compared to the univariate functions. BW and Askey functions perform similarly for both the multivariate and univariate cases. Out of all of the localization functions we consider, the best performance is achieved with multivariate GC.</p><p>To understand the improved performance with multivariate GC, we consider two different shape parameters. Recall from Sect. 3.3.1 that smaller cross-localization weights led to worse performance when holding all other localization parameters fixed. Extending this finding, we hypothesize that functions with a larger &#946; max will allow for more information to propagate across model domains, thereby improving performance in this setup. With the chosen localization parameters, the multivariate Askey function has the largest crosslocalization weight factor with &#946; max &#8776; 0.46, followed by GC with &#946; max &#8776; 0.38. A visual representation of the crosslocalization weight factor is shown as the height of the crosslocalization function at zero in Fig. <ref type="figure">1c</ref>. The shape of each cross-localization function varies not only in its height at zero but also in its radius and smoothness near zero. For example, while the Askey cross-localization function peaks higher than GC, GC is generally smoother near zero and has a larger cross-localization radius. All of these differences in shape impact how much information propagates across model domains. Based on its height and width, we hypothesize that GC allows for sufficient cross-domain information propagation at both small and long distances, and this is why multivariate GC outperforms all other functions in this experiment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.2">Observe only the large process</head><p>When we observe only the large process (as described in Sect. 3.3.2), we find that all localization functions lead to very similar performance. In this case, the shape of the localization function is not important. Rather, the dynamics of the bivariate Lorenz model are driving the behavior. In this configuration, the true background error cross-correlations are very small (less than 0.1 even at small distances). The Y variables are restored towards h b X in their sector (Eq. 15). Thus, even when the assimilation does not update the Y variables, we expect to recover the mean of the Y process. Based on climatology, we find that the conditional mean of Y j,k , given X k = x, is E[Y j,k |x] &#8776; 0.0559x. The median root mean square difference between Y and its conditional mean  <ref type="bibr">(Hoffmann, 2015)</ref>. Analysis errors are calculated as RMS deviations from the truth and are scaled by the climatological standard deviation of the associated process. All four univariate localization functions perform similarly, while there is a greater range in performance for the multivariate versions of these functions. Multivariate Gaspari-Cohn shows improvement over its univariate counterparts. Univariate and multivariate Bolin-Wallin and Askey functions appear to perform similarly. For Wendland, the multivariate function performs significantly worse than the univariate function. is 0.294. Our results show that the median RMSE in the Y process ranges from 0.294 to 0.297. Thus, the filter does not improve upon a simple linear conditional mean prediction, which is perhaps unsurprising given the small magnitude of error cross-correlations. Figure <ref type="figure">4</ref> shows the distribution of analysis errors for univariate, weakly coupled, and multivariate GC. The distributions for other functions are nearly identical and, hence, not shown.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.3">Observe both processes</head><p>When we observe both processes, the precise shape of the localization function appears to have little impact. We do see differences between univariate, weakly coupled, and multivariate localization functions. Figure <ref type="figure">4</ref> shows analysis error distributions for the three different versions of GC, which are broadly representative of the behavior seen in other functions as well. This configuration is quite unstable. About 80 % of the trials with weakly coupled localization functions lead to catastrophic filter divergence. Trials with univariate and multivariate localization diverge less often, but still diverge about 20 % of the time. Figure <ref type="figure">4</ref> shows results from only the trials (out of 50 total) which did not diverge. Weakly coupled localization appears to lead to the best performance, when the filter does not diverge. There is some variation in the results across the different localization functions. In particular, multivariate Askey appears to lead to better performance than weakly coupled Askey, but this may be attributable to the issues with stability. Catastrophic filter divergence is a welldocumented but poorly understood phenomenon <ref type="bibr">(Gottwald and Majda, 2013;</ref><ref type="bibr">Houtekamer and Zhang, 2016</ref>, their Appendix A). The mechanism is understood in highly idealized models <ref type="bibr">(Kelly et al., 2015)</ref>, but the dynamics of instability in models as simple as the bivariate Lorenz 96 model remains unclear and is outside the scope of the present investigation.</p><p>The complicated story with the weakly coupled schemes indicates that, in this configuration, filter performance is highly sensitive to the treatment of cross-domain background error covariances. The small ensemble size, combined with small true forecast error cross-correlations, can lead to spuriously large estimates of background error cross-covariances. Meanwhile, we have nearly complete observations of both processes, so within-component updates are likely quite good. Thus, zeroing out the cross terms, as in weakly coupled schemes, may improve state estimates. On the other hand, inclusion of some cross-domain terms appears to be important for stability.  <ref type="bibr">(Hoffmann, 2015)</ref>. Analysis errors are calculated as RMS deviations from the truth and are scaled by the climatological standard deviation of the associated process. (a, c) Results from the experiment where we observe only the large process. All functions perform similarly. (b, d) Results from the experiment where we observe both processes. The weakly coupled localization functions appear to lead to the best performance but are highly unstable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>In this work, we developed a multivariate extension of the oft-used GC localization function, where the withincomponent localization functions are standard GC with different localization radii, while the cross-localization function is newly constructed to ensure that the resulting localization matrix is positive semidefinite. A positive semidefinite localization matrix guarantees, through the Schur product theorem, that the localized estimate of the background error covariance matrix is positive semidefinite <ref type="bibr">(Horn and Johnson, 2012, theorem 7.5.3)</ref>. We compared multivariate GC to three other multivariate localization functions (including one other newly presented multivariate function), and their univariate and weakly coupled counterparts. We found that the performance of different localization functions is highly dependent on the observation operator. When we observed only the large process, all localization functions performed similarly. In an experiment where we observed both processes, weakly coupled localization led to the smallest analysis error. When we observed only the small process, multivariate GC led to better performance than any of the other localization functions we considered. We hypothesized that the shape of the GC cross-localization function allows for larger crossdomain assimilation than the other functions. There is still an outstanding question of how multivariate GC will perform in other, perhaps more realistic, systems.</p><p>We found that choosing an appropriate cross-localization weight factor, &#946;, is crucial to the performance of the multivariate localization functions. This parameter determines the amount of information which is allowed to propagate between co-located variables in different model components. We found that this parameter should be as large as possible when observing only the small process. By contrast, the parameter should be small or even zero when both processes are well observed. This is consistent with other studies which have shown the value in deflating cross-domain updates between non-interacting processes <ref type="bibr">(Lu et al., 2015;</ref><ref type="bibr">Yoshida and Kalnay, 2018)</ref>.</p><p>A natural application of this work is localization in a coupled atmosphere-ocean model. The bivariate Lorenz 96 model has a linear relationship between the large and small scales. Hence, the results presented here are relevant to linear coupling in climate models, e.g., the sensible heat exchange between ocean and atmosphere which is approximately linearly proportional to the temperature difference. Multivariate GC allows for within-component covariances to be localized with GC exactly as they would be in an uncoupled setting, using the optimal localization length scale for each component <ref type="bibr">(Ying et al., 2018)</ref>. The cross-localization length scale for GC is the average of the two within-component localization radii, which is the same as the cross-localization radius proposed in <ref type="bibr">Frolov et al. (2016)</ref>. We hypothesize that the crosslocalization radius is important in determining filter performance. However, the functions considered here did not allow for a thorough investigation of the optimal cross-localization radius, which is an important area for future research.</p><p>where &#946; max = 5 2 &#954; -3 -3 2 &#954; -5 and &#946; &#8804; &#946; max . Note that, when we take c X &#8594; c Y , which implies &#954; &#8594; 1, this multivariate function converges to the standard univariate GC function, as we would expect.</p><p>The second case to consider is c X &gt; 2c Y . Again, let c X = &#954; 2 c Y . In this case, the cross-localization function is as follows: where, as in the above case, &#946; max = 5 2 &#954; -3 -3 2 &#954; -5 and &#946; &#8804; &#946; max . Note that when c X = 2c Y , Eq. ( <ref type="formula">A1</ref>) is equal to Eq. (A2).  <ref type="bibr">Gaspari and Cohn (1999)</ref> provides a framework for evaluating the necessary convolutions. It is shown that for radially symmetric functions k j (r) = k 0 j ( r ) compactly supported on a sphere of radius c j , j = X, Y , with c Y &#8804; c X the convolution over R 3 given by the following:</p><p>can equivalently be written as follows:   </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Nonlin. Processes Geophys., 28, 565-583, 2021 https://doi.org/10.5194/npg-28-565-2021</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>https://doi.org/10.5194/npg-28-565-2021Nonlin. Processes Geophys., 28, 565-583, 2021</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>Nonlin. Processes Geophys., 28, 565-583, 2021 https://doi.org/10.5194/npg-28-565-2021 Z. Stanley et al.: Multivariate localization functions</p></note>
		</body>
		</text>
</TEI>
