<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Iterative Ensemble Smoothers for Data Assimilation in Coupled Nonlinear Multiscale Models</title></titleStmt>
			<publicationStmt>
				<publisher>AMS</publisher>
				<date>06/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10537540</idno>
					<idno type="doi">10.1175/MWR-D-23-0239.1</idno>
					<title level='j'>Monthly Weather Review</title>
<idno>0027-0644</idno>
<biblScope unit="volume">152</biblScope>
<biblScope unit="issue">6</biblScope>					

					<author>Geir Evensen</author><author>Femke C Vossepoel</author><author>Peter Jan van_Leeuwen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>This paper identifies and explains particular differences and properties of adjoint-free iterative ensemble methods initially developed for parameter estimation in petroleum models. The aim is to demonstrate the methods’ potential for sequential data assimilation in coupled and multiscale unstable dynamical systems. For this study, we have introduced a new nonlinear and coupled multiscale model based on two Kuramoto–Sivashinsky equations operating on different scales where a coupling term relaxes the two model variables toward each other. This model provides a convenient testbed for studying data assimilation in highly nonlinear and coupled multiscale systems. We show that the model coupling leads to cross covariance between the two models’ variables, allowing for a combined update of both models. The measurements of one model’s variable will also influence the other and contribute to a more consistent estimate. Second, the new model allows us to examine the properties of iterative ensemble smoothers and assimilation updates over finite-length assimilation windows. We discuss the impact of varying the assimilation windows’ length relative to the model’s predictability time scale. Furthermore, we show that iterative ensemble smoothers significantly improve the solution’s accuracy compared to the standard ensemble Kalman filter update. Results and discussion provide an enhanced understanding of the ensemble methods’ potential implementation and use in operational weather- and climate-prediction systems.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Numerical weather prediction at national and international weather centers uses different data assimilation approaches to initialize ocean and atmosphere models. de <ref type="bibr">Rosnay et al. (2022)</ref> and <ref type="bibr">Laloyaux et al. (2016)</ref> describe the data assimilation system at ECMWF, which combines the assimilation of ocean observations using a 3DVAR scheme with the assimilation of atmospheric observations using a 4DVAR method. The updates are uncoupled during inner loops but weakly coupled during the short forecast connecting successive assimilation windows. While this approach is less optimal than providing a fully coupled update, it is a practical intermediate step when using variational data assimilation methods with different time windows. The reason for uncoupled inner loops is a fundamental and unsolved problem related to the different time scales of the ocean and atmosphere. An optimal time window for an ocean model might be a week, while it could be a day for the atmosphere at a similar resolution. Hence, one must clarify the optimal window length in a 4DVAR system containing both model components. Furthermore, data latency is a problem because ocean observations tend to have much larger latency than atmospheric observations, mainly for historical reasons. The data latency is one of the reasons that GFDL uses a two-step ensemble-based filtering algorithm applied to a fully coupled climate model <ref type="bibr">(Chang et al. 2013)</ref>, and NCEP assimilates data into partially coupled Earth system components using a 3DVAR-based scheme to incorporate land surface, atmosphere, ocean, and sea ice observations <ref type="bibr">(Saha et al. 2010)</ref>. <ref type="bibr">Penny et al. (2017)</ref> provide a comprehensive overview of data assimilation in coupled ocean-atmosphere models. While most operational approaches have a weak coupling between the two subsystems and so-called outer-loop coupling, as mentioned above, more conceptual studies investigate the effect of a strong coupling. For example, <ref type="bibr">Luo and Hoteit (2014)</ref> use a multiscale Lorenz-96 model to evaluate the performance of a state-space estimation strategy that assimilates data into each subsystem and correlated quantities from other coupled subsystems. In a slightly more realistic setting, <ref type="bibr">Han et al. (2013)</ref> couple a Lorenz-63 model to a pycnocline ocean model. <ref type="bibr">Penny et al. (2019)</ref> compare several data assimilation algorithms on a single quasigeostrophic model for a coupled ocean and atmosphere. <ref type="bibr">Tondeur et al. (2020)</ref> used the same model to study coupled data assimilation with the ensemble Kalman filter (EnKF) for a system consisting of a slow ocean coupled to a fast atmosphere. This study illustrates the mechanisms behind information propagation between the system's two components and concludes that cross-component effects are strong from the slow to the fast scale.</p><p>We can view the current study as an extension of <ref type="bibr">Tondeur et al. (2020)</ref>, in which, on the one hand, we will study the information flow between two coupled systems more closely and, on the other hand, use more sophisticated data assimilation techniques. To this end, we will use a simplified representation of two coupled Earth system components to explore the information propagation between model components with predominantly large and small spatial scales. The coupled model uses the <ref type="bibr">Kuramoto-Sivashinsky (KS)</ref> equations to describe interactions between two systems: one with a longer spatial scale (typically the atmosphere) and one with a shorter spatial scale (typically the ocean). In addition, we evaluate the efficiency of different adjoint-free data assimilation algorithms in coupled data assimilation.</p><p>We describe the model in the following section. After that, we introduce and discuss the data assimilation methods before we run multiple experiments to examine the impact and value of coupled data assimilation and various sensitivities to window lengths, number of iterations, etc.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Coupled multiscale Kuramoto-Sivashinsky model</head><p>We will now describe the nonlinear multiscale model used to study the performance of our data assimilation methods for the coupled data assimilation problem. We need a model with specific properties, including unstable and near-chaotic dynamics with nonlinear saturation of the linear instabilities, similar to oceanic and atmospheric behavior. For computational reasons and ease of interpretation, we search for a onedimensional and univariate equation for each component of the coupled system. Finally, we require a model that can include different spatial and temporal scales, as we wish to study the assimilation of observations in coupled models with different scales. The model system should resemble the behavior of coupled climate models, where the ocean and atmospheric components have vastly different spatial and temporal scales.</p><p>Our choice landed on a variant of the KS equation, from which we derived two model configurations operating on different spatial scales and with a coupling term connecting the two models. In the current study, we only include different spatial scales between the model components, but it is possible to alter the temporal scale of the models in future studies.</p><p>The KS equation is a fourth-order partial differential equation initially used to describe diffusive thermal instabilities in laminar flame fronts <ref type="bibr">(Kuramoto 1978;</ref><ref type="bibr">Sivashinsky 1977</ref><ref type="bibr">Sivashinsky , 1980))</ref>. The model can also describe the dynamics of fluid films on inclines <ref type="bibr">(Shlang and Sivashinsky 1982;</ref><ref type="bibr">Tilley et al. 1994)</ref>, flow in pipes <ref type="bibr">(Chang 1986)</ref>, and dynamics of chemical reactions. <ref type="bibr">Kuramoto (1978)</ref> described how the coupling of an oscillation and a spatial inhomogeneity could produce spatiotemporal chaos and how one can obtain a balance between phase instability and amplitude instability. The KS equation is</p><p>which reduces to the inviscid Burgers equation if we ignore the diffusion terms on the right-hand side. The nonlinear advection term uu x transfers energy between large and small scales and creates shocks. The harmonic diffusion term has a negative sign and acts to enhance any small-scale feature in the model solution. In contrast, the biharmonic diffusion term with a negative sign is a small-scale selective positive diffusion, thereby controlling any growing instabilities in the model. <ref type="bibr">Protas et al. (2004)</ref> introduced the KS equation for an evaluation of 4DVAR data assimilation methods. <ref type="bibr">Jardak et al. (2010)</ref> and <ref type="bibr">Chorin and Krause (2004)</ref> evaluated the performance of Bayesian filters using the KS equation, <ref type="bibr">Azouani and Titi (2014)</ref> and <ref type="bibr">Lunasin and Titi (2017)</ref> used the equations as a testbed for the so-called continuous data assimilation technique, and <ref type="bibr">Waller et al. (2014)</ref> applied it for correlated observation error estimation.</p><p>We refer to <ref type="bibr">Lak (2023)</ref> for a discussion of the numerical implementation and example codes. In short, we have used a Crank-Nicolson/Adams-Bashforth (CNAB) scheme for time stepping the model. The time-stepping method is second order and implicit in time. We discretized space using a Fourier decomposition as this approach renders the inverse matrices in the scheme diagonal, allowing for a highly efficient implementation. Our FORTRAN-90 subroutine for the model integration is available from <ref type="bibr">Evensen (2023b)</ref>.</p><p>In our implementation, we have used two KS models, and we refer to the two model solutions as Atmos and Ocean with the symbols A and O referring to their respective variables. The coupled model equations read</p><p>We couple the two equations through the coupling terms a oa (O 2 A) and v ao (A 2 O) where we used coupling coefficients of 0.003 in both equations in all the following experiments, except for one uncoupled simulation where we set the coefficients to zero. Additionally, we halved the biharmonic damping of the Atmos variable to have more structures in the solutions.</p><p>We defined two different physical lengths for the Ocean and Atmos model domains to introduce different spatial scales in the two models. We used a periodic model domain with 1024 grid points but assumed a physical domain length of 32 for the Atmos domain and 256 for the Ocean domain. Multiplying the time derivative in one of the equations by a number not equal to one would introduce different time scales in the two model components.</p><p>Our multiscale coupled model is suitable for conceptualizing data assimilation in coupled systems. This paper will still refer to the model as KS, although we have modified it from the original KS model in Eq. (1). We can consider it a 1D analog to the 2D Navier-Stokes equations. The model has a unique solution given a set of initial conditions <ref type="bibr">(Tadmor 1986)</ref>, and it has chaotic behavior and a finite-dimensional global attractor <ref type="bibr">(Papageorgiou and Smyrlis 1991)</ref>. The 1D KS model simplifies the analysis and interpretation of results, and we avoid using computationally expensive 2D or 3D models.</p><p>In nonlinear chaotic models, such as weather prediction models, we can define the model's predictability time relative to the initial uncertainty as how long we can integrate the model before the model solution's uncertainty reaches the climatological variability level. Typically, nonlinear unstable ocean and atmosphere models initially experience exponential error growth before the errors saturate at the climatological variability level due to nonlinear effects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data assimilation methods</head><p>We have adopted data assimilation methods, notation, and formulations from our recent open-access textbook <ref type="bibr">(Evensen et al. 2022</ref>), where we split the time dimension into a sequence of assimilation time windows as illustrated in Fig. <ref type="figure">1</ref>. The definition of finite-length assimilation windows facilitates assimilation of all data available within the window in one update. It makes it possible to use iterative ensemble smoothers to reduce the model nonlinearity's impact and obtain superior results compared to the standard EnKF solution <ref type="bibr">(Bocquet and Sakov 2013a,b)</ref>. The three methods considered in this paper are versions of the ensemble smoother (ES) that trace back to the ES of van Leeuwen and <ref type="bibr">Evensen (1996)</ref>, the ES with multiple data assimilation (ESMDA) proposed by <ref type="bibr">Emerick and Reynolds (2012)</ref>, and the ensemble randomized maximum likelihood (EnRML) method by <ref type="bibr">Chen and</ref><ref type="bibr">Oliver (2012, 2013)</ref>. We are using the ensemble subspace implementation of <ref type="bibr">Evensen et al. (2019)</ref> for the EnRML, which we in the following refer to as the iterative ES (IES). The term "randomized maximum likelihood" is misleading; as we will see below, we solve for a randomized ensemble of maximum a posteriori solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. A theoretical basis for ensemble methods</head><p>The ensemble data assimilation methods attempt to sample the posterior probability density function (pdf) f(z|d) for the state z conditional on the measurements d, as defined by Bayes's theorem:</p><p>Here, f [d|g(z)] is the measurement's likelihood function and f(z) is the pdf for the prior estimate of the state z. We have introduced the composite model and measurement operator g in</p><p>which maps the state vector z 2 &lt; n to the predicted measurements y 2 &lt; m . <ref type="bibr">Kitanidis (1995)</ref> and <ref type="bibr">Oliver et al. (1996)</ref> showed that for a Gaussian prior and likelihood, minimization of an ensemble of cost functions, one for each realization j,</p><p>results in an approximate randomized sampling of the posterior Bayesian pdf. This "RML" sampling is exact for a linear model and measurement operator when using a Gaussian prior and likelihood. The significance of the approximation depends on the level of nonlinearity of the model in Eq. ( <ref type="formula">5</ref>). The cost functions are mutually independent for each realization j and use the random samples z f j ; N (z f , C zz ) for the prior state vector and d j ; N (d, C dd ) for the perturbed measurements to represent the uncertainties. The superscript f denotes first guess or forecast. The covariances C zz and C dd are the error covariances for the prior state vector and the measurements. As shown in van Leeuwen (2020), a more consistent interpretation is to perturb g(z j ) instead of the observations, as observations are already perturbed values from the true state, but because of the symmetry in the Gaussian assumption, the results are independent of the perturbation choice.</p><p>Note that to solve the assimilation problem for a particular assimilation window, we must assume that measurements are independent between the different windows and that the model is a first-order Markov process. Additionally, we apply a filtering assumption over time windows by only updating the solution in the current assimilation window and ignoring any updates of the past windows. We can easily relax this filtering assumption, where we do not update previous windows, by applying an ensemble Kalman smoother (EnKS) approach to update the solution in previous windows <ref type="bibr">(Evensen and van Leeuwen 2000)</ref>.</p><p>We derive the ensemble methods by setting the gradient of the cost functions in Eq. ( <ref type="formula">6</ref>) to zero:</p><p>This equation has no explicit solution in the case when the model g(z j ) is nonlinear. However, by introducing a linearization of g(z j ) around the prior estimate z f j , we obtain an ensemble of Kalman filter update equations.</p><p>In the nonlinear case, these equations are only accurate for modest updates when the linear approximation is valid <ref type="bibr">(Bonavita et al. 2018)</ref>. Alternatively, we can use the gradient expression from Eq. ( <ref type="formula">7</ref>) and the Hessian</p><p>in a Gauss-Newton iteration to minimize the ensemble of individual and uncoupled cost functions exactly. This approach corresponds to an ensemble-4DVAR type method where we</p><p>FIG. 1. How we split time into discrete time windows is shown. Within each time window l, we integrate a model K time steps and assimilate the data d l into the state variable z l .</p><p>could use, e.g., incremental 4DVAR methods to minimize each independent cost-function realization. We refer to Evensen (2019) for a complete derivation of the resulting iteration.</p><p>However, in this paper, we will introduce another approximation that allows us to eliminate the model's tangent linear operator. We will represent the model sensitivity by an ensemble-averaged sensitivity through the linear regression or equivalently the least squares approximation:</p><p>where we replace all individual model sensitivities for the different realizations with a common averaged one. This approximation implies that we no longer solve strictly for the minima of the cost functions in Eq. ( <ref type="formula">6</ref>). The final approximation is to represent all covariances using an ensemble of realizations and we define the ensemble matrix Z 2 &lt; n3N as</p><p>Furthermore, we define the matrix P 2 &lt; n3N ,</p><p>where 1 2 &lt; N is a vector with all elements equal to one and I N is the N-dimensional identity matrix. If we multiply an ensemble matrix with the matrix P, this subtracts the mean from the ensemble and divides the result by N 2 1 &#8730; . We can then define the zero-mean and scaled ensembleanomaly matrix as</p><p>and the ensemble covariance is</p><p>where the "overbar" denotes that we have an ensemblecovariance matrix. Correspondingly, we can define an ensemble of perturbed measurements, D 2 &lt; m3N , when given the real measurement vector, d 2 &lt; m , as</p><p>where E 2 &lt; m3N is the centered measurement-perturbation matrix whose columns are sampled from N (0, C dd ) and divided by N 2 1 &#8730; . We define the ensemble covariance matrix for the measurement perturbations as C dd 5 EE T : (16) The ensemble algorithms work both with a full-rank C dd or the ensemble version of the measurement covariance represented by the perturbations in E. Finally, we define the ensemble of model-predicted measurement anomalies Y 5 g(Z f )P, (17) where we have multiplied the model prediction by the matrix P to subtract the ensemble mean and divide the resulting anomalies by N 2 1 &#8730; . FIG. 3. Exp. PRED-coupled: Coupled ensemble-prediction experiment without DA. See description in Fig. 2.</p><p>FIG. 4. The plots show the time evolution of the spatially averaged residuals (RMSE) of the prior ensemble-mean predictions from the PRED-uncoupled and PRED-coupled experiments to the reference solution as the dark lines and the ensemble-predicted RMSE as the light lines. We define the ensemble-predicted RMSE as the RMS of the ensemble-predicted standard deviation where the mean is over the spatial coordinate.</p><p>From Eq. ( <ref type="formula">8</ref>), we then obtain the ensemble Kalman filter or smoother update equation formulated entirely in terms of ensemble matrices <ref type="bibr">(Evensen 2003)</ref> as</p><p>The update becomes a "weakly nonlinear" combination of the prior ensemble anomalies, as W depends on the ensemble of state vectors through the predicted measurements. Still, we will use the notation "linear combination" in the remainder of this paper as this fits well with the form of the update in Eq. ( <ref type="formula">18</ref>).</p><p>Similarly to the EnKF update, we can write the IES equation using the ensemble matrices. Still, a better alternative is to use the EnRML's ensemble subspace variant developed by <ref type="bibr">Evensen et al. (2019)</ref> and <ref type="bibr">Raanes et al. (2019)</ref>, and we have used the particular ensemble subspace implementation from <ref type="bibr">Evensen et al. (2019)</ref> in this work. We refer to these papers and <ref type="bibr">Evensen et al. (2022)</ref> for an in-depth discussion of the subspace algorithm. We also remark that the first step in the IES algorithm becomes identical to the ES update if we choose an iteration steplength of one. Hence, we can use the numerical IES implementation to compute the update in ES and IES (and in the ESMDA approach discussed below).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Ensemble smoother</head><p>When formulated for a single data assimilation window, the update computed by the ensemble smoother is a simple extension of the EnKF update by <ref type="bibr">Evensen (1994)</ref> and <ref type="bibr">Burgers et al. (1998)</ref>. In the standard form of EnKF, one updates the model solution instantly when measurements are available.</p><p>The ES provides a framework where we can update the solution at any or all time steps in an assimilation window, using simultaneously all measurements available within the window. When measurements are only available at the end of the window, the ES solution at the end of this window is identical to the EnKF solution. Furthermore, the ES provides a means for using the EnKF formalism with measurements distributed over the assimilation window and updates computed at the end of the window. Thus, we can view the ES as an extension of the EnKF that adds an update in the space-time domain with measurements distributed in space and time over this domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>c. Iterative ensemble smoother</head><p>While the ES computes one linear step in the gradient direction to approximate the cost functions' minima, the IES uses Gauss-Newton iterations to search for the ensemble of cost functions' minima, which define the posterior ensemble. If the cost functions do not contain local minima, the iterative smoothers should converge to the global minimum of each cost-function realization. But note that we are using the ensemble-averaged model sensitivity from Eq. ( <ref type="formula">10</ref>) that slightly changes the gradient Eq. ( <ref type="formula">7</ref>) for each realization. Additionally, we use ensemble covariances, which constrain the posterior ensemble to the ensemble subspace spanned by the prior ensemble of realizations.</p><p>In its standard formulation, IES attempts to estimate the initial conditions of the assimilation window. After updating the initial conditions, we must reintegrate the ensemble of model realizations over the window to compute the updated gradient in each iteration. Realizing that the final converged transition matrix defines the posterior ensemble solution as a linear combination of the prior ensemble realizations and that we do not need the gradient after the last update, we could FIG. 6. Exp. PRED-coupled: Correlations from a coupled ensemble prediction without assimilation of data. See Fig. <ref type="figure">5</ref> for a figure description.</p><p>use this final transition matrix to update the ensemble directly over the whole assimilation window. These two strategies would give identical results in the linear case without model errors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>d. Ensemble smoother with multiple data assimilation</head><p>The ESMDA is an interesting alternative to IES, and it also attempts to approximately sample the posterior from Bayes's theorem. When requiring that a finite number m of coefficients a i satisfy the condition</p><p>we can write Bayes's theorem as</p><p>This rewriting of the likelihood allows for a gradual introduction of the measurements over a predefined number of update steps. For example, we could use two steps and set a 1 5 a 2 5 2, which satisfies the condition in Eq. ( <ref type="formula">19</ref>). In this case, we would have to solve recursively the two updates  </p><p>The impact of raising a Gaussian likelihood to a power 1/a is an inflation of the measurement error variance where the error covariance matrix becomes multiplied by a. Consequently, we can solve each update step using ES and Eq. ( <ref type="formula">18</ref>) but with an inflated measurement error covariance aC dd ' a &#8730; E a &#8730; E T . A word of caution is that it is necessary to resample measurement perturbations a &#8730; E in each step to avoid introducing a bias from using dependent samples. This step is easier to understand if we realize we are perturbing the model-predicted observations and the model changes with each update. During the stepwise updating, we must update the initial state of the assimilation window and then reintegrate the model ensemble over the window to obtain the "prior" ensemble of realizations for the next step.</p><p>The advantage of ESMDA over ES is that while ES computes one large linear update, ESMDA computes a sequence of small linear updates, reducing the impact of the linearization in the ES scheme. We can view ESMDA as an Euler pseudo-time-stepping in state space with a short step size while ES computes the update over one large time step of length one. ESMDA has become one of the most popular data assimilation methods in the petroleum community, and several companies use the method operationally for parameter estimation in large reservoir models (e.g., <ref type="bibr">Zhao et al. 2017;</ref><ref type="bibr">Emerick 2018)</ref>. For linear dynamics and measurements, ESMDA and ES will result in the same solution with increasing ensemble size, independent of the number of ESMDA steps.</p><p>When used in sequential data assimilation, ESMDA updates the initial conditions of the assimilation window through a finite number of ES steps. In each ESMDA step, we must rerun the model ensemble over the assimilation window to obtain the prior for the next update step. However, as for IES, we can choose two strategies for the final step in ESMDA. We can update the initial conditions of the assimilation window and rerun the model ensemble to obtain the solution over the window or update the ensemble directly over the whole assimilation window using the ES algorithm without rerunning the ensemble. Both these approaches are valid and consistent as the ESMDA steps are independent ES steps, and as for ES, we can freely choose the update strategy. A critical remark is that the prior ensemble for the last ESMDA step is the "nearly converged" ensemble from the previous update step. In the experiments below, we will show that, as for ES, it is an advantage to compute directly the final ES update over the whole assimilation window, particularly when the assimilation window becomes long compared to the predictability time of the model.</p><p>There are fundamental differences between IES and ESMDA. While IES is an iterative method, ESMDA computes a predefined number of update steps. IES minimizes the ensemble of initially defined RML cost functions, and ESMDA solves in each step for the minima of a new resampled ensemble of cost functions using a linear ES update. However, although the final ensemble solutions will be different using IES and ESMDA, both methods consistently attempt to sample the posterior pdf, and in the linear case with increasing ensemble size, they converge to the same pdf.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Importance of coupled data assimilation</head><p>We will now demonstrate the importance of computing fully coupled data assimilation updates. By the notation "fully coupled," we understand that measurements of one model component will update all model components through their cross covariances estimated from the ensemble of coupled model realizations. We start in the following section by examining the impact of the coupling term on the KS model and the cross-covariance functions. After that, in section 4b, we will compare coupled and uncoupled data assimilation experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. Ensemble predictions and covariances</head><p>We now present two ensemble-prediction experiments to illustrate some properties of the coupled KS models with our choice of model parameters. Figure <ref type="figure">2</ref> shows the results from the uncoupled prediction experiment, where we run an ensemble of 1000 realizations of the two models in Eqs. ( <ref type="formula">2</ref>) and (3) in an uncoupled mode with a oa 5 v ao 5 0. Hence, the two models evolve independently of each other. We generate the initial samples of smooth Gaussian pseudorandom one-dimensional fields using an inverse fast Fourier transform from a Gaussian spectrum in wave space with random phases for each wavenumber <ref type="bibr">(Evensen 2009, see chapter 11)</ref>. Note that the realization's smoothness is not critical as the model quickly develops variability on its inherent scale.</p><p>In Fig. <ref type="figure">3</ref>, we present the coupled reference simulations where a oa 5 v ao 5 0.003. In both the PRED-uncoupled and PRED-coupled experiments, we notice the apparent chaotic behavior of the two model variables and the differences in spatial scales between the Ocean and Atmos variables, as shown in the left panels of the Figs. <ref type="figure">2</ref> and <ref type="figure">3</ref>. While the two models evolve independently in PRED-uncoupled experiment, we observe a significant impact of the coupling in PRED-coupled experiment, where the fine-scale Ocean features follow the large-scale structures in the Atmos solution, and the latter also changes compared to the uncoupled simulation. The ensemble mean converges quickly toward zero for both compartments in both experiments, as shown in the center panels. Very importantly, the standard deviations shown in the right panels remain at the climatological level of around 1.25 and 1.75 for the two model components but appear to saturate at a slightly lower level in the coupled simulation than in the uncoupled one, suggesting that the coupling slightly stabilizes the model. The higher standard deviation in the Atmos variable, as seen in both simulations, is likely a result of the lower biharmonic diffusion in this model and the difference in scales where the biharmonic diffusion has less effect on the Atmos variable's larger scales.</p><p>Figure <ref type="figure">4</ref> shows the time evolution of the root-mean-square residuals between the ensemble mean and the reference solution as the dark lines. The light lines are the root-mean-square of the ensemble standard deviations. We have averaged the residuals over the spatial coordinate. With the current ensemble initialization, the error growth saturates after an integration time of about 10-20 units of time, approximately indicating the model's predictability time.</p><p>In coupled data assimilation, we are interested in the multivariate ensemble correlations between a predicted observation of a model variable with all model variables in all grid points. In Figs. <ref type="figure">5</ref> and <ref type="figure">6</ref>, we show the space-time correlation functions between an observation of either the Ocean or the Atmos variables located in the center of the domain and the respective model components. The correlation function of an observation with the measured variable reflects the spatial scales of the variable. However, we note that there is also a correlation in time, which is essential when computing smoother solutions over an assimilation window. We also notice that the correlations from the uncoupled and coupled models differ slightly due to the change in dynamics introduced by the coupling term.</p><p>In the uncoupled case, the ensemble correlations between the Atmos and Ocean variables are zero (within the sampling errors). Hence, for an uncoupled model system, an Ocean observation will only influence the Ocean variable and similarly for an Atmos observation. However, for the coupled model, we obtain significant cross correlations between the variables, as seen in Fig. <ref type="figure">6</ref>. These cross correlations will cause the assimilation of a measurement of one model variable to update the other model variable. We notice that the Ocean observation cross correlation with other Ocean variables has changed compared to the uncoupled case, even at the observation location, because the ocean takes up larger scales from the Atmos variables due to the coupling.</p><p>The cross correlations indicate that an Ocean observation will influence the Atmos variable predominantly in the past, while an Atmos observation predominantly influences the Ocean variable in the future. Also, with these cross correlations, an Ocean observation will update the Atmos variable on the Atmos variable's spatial and time scale. In contrast, an Atmos observation will update the Ocean variable on multiple scales. That is, the update introduces a prominent Atmos-scale feature and a finer Ocean-scale variability to the Ocean variable. We conclude from studying the spacetime correlations that the Atmos model partly drives the Ocean model, presumably due to the Atmos' larger variability amplitude.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Coupled data assimilation experiments</head><p>We adopt a general naming convention for the paper's experiments, starting with ES, IES, or MDA, indicating the assimilation method used. For the IES and MDA, an integer follows immediately after the method name and defines the number of iterations in IES or steps in ESMDA. IES without a number means we iterate to convergence. Then, the "w05" tells us that the assimilation window length is five units of time, and "d2" means that we have observations available every Dt 5 2. Note that the first measurements become available at time t 5 50, and we assume uncorrelated measurement errors with a standard deviation of 0.3 for all the experiments.</p><p>We have ignored the impact of model errors in this study as the model is highly nonlinear and unstable, and we will have strong error variance growth between assimilation steps that likely will dominate the impact of modest model errors. However, if model errors are substantial, they should be included in the data-assimilation problem as unknown forcing fields. In that case, the methods would update the initial conditions and these forcing fields. <ref type="bibr">Evensen (2019</ref><ref type="bibr">Evensen ( , 2021) )</ref> discussed how to include model errors in iterative ensemble smoothers. The state vector size used in the data assimilation would increase, but depending on how we parameterize the model error, including the model errors might still be feasible. In general, we would argue for including model errors in the data assimilation system, as the weak-constraint data assimilation becomes more stable since observation information can influence the solution at and around the observation time such that the sensitivity to the initial conditions is strongly reduced (see, e.g., <ref type="bibr">Amezcua et al. 2017)</ref>. Including model errors will again result in a more Gaussian assimilation problem. Hence, we expect all methods to perform better in this case, and while the state-vector size increases, the expected number of iterations decreases.</p><p>We have used a 1000-member ensemble in all the following experiments to minimize the impact of sampling errors and to avoid using localization and inflation. In an experiment with localization, it was not possible to visually observe any difference between the global and local analysis. However, in a forthcoming manuscript, we will address localization in experiments with smaller ensemble sizes.</p><p>In the experiments discussed in the current section, we use ESMDA with five steps and assimilation windows having a length of five units of time. We varied the spatial measurement coverage to emphasize the results (see Table <ref type="table">1</ref>). We start by testing the impact of coupled assimilation of only Ocean data in Exp. MDA5-w05-d2-O and only Atmos data in Exp. MDA5-w05-d2-A, and we present the results in Figs. <ref type="figure">7</ref> and <ref type="figure">9</ref>. Here, we jointly update the Ocean and Atmos variables when we assimilate the Ocean or Atmos data. Complementary to these experiments, we have run Exps. MDA5-w05-d2-Osep and MDA5-w05-d2-Asep, using "uncoupled" data assimilation, where we only update the Ocean variable using the Ocean data and only the Atmos variable when conditioning on the Atmos data (see Figs. <ref type="figure">8</ref> and <ref type="figure">10</ref>).</p><p>Assimilation of only Ocean data in experiment MDA5-w05-d2-O constrains the Ocean variable well, as seen in Fig. <ref type="figure">7</ref>. Interestingly, the information from the Ocean data also improves the Atmos variable significantly. After a few data assimilation updates, we have reduced the residuals for the Atmos variable to a low value compared to what we started with (see Fig. <ref type="figure">13</ref>). Note that we have a relatively high density of Ocean measurements in space to resolve the spatial scales in the Ocean model variable. In this case, the Atmos solution's recovery results from the coupled data assimilation. In contrast, the results of Exp. MDA5-w05-d2-Osep illustrate that when we only update the Ocean variables when assimilating Ocean data, we see hardly any impact on the Atmos variable, and we also observe worsening Ocean results compared to the coupled case. This result is significant, as it shows that if we improve the atmosphere via ocean observations, the atmosphere couples back to the ocean to improve the ocean simulation further.</p><p>The situation differs for the alternative case, MDA5-w05-d4-A, where we only assimilate Atmos data. We obtained good convergence for the large-scale Atmos model, but with the current spatial resolution in the data, we would need more information to control the Ocean model. We see some impact of the assimilation in the Ocean variable in Fig. <ref type="figure">9</ref> where we recover the Ocean's "large-scale" structures and somewhat reduce the domain-averaged estimated standard deviation. However, compared with Exp. MDA5-w05-d4-Asep in Fig. <ref type="figure">10</ref>, this weak improvement in the Ocean variable is primarily due to the model coupling and dynamic interaction between the Atmos and Ocean variables. Hence, in our problem setting, the atmospheric observations do not directly update the ocean. Still, the ocean variables benefit indirectly from better atmospheric fields in the forecast stage. Interestingly, the estimated standard deviation is large near sharp gradients in the Ocean solution, for which the Atmos observations are too course. The difference between the coupled assimilation of Ocean and Atmos data in experiment MDA5-w05-d5 and the separate assimilation of Ocean and Atmos data in experiment MDA5-w05-d5sep emphasizes the value of the coupled assimilation. Exp. MDA5-w05-d5 presented in Fig. <ref type="figure">11</ref> converges after a few assimilation windows, and we can control the further evolution of the model ensemble. Exp. MDA5-w05-d5sep shown in Fig. <ref type="figure">12</ref> converges much slower, and the resulting estimate has larger errors for the Ocean and Atmos variables. The residual plots in Fig. <ref type="figure">13</ref> also support the conclusion that combined assimilation of the Ocean and Atmos data captures the interaction between Ocean and Atmos more accurately and can lead to an improvement in the state estimate compared to the results of separate assimilation for the Ocean and Atmos variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Sensitivity study for ensemble DA methods</head><p>We will now further examine the properties of the iterative ensemble smoothers with our coupled model. The focus is not on the coupling but, instead, on how to best configure the assimilation setup for the different assimilation methods in terms of the update strategy for the final iteration or step, the length of the data assimilation window relative to the model's predictability time, the optimal number of steps in ESMDA, and the required number of IES iterations. We will now examine ES, IES, and ESMDA in a recursive setting, where the assimilation updates in future assimilation windows will benefit from the updates from the previous assimilation steps and, thereby, a reduced nonlinearity and a more Gaussian pdf of the prior ensemble. We note that the current versions of IES and ESMDA were previously only studied for parameter estimation on a single window. Still, from the results reported by <ref type="bibr">Bocquet and Sakov (2013a,b)</ref>, with their alternative iterative smoother formulation, we expect a significant improvement when using the iterative methods compared to what we can get with EnKF or ES.</p><p>Before we discuss the experiments in more detail, we note that ES allows for state updates over the whole assimilation window, even when there are no model errors. The reason is that the ensemble is present over the whole assimilation window, and Bayes's theorem allows us to update the solution at every time step within the window. Of course, there is a direct advantage to updating the state at the end of the window in a perfect model setting, as this will typically lead to a better forecast than updating only at the beginning of the assimilation window. This approach is impossible in strongconstraint variational methods, which define the data assimilation problem as finding the initial conditions that lead to the model solution that minimizes a cost function. The violation of the strong constraint model assumption in ES might seem inconsistent. Still, it is similar to the inconsistency in a strong-constraint 4DVAR over a sequence of assimilation windows: we violate the perfect model assumption at the start of each assimilation window. <ref type="bibr">Van</ref>  solution at the end of the assimilation window in a strongconstraint 4DVAR.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>a. Window update or model rerun</head><p>From section 3, we noticed that we can update the solution at any instant over the assimilation window when using ES. Furthermore, for an assimilation window, the update uses the same linear combination of the prior ensemble realizations independently of the time step we are updating. Hence, for the ES update, we can choose between two strategies: update the solution over the whole window or update only the assimilation window's initial conditions and rerun the model to obtain the solution over the window. We will see that these two strategies lead to substantially different results for nonlinear dynamical models. At the same time, in the limit of infinite ensemble size, there would be no difference for linear models without model errors.</p><p>For ES, it is always better to update the whole window and, particularly, the end time of the window, as this estimate becomes the initial condition for the continued integration over the following window. The importance of updating the whole window depends on the length of the assimilation window relative to the model's predictability time. For short time windows, the error growth caused by the nonlinear and unstable model dynamics will not have time to impact the prediction significantly. In this case, whether we update the window's initial conditions or compute the linear ES update over the whole window yields similar results. However, for FIG. 14. Exp. ES-w06-d2: ES with an assimilation window with a length of six time units and a data-assimilation update of the final solution over the assimilation window.</p><p>TABLE <ref type="table">2</ref>. The experiments used for testing whether to update initial conditions or the whole ensemble over the data assimilation window. The experiment names ending with ini update the initial conditions of the data assimilation window and then rerun the model ensemble to obtain the final solution over the window.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiment</head><p>Window length Num Ocean obs Num Atmos obs Dt Ocean obs Dt Atmos obs ES-w06-d2 6 40 10 2 2 MDA5-w12-d5 12 40 10 5 5 IES-w05-d2 5 40 10 2 2 ES-w06-d2ini 6 40 10 2 2 MDA5-w12-d5ini 12 40 10 5 5 IES-w05-d2ini 5 40 10 2 2</p><p>longer time windows, the uncertainty at the end of the window obtained by integrating the model ensemble from updated initial conditions typically tends toward climatology, and the ensemble prediction in the next assimilation window starts all over again from a climatological level where the ensemble has forgotten all information from previously assimilated data.</p><p>We will now illustrate the impact of updating either the data assimilation window's initial conditions followed by an ensemble integration or updating the ensemble directly over the whole window in the final iteration or step. We have run the experiments listed in Table <ref type="table">2</ref>, where we chose the window lengths in the different experiments to emphasize the impact of using the two strategies with the different ensemble smoothers.</p><p>In Figs. 14 and 15, we illustrate the difference between the two update strategies when using ES for a case with window lengths of six units of time and measurements every second unit of time. We obtain a faster convergence and lower standard deviations when we update the ensemble over the whole window. This case corresponds to computing an EnKS solution, where we have used measurements within an assimilation window to update the ensemble at the end of the assimilation window before continuing the integration.</p><p>ESMDA updates the assimilation window's initial conditions recursively, and following each update, we must rerun the ensemble over the window to obtain the solution for the current ESMDA step. This procedure increases the ensemble spread and uncertainty over the window due to using a nonlinear and unstable model. However, we can control some uncertainty growth at the final update step by calculating the ESMDA update over the whole window. This approach avoids the final ensemble integration. In the case of long assimilation windows, we obtain a better estimate of the solution at the end of the assimilation window and, thereby, a better estimate of the initial conditions of the following window. Figures <ref type="figure">16</ref> and <ref type="figure">17</ref> illustrate how computing the solution over the whole window in the final ESMDA step improves the estimate. In particular, we notice a faster convergence for the Atmos variable, and we are better at controlling the dynamic instabilities that develop while integrating the model over the window. In an operational weather or climate prediction setting, we would typically update the solution at the end of the assimilation window at the final ESMDA step before predicting the solution over the following window.</p><p>The situation changes entirely in the IES as shown in Figs. <ref type="figure">18</ref> and <ref type="figure">19</ref>. Updating the solution over the whole window in the final iteration significantly degrades the estimate. When using IES, we should update the initial condition of the assimilation window in the final integration. The reason ESMDA experiences an improvement while the IES solution degrades, in this case, is the following. When computing the last update in ESMDA, the solution is a linear combination of the ensemble simulations of the previous MDA step. On the other hand, in IES, the estimate is a linear combination of the prior ensemble simulations. Each step in ESMDA improves the initial conditions for the assimilation window, leading to an improved ensemble prediction over the assimilation window with lower uncertainty. Thus, the prior ensemble is already close to the posterior solution in the final update step.</p><p>The summary in Fig. <ref type="figure">20</ref> clearly illustrates the above findings. The most surprising observation from this figure is that the window length used for ESMDA significantly exceeds the one used with IES. We will see in the following section that the possibility of using a standard ES update for the final ESMDA step makes ESMDA less sensitive to the window length when used in sequential data assimilation.</p><p>ESMDA also tolerates more significant errors in the initial conditions of the ensemble since we can partly correct these errors in the final ES-type update step. Furthermore, while ESMDA computes recursive linear regression updates, IES is a gradient-descent method and becomes sensitive to substantial nonlinearities. We will elaborate more on these topics in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) ENSEMBLE SMOOTHER SENSITIVITY EXPERIMENTS</head><p>We will now examine how the ensemble smoothers perform when we vary the length of the assimilation windows. In the case of ES, we repeat the experiments for window lengths increasing from one to nine units of time, and the time interval between the measurements is two for all the experiments. We have 10 and 40 equally spaced measurements of the Atmos and Ocean variables at each measurement time. We denote the experiments ES-w0[1-9]-d2 <ref type="bibr">[ini]</ref>, where ES denotes the ensemble smoother, and the numbers 0[1-9] define the length of the assimilation windows, and d2 tells us that we have measurements available every second unit of time. Finally, an "ini" at the end tells us we reran the model ensemble to obtain the solution. In contrast, the experiments without the ini computed the ES update over the whole window.</p><p>Note that the ES and the EnKF will always have identical solutions at the end of the assimilation window; therefore, the prior for the following window will also be the same. We have already shown that we obtain better results from ES when we update the solution at the end of the assimilation window than when we update the window's initial conditions and rerun the ensemble over the window. From the upper plots in Fig. <ref type="figure">20</ref>, we notice that ES generally leads to more accurate solutions when updating the solution over the window and not rerunning the ensemble. Updating the window's initial conditions results in poorer performance for all window lengths, and we can conclude that the standard EnKF with updates at the end of the assimilation window is the way to go when the emphasis is on forecasting.</p><p>Regarding the length of the assimilation window, the ES experiments perform well for a short window length of two units of time. Still, the performance deteriorates rapidly for longer assimilation windows. This worsening of the results with increasing window length comes from the model's nonlinearity, which results in a prior ensemble prediction over the window that approaches climatology for long windows. Then, the linear update computed by ES will break down, as previously discussed by <ref type="bibr">Evensen and van Leeuwen (2000)</ref> in an example using the Lorenz-63 equations. From the upper left plot in Fig. <ref type="figure">20</ref>, we notice how the standard deviation over the window grows until the simulation approaches climatology with long windows. The net effect is that at the end of the assimilation window, no predictive skill propagates forward into the following window. So, when using ES and EnKF, we should use short data assimilation windows and update the solution at the end of, or the whole, window.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) IES SENSITIVITY EXPERIMENTS</head><p>We now move on to examining IES in more detail. Similarly to the ES experiments in the previous section, we use the notation IES-w[01-15]-d2[ini] for the IES experiments. With IES, we could extend the length of the assimilation window, and we have run experiments with assimilation window lengths up to 15 units of time. In all the experiments, we used a maximum of 12 iterations, but in most cases, we observed no improvement after about five to eight iterations.</p><p>The plots in the second row of Fig. <ref type="figure">20</ref> summarize the residuals from using IES with different assimilation window lengths. IES performs equally well for window lengths of two to seven units of time, as seen from the right plot in the second row of Fig. <ref type="figure">20</ref>, and the results are significantly better than those obtained using ES (see upper left panel in the figure). Thus, an iterative method benefits from a more accurate estimate but comes at a higher computational cost.</p><p>As for ES, we repeated all 15 IES-w[01-15]-d2ini experiments by a set of experiments IES-w[01-15]-d2 where we used the IES transition matrix to update the whole window in the final iteration. However, from the residuals in Fig. <ref type="figure">20</ref>, it is clear that for IES, in contrast to ES, we should rerun the model ensemble in the final iteration. We explained in section 3 that IES computes the updated ensemble of initial conditions as a linear combination of the prior ensemble initial conditions. Furthermore, we recall that the linear combination FIG. 17. Exp. MDA5-w12-d5ini: As in Fig. <ref type="figure">16</ref>, but with an update of the window's initial condition.</p><p>defined by the transition matrix leads to a posterior ensemble prediction that minimizes the ensemble of cost functions in Eq. ( <ref type="formula">6</ref>). Thus, we lose the effect of the model nonlinearity by directly updating the posterior ensemble over the data assimilation window. Note that the estimates would become identical when using a linear model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3) ESMDA SENSITIVITY EXPERIMENTS</head><p>We now apply the ESMDA method in experiments MDAw[01-15]-d2[ini], similar to those from the previous section. The ESMDA formulation for the recursive data assimilation problem assumes that we update the initial state and rerun the model ensemble in each step to recompute the gradient. However, since ESMDA computes a sequence of independent ES updates, we can choose whether to update the initial conditions for the assimilation window or the whole solution over the window in the final ESMDA step.</p><p>From the plots in the bottom row of Fig. <ref type="figure">20</ref>, we notice that ESMDA performs significantly better than ES in both the case with updating the whole window and when we rerun the model ensemble over the window. We can extend the window length significantly compared to when using ES, and there are hardly any differences between the ESMDA or IES solutions for window sizes up to six units of time. We note, however, that the ESMDA residuals diverge from the ensemble standard deviations for window lengths greater than 14 units of time in the MDAX experiments.</p><p>We also note that computing an ES update of the whole window in the final step stabilizes the solution for longer windows and improves the estimate, particularly for long window lengths. This strategy also saves one ensemble integration.</p><p>To conclude, we are less impacted by the model's nonlinearity when using ESMDA than when using ES, and ESMDA has the added advantage of significantly extending the data assimilation window length. We also obtained a substantially better result with ESMDA than ES, although at a higher computational cost for the same window length.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>b. Sensitivity to number of MDA steps</head><p>The number of ESMDA steps can significantly impact the solution <ref type="bibr">(Evensen and Eikrem 2018;</ref><ref type="bibr">Evensen et al. 2021)</ref>. Ideally, we would like to use as few steps as possible to reduce computational cost since every step implies a simulation of the whole ensemble. Using around four to eight steps is common <ref type="bibr">(Zhao et al. 2017;</ref><ref type="bibr">Evensen and Eikrem FIG. 18</ref>. Exp. IES-w05-d2: IES with an assimilation window with a length of 12 time units and a data-assimilation update of the final solution over the assimilation window.</p><p>2018), but we currently lack a general rule for how many steps we need to obtain the best estimate. We can only try a different number of steps, and when the solution does not change within the expected sampling errors, we can assume we have converged.</p><p>An issue is that every ESMDA step is an independent ES update and introduces a new set of perturbed observations with increasing perturbation magnitudes when using multiple steps. Another problem is that using a limited ensemble size introduces sampling errors. In the current experiments, we have used 1000 realizations to minimize the impact of sampling errors and data assimilation windows of 5, 10, and 15 units of time. We ran nine experiments using one to nine steps, and we plot the residuals in Fig. <ref type="figure">21</ref>, averaged over the converged solution during the last 100 units of time. The experiments using a single step are equivalent to running ES, and it is clear that using more than one step improves the solution. In the current example, using two MDA steps leads to a significant improvement, and for a window length of five units of time, we obtain nearly identical accuracy when using ESMDA with two to nine update steps. In this case, using more than two to three steps are useless. For the longer assimilation windows of 10 and 15 units of time, we obtain the most consistent solution using only three to seven MDA steps, all with similar performance. We also note that additional steps lead to divergence between the actual and estimated residuals. In the case with an assimilation window of 15 units of time and nine MDA steps, we experience substantial filter divergence.</p><p>Note that running experiments with a relatively small ensemble size makes it useless to run additional ESMDA steps if the method has already converged to a level lower than the sampling errors resulting from the limited ensemble size. A positive result from these experiments is that we can obtain estimates with excellent accuracy using relatively few MDA steps.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Comparative performance of ensemble smoothers</head><p>Which ensemble method should we use? We can apply the ES with an update of the whole assimilation window or in a pure EnKF setting with an update of the final time step of the window. However, ESMDA and IES will improve upon the ES results in cases with significant nonlinearity at an additional computational cost.</p><p>We like the consistency of the IES performance seen in the middle right plot of Fig. <ref type="figure">20</ref>, where the converged solution shows consistency between actual and estimated residuals up to a window length of eight units of time. On the other hand, ESMDA converged in three to five steps for moderate assimilation window lengths and may be more computationally efficient than IES. ESMDA also works well with longer window lengths due to the final MDA step, where we compute an ES update over the whole window, which results in a higher accuracy and lower uncertainty of the initial conditions for the subsequent window.</p><p>In all the previous IES experiments, we ran a maximum of 12 iterations, but in most cases, IES converged in fewer iterations. In a sequential data assimilation system, the prior for each successive assimilation window is typically rather good, so we could reduce the number of iterations in IES without sacrificing the quality of the results. We also tested IES on the cases IES4-w[02-06]-d2ini using a maximum of four iterations for each update, which in computational cost corresponds to running ESMDA with five steps since ESMDA avoids the final ensemble integration by updating the ensemble over the whole assimilation window in the last step. We note that the final ES update used in ESMDA helps control nonlinear instabilities and leads to an improved initial condition for the following window. In Fig. <ref type="figure">22</ref>, we compare the time series of residuals for the IES (IES-w[02-06]-d2ini), IES with four iterations (IES4w[02-06]-d2ini), and the default case of ESMDA with five steps (MDA5-w[02-06]-d2). We cannot conclude from these plots that one configuration is significantly better or worse. Figure <ref type="figure">23</ref> compares the residuals from the different cases over the time interval 101-200, where the experiments have reached a quasi-steady accuracy level following the initial assimilation updates. Note that using a finite ensemble size introduces a small uncertainty. We could have repeated the experiments with different random seeds to obtain more robust estimates and plotted all residuals with an uncertainty estimate, but this becomes more relevant when using smaller ensemble sizes than the 1000 realizations used in all the experiments in this paper.</p><p>In addition to the actual performance issues, there could also be preferences regarding the theoretical formulations of the different techniques. Could we assume that ESMDA solves for minuscule ES updates that introduce Gaussianity into the next window's prior? The recursive-in-time updating process in sequential data assimilation may also reduce the RML sampling approximation.</p><p>The main cost of running the different ensemble smoothers is associated with integrating the ensemble of model realizations in each update step. While the ES is the most cost efficient as it only computes a single update, ESMDA and IES improve the results but require a few, e.g., three to four iterations to converge. We also note that the shorter the assimilation window, the less nonlinear the problem becomes and the faster the convergence. It is good to realize that ESMDA with, say, 50 localized ensemble members and four iterations have costs similar to, or even less than, an ensemble of 4DVARs. A rough estimate of the computational costs is the number of model runs needed, assuming the actual assimilation update requires significantly less computation than the model runs. In that case, the cost is the number of ensemble members times the number of iterations used.</p><p>Additional sensitivity studies could involve varying ensemble sizes, but we have decided to exclude such a study in this paper since it would also include using localization methods. In future studies, we also want to examine the combined impact of different temporal and spatial scales in the two model components.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Summary</head><p>In this study, we have demonstrated the possibility of using adjoint-free iterative ensemble methods for sequential data assimilation. We defined a coupled multiscale model system based on the nonlinear Kuramoto-Sivashinsky model. The model system allows for examining data assimilation in multiscale dynamical systems and models with unstable dynamics. An advantageous property of this model is its saturation of prediction uncertainty at a climatologic level, similar to what we observe in atmospheric and ocean models. In addition, the model is one dimensional in space, allowing us to represent spatial variability in the model. In addition to demonstrating the importance and value of coupled data assimilation updates, we have studied the properties of iterative ensemble smoothers. The joint assimilation of data for a model operating at different physical scales and with collective updates of both model components yields superior results compared to treating the model components and data independently. In particular, we demonstrated the importance of assimilating measurements from the component with the small spatial scales into the coupled system for reconstructing the component with the large spatial scales. We also explained the similarity between ensemble 4DVAR and our iterative smoothers. The main differences include the replacement of the tangent-linear operator with an ensemble-averaged model sensitivity, which eliminates the construction of tangent-linear and adjoint models. Additionally, we represent all covariance matrices with an ensemble of model realizations. We demonstrated the effectiveness and efficiency of ESMDA and IES in dealing with nonlinearities in a coupled model. Furthermore, we illustrated the effect of updating over the assimilation window versus updating the window's initial conditions in ensemble smoothers. We believe the methods and formulations, initially developed for petroleum applications, will be valuable for FIG. 22. The plots show the time evolution of residuals for the ensemble prediction using different window lengths for a converged IES, an IES using only four iterations (IES4), and an ESMDA with five steps (MDA5). The dark lines indicate RMSE relative to the reference solution, while the light lines are the ensemble-predicted RMSE.</p><p>future data assimilation systems for atmospheric, oceanic, and other coupled Earth system models.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>JUNE 2024</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>E V E N S E N E T A L .</p></note>
		</body>
		</text>
</TEI>
