<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Learning Closed‐Form Equations for Subgrid‐Scale Closures From High‐Fidelity Data: Promises and Challenges</title></titleStmt>
			<publicationStmt>
				<publisher>AGU</publisher>
				<date>07/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10558236</idno>
					<idno type="doi">10.1029/2023MS003874</idno>
					<title level='j'>Journal of Advances in Modeling Earth Systems</title>
<idno>1942-2466</idno>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">7</biblScope>					

					<author>Karan Jakhar</author><author>Yifei Guan</author><author>Rambod Mojgani</author><author>Ashesh Chattopadhyay</author><author>Pedram Hassanzadeh</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>There is growing interest in discovering interpretable, closed‐form equations for subgrid‐scale (SGS) closures/parameterizations of complex processes in Earth systems. Here, we apply a common equation‐discovery technique with expansive libraries to learn closures from filtered direct numerical simulations of 2D turbulence and Rayleigh‐Bénard convection (RBC). Across common filters (e.g., Gaussian, box), we robustly discover closures of the same form for momentum and heat fluxes. These closures depend on nonlinear combinations of gradients of filtered variables, with constants that are independent of the fluid/flow properties and only depend on filter type/size. We show that these closures are the nonlinear gradient model (NGM), which is derivable analytically using Taylor‐series. Indeed, we suggest that with common (physics‐free) equation‐discovery algorithms, for many common systems/physics, discovered closures are consistent with the leading term of the Taylor‐series (except when cutoff filters are used). Like previous studies, we find that large‐eddy simulations with NGM closures are unstable, despite significant similarities between the true and NGM‐predicted fluxes (correlations >0.95). We identify two shortcomings as reasons for these instabilities: in 2D, NGM produces zero kinetic energy transfer between resolved and subgrid scales, lacking both diffusion and backscattering. In RBC, potential energy backscattering is poorly predicted. Moreover, we show that SGS fluxes diagnosed from data, presumed the “truth” for discovery, depend on filtering procedures and are not unique. Accordingly, to learn accurate, stable closures in future work, we propose several ideas around using physics‐informed libraries, loss functions, and metrics. These findings are relevant to closure modeling of any multi‐scale system.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Turbulent flows are ubiquitous in many geophysical systems, including atmospheric and oceanic circulations, and play an important role, for example, greatly enhancing mixing and transport. Direct numerical simulation (DNS) of high-dimensional turbulent flows often becomes computationally intractable. Therefore, numerical simulations of most geophysical turbulent flows cannot resolve all the relevant scales <ref type="bibr">(Fox-Kemper et al., 2019;</ref><ref type="bibr">Palmer, 2001;</ref><ref type="bibr">Schneider, Teixeira, et al., 2017)</ref>. Large-eddy simulation (LES) is a practical approach to balance computational cost and accuracy: the large scales of the flow are explicitly resolved, while the effects of the small-scale features which cannot be resolved by the given grid resolution, called subgrid-scale (SGS) features, are parameterized as a function of the resolved flow <ref type="bibr">(Pope, 2000;</ref><ref type="bibr">Sagaut, 2006;</ref><ref type="bibr">Smagorinsky, 1963)</ref>. However, the performance of the LES models strongly depends on the accuracy of the employed SGS closure. Over years, there have been extensive efforts focused on formulating physics-based and semi-empirical SGS closures using various techniques in many turbulent flows <ref type="bibr">(Meneveau &amp; Katz, 2000;</ref><ref type="bibr">Moser et al., 2021;</ref><ref type="bibr">Pope, 2000;</ref><ref type="bibr">Sagaut, 2006)</ref>, including geophysical flows <ref type="bibr">(Alexander &amp; Dunkerton, 1999;</ref><ref type="bibr">Anstey &amp; Zanna, 2017;</ref><ref type="bibr">Berner et al., 2017;</ref><ref type="bibr">Cessi, 2008;</ref><ref type="bibr">Gallet &amp; Ferrari, 2020;</ref><ref type="bibr">Herman &amp; Kuang, 2013;</ref><ref type="bibr">Jansen &amp; Held, 2014;</ref><ref type="bibr">Khodkar et al., 2019;</ref><ref type="bibr">O'Kane &amp; Frederiksen, 2008;</ref><ref type="bibr">Sadourny &amp; Basdevant, 1985;</ref><ref type="bibr">Schneider, Teixeira, et al., 2017;</ref><ref type="bibr">Sridhar et al., 2022;</ref><ref type="bibr">Sullivan et al., 1994;</ref><ref type="bibr">Tan et al., 2018;</ref><ref type="bibr">Zanna et al., 2017)</ref>.</p><p>The challenge of modeling SGS closures lies in faithfully representing the two-way interactions between the SGS processes and the resolved large-scale dynamics. There are two general approaches to SGS modeling: (a) functional and (b) structural <ref type="bibr">(Sagaut, 2006)</ref>. The functional SGS closures are developed by considering the interscale interactions (e.g., energy transfers). This is often achieved by introducing a dissipative term. Hence, functional SGS closures generally take an eddy-viscosity form to mimic the average function of the SGS eddies. Among the first and most-used functional closures is the Smagorinsky model <ref type="bibr">(Smagorinsky, 1963)</ref>. Later, dynamic formulations of this model were proposed, in which the key coefficient is dynamically adjusted to the local structures of the flow <ref type="bibr">(Chai &amp; Mahesh, 2012;</ref><ref type="bibr">Germano, 1992;</ref><ref type="bibr">Ghosal et al., 1993;</ref><ref type="bibr">Lilly, 1992)</ref>. Existing functional closures, most of which are the eddy-viscosity type, can be excessively dissipative <ref type="bibr">(Guan et al., 2022;</ref><ref type="bibr">Vreman et al., 1996)</ref>. Furthermore, they cannot capture the structure of the SGS terms, leading to a low correlation coefficient (CC &lt; 0.5) with the true SGS terms, that is, those diagnosed from the DNS data <ref type="bibr">(Carati et al., 2001;</ref><ref type="bibr">Guan et al., 2022;</ref><ref type="bibr">Moser et al., 2021)</ref>.</p><p>On the contrary, structural closures tend to have much higher CC with the true SGS terms. Structural closures approximate the SGS terms by constructing it from an evaluation of large-scale motions or a formal series expansion. One of the most common structural closures is the nonlinear gradient model <ref type="bibr">(Clark et al., 1979;</ref><ref type="bibr">Leonard, 1975)</ref>, referred to as NGM hereafter (it is also known as the tensor diffusivity model <ref type="bibr">(Leonard &amp; Winckelmans, 1999)</ref>). The NGM can be derived analytically: the SGS term is approximated using a first-order truncated Taylor-series expansion of the SGS stress' convolution integral (details discussed later). However, despite CC &gt; 0.9, LES with NGM closure has been found to be unstable in many studies of two-dimensional (2D) and three-dimensional (3D) turbulence. These instabilities are often attributed to insufficient dissipation and more importantly, to the presence of too-strong backscattering in NGM (S. <ref type="bibr">Chen et al., 2003</ref><ref type="bibr">Chen et al., , 2006;;</ref><ref type="bibr">Fabre &amp; Balarac, 2011;</ref><ref type="bibr">Leonard, 1997</ref><ref type="bibr">Leonard, , 2016;;</ref><ref type="bibr">Liu et al., 1994;</ref><ref type="bibr">Lu &amp; Port&#233;-Agel, 2010;</ref><ref type="bibr">Meneveau &amp; Katz, 2000;</ref><ref type="bibr">Moser et al., 2021;</ref><ref type="bibr">Prakash et al., 2021;</ref><ref type="bibr">Vollant et al., 2016)</ref>. As a result, while backscattering (basically anti-diffusion or up-gradient flux) is an important process to represent in closure models <ref type="bibr">(Grooms et al., 2015;</ref><ref type="bibr">Guan et al., 2022;</ref><ref type="bibr">Hewitt et al., 2020;</ref><ref type="bibr">Nadiga, 2010;</ref><ref type="bibr">Shutts, 2005)</ref>, it is ignored in most practical SGS closures in favor of stability (though there has been some new exciting progress in this area; see, e.g., <ref type="bibr">Jansen et al. (2015)</ref> and <ref type="bibr">Juricke et al. (2020)</ref>). In fact, currently operational climate models do not account for backscattering in their ocean parameterizations <ref type="bibr">(Hewitt et al., 2020)</ref>. Consequently, a framework for developing SGS closures with the right amount of diffusion and backscattering, that can capture both the structure and function of the SGS terms, has remained elusive <ref type="bibr">(Moser et al., 2021;</ref><ref type="bibr">Pope, 2000;</ref><ref type="bibr">Sagaut, 2006)</ref>.</p><p>Before moving forward, it should be pointed out that while the discussion so far has been focused on closure for geophysical turbulence, many other critical processes in the Earth system (in atmosphere, ocean, land, cryosphere, biosphere and at their interfaces) require parameterizations in Earth system models <ref type="bibr">(Schneider, Jeevanjee, &amp; Socolow, 2021;</ref><ref type="bibr">Stensrud, 2009)</ref>. Thus, the discussion below and as clarified later, the findings of this paper, are broadly relevant to parameterization efforts in Earth science.</p><p>Recently, machine learning (ML) has brought new tools into SGS closure modeling <ref type="bibr">(Balaji, 2021;</ref><ref type="bibr">Brunton et al., 2020;</ref><ref type="bibr">Duraisamy, 2021;</ref><ref type="bibr">Gentine et al., 2021;</ref><ref type="bibr">Schneider, Lan, et al., 2017;</ref><ref type="bibr">Zanna &amp; Bolton, 2021)</ref>. The strength of ML techniques is their ability to handle high-dimensional data and learn strongly nonlinear relationships. Therefore, ML techniques are attractive tools that might be able to extract more hidden knowledge from data, potentially providing better SGS closures and even new insights into SGS physics. Data-driven SGS closures, for example, based on deep neural networks trained on high-fidelity simulation data such as DNS data, have been developed for canonical geophysical flows such as 2D and quasi-geostrophic turbulence <ref type="bibr">(Bolton &amp; Zanna, 2019;</ref><ref type="bibr">Frezat et al., 2022;</ref><ref type="bibr">Guan et al., 2022</ref><ref type="bibr">Guan et al., , 2023;;</ref><ref type="bibr">Maulik et al., 2018;</ref><ref type="bibr">Pawar et al., 2020;</ref><ref type="bibr">Srinivasan et al., 2023)</ref> and oceanic and atmospheric circulations <ref type="bibr">(Beucler et al., 2021;</ref><ref type="bibr">Brenowitz &amp; Bretherton, 2018;</ref><ref type="bibr">Cheng et al., 2022;</ref><ref type="bibr">Guillaumin &amp; Zanna, 2021;</ref><ref type="bibr">Rasp et al., 2018;</ref><ref type="bibr">Yuval &amp; O'Gorman, 2020;</ref><ref type="bibr">X. Zhang et al., 2022)</ref>. While some of these studies found the learned data-driven SGS closures to lead to stable and accurate LES <ref type="bibr">(Frezat et al., 2022;</ref><ref type="bibr">Guan et al., 2022</ref><ref type="bibr">Guan et al., , 2023;;</ref><ref type="bibr">Yuval &amp; O'Gorman, 2020)</ref>, a number of major challenges remain <ref type="bibr">(Balaji, 2021;</ref><ref type="bibr">Schneider, Jeevanjee, &amp; Socolow, 2021)</ref>. Perhaps the most important one is interpretability, which is difficult for neural networks, despite some recent advances in explainable ML for climate-related applications <ref type="bibr">(Clare et al., 2022;</ref><ref type="bibr">Mamalakis et al., 2022)</ref>, including for SGS modeling <ref type="bibr">(Pahlavan et al., 2024;</ref><ref type="bibr">Subel et al., 2023)</ref>. The black-box nature of neural network-based closures aside, there are also challenges related to generalizability, computational cost, and even implementation <ref type="bibr">(Balaji, 2021;</ref><ref type="bibr">Chattopadhyay et al., 2020;</ref><ref type="bibr">Guan et al., 2022;</ref><ref type="bibr">Kurz &amp; Beck, 2020;</ref><ref type="bibr">Maulik et al., 2019;</ref><ref type="bibr">Subel et al., 2021;</ref><ref type="bibr">Xie et al., 2019;</ref><ref type="bibr">Zhou et al., 2019)</ref>, limiting the broad application of such closures in operational climate and weather models, at least for now.</p><p>An alternative approach that is rapidly growing in popularity involves using ML techniques that provide interpretable, closed-form equations, for example, using sparse linear regression. The underlying idea of this equationdiscovery approach is that given spatial, temporal, or spatio-temporal data from a system, one can discover the governing (algebraic or differential) equations of that system <ref type="bibr">(Brunton et al., 2016;</ref><ref type="bibr">Y. Chen et al., 2022;</ref><ref type="bibr">Goyal &amp; Benner, 2022;</ref><ref type="bibr">Mojgani et al., 2022;</ref><ref type="bibr">Rudy et al., 2017;</ref><ref type="bibr">Schaeffer, 2017;</ref><ref type="bibr">Schmidt &amp; Lipson, 2009;</ref><ref type="bibr">Schneider, Stuart, &amp; Wu, 2021;</ref><ref type="bibr">Schneider et al., 2020</ref><ref type="bibr">Schneider et al., , 2022;;</ref><ref type="bibr">Udrescu &amp; Tegmark, 2020;</ref><ref type="bibr">S. Zhang &amp; Lin, 2018)</ref>. Most of the aforementioned studies are focused on discovering the entire governing equations from data, though few recent studies have used this approach to discover SGS closures (see below). This approach has the following advantages over more complex methods such as neural networks in the context of SGS modeling: (a) the learned closure is significantly easier to interpret based on physics <ref type="bibr">(Zanna &amp; Bolton, 2020)</ref>, (b) the number of required training samples and the training costs are often considerably lower <ref type="bibr">(Brunton et al., 2020;</ref><ref type="bibr">Mojgani et al., 2022</ref><ref type="bibr">Mojgani et al., , 2023))</ref>, and (c) the computational cost of implementation in conventional solvers is lower, as the discovered closures often involve traditional operations, for example, gradients and Laplacians <ref type="bibr">(Ross et al., 2023;</ref><ref type="bibr">Udrescu &amp; Tegmark, 2020)</ref>.</p><p>A number of equation-discovery techniques and test cases have been recently employed for structural modeling of the SGS stress. In the first study of its kind, <ref type="bibr">Zanna and Bolton (2020)</ref> used relevance vector machine (RVM), a sparsity-promoting Bayesian linear regression technique, with a library of second-order velocity derivatives and their nonlinear combinations, to learn a closed-form closure model for the SGS momentum and buoyancy fluxes from filtered high-resolution simulations of ocean mesoscale turbulence. They found a closure that resembled the NGM, with close connections to earlier physics-based modeling work by <ref type="bibr">Anstey and Zanna (2017)</ref>. Although, the discovered closure performed well in a priori (offline) tests, it was unstable a posteriori (online), that is, when it was coupled to a low-resolution ocean solver. Following the same general approach, more recently, <ref type="bibr">Ross et al. (2023)</ref> proposed a novel equation-discovery approach combining linear regression and genetic programming. This hybrid approach uses genetic programming to discover the structure of the equation followed by linear regression to fine-tune the coefficients. In contrast to methods such as RVM, genetic programming does not require an explicit library of features, instead, it uses a simple set of features and operations, and constructs expressions by successively applying operators and combining expressions. Similarly, in other disciplines, <ref type="bibr">Reissmann et al. (2021)</ref> and <ref type="bibr">Li et al. (2021)</ref> recently used are the number of grid points on the DNS grid in the horizontal and vertical directions, respectively. L = 6&#960; is the length of the domain in the horizontal direction. Note that the lowest N LES is chosen such that the LES resolution resolves at least 80% of the DNS kinetic energy <ref type="bibr">(Pope, 2000)</ref>. Filters are only applied along the horizontal direction.  <ref type="bibr">(Pope, 2000)</ref>. Filters are applied in both spatial dimensions for 2D-FHIT.</p><p>gene expression programming to discover SGS stress for the Taylor-Green vortex and the 3D isotropic turbulence, respectively. They developed a nonlinear closure consisting of the local strain rate and rotation rate tensors, based on what is known as Pope tensors <ref type="bibr">(Pope, 1975)</ref>, which will be discussed later. Overall, these more recent studies found that gene expression programming-and genetic programming-based closures often outperform common baselines such as the Smagorinsky and the mixed models when turbulence statistics and flow structures are considered <ref type="bibr">(Li et al., 2021;</ref><ref type="bibr">Reissmann et al., 2021;</ref><ref type="bibr">Ross et al., 2023)</ref>. Note that there also have been a number of studies focused on equation-discovery for functional modeling, for example, using techniques such as Ensemble Kalman inversion <ref type="bibr">(Schneider, Stuart, &amp; Wu, 2021;</ref><ref type="bibr">Schneider et al., 2020)</ref>; see the Section 4.</p><p>In this study, we build on the work by <ref type="bibr">Zanna and Bolton (2020)</ref> and use 2D forced homogeneous isotropic turbulence (2D-FHIT) and Rayleigh-B&#233;nard convection (RBC) to extend and expand their analysis in several directions:</p><p>1. We use RVM with an expansive high-order library to discover closures from DNS data for the SGS momentum flux tensor (2D-FHIT and RBC) and the SGS heat flux vector (RBC). 2. We conduct extensive robustness analysis of the discovered closures across a variety of flow configurations, filter types, and filter sizes, and examine the potential effects of numerical errors. 3. Further clarify the connections between the robustly discovered SGS momentum and heat flux closures, and the SGS closures obtained analytically from the truncated Taylor-series expansion of the filter's convolution integral, the NGM <ref type="bibr">(Leonard, 1975)</ref>. 4. Explain the physical reason for the unstable a posteriori LES with the discovered SGS closures, despite their high a priori accuracy in some metrics (such as CC). 5. Present a decomposition of the SGS tensor to the <ref type="bibr">Leonard, cross, and Reynolds components, showing their</ref> relative importance and dependence on the filter type/size. 6. Based on these findings, we present a number of ideas for discovering stable and accurate SGS closures from the data in future work. Note that while we focus on the use of RVM here, our findings and conclusions in Equations 1-6 are applicable to any equation-discovery effort, and not just for SGS momentum and heat fluxes in geophysical turbulence, but for SGS modeling in any nonlinear dynamical system.</p><p>This paper is organized as follows. In Section 2, we provide an introduction to the methodology, including the governing equations of test cases (2D-FHIT and RBC), filtering procedure for data and equations, RVM algorithm, and the employed library of the basis functions. Section 3 includes the discussion on the discovered closures, a priori and a posteriori tests, connection with the physics-based closures, and contribution of the Leonard, cross, and Reynolds components. Summary and Discussion are in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Models, Methods, and Data</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Filtering Procedure</head><p>In DNS, the velocity field, u(x, t), is resolved using high spatio-temporal resolutions down to all relevant scales. In LES, a low-pass filtering operation, denoted by (.), is performed on the equations and flow fields. The resulting filtered fields, for example, filtered velocity, u(x,t) , can be adequately resolved using relatively coarse spatiotemporal resolutions: the required grid spacing is proportional to the specified filter width, &#916;, which is analogous to the size of the smallest eddies resolved in the LES <ref type="bibr">(Pope, 2000;</ref><ref type="bibr">Sagaut, 2006)</ref>. Using u(x, t) as an example, the general spatial filtering operation is defined by Sagaut ( <ref type="formula">2006</ref>)</p><p>where * is the convolution operator, and the integration is performed over the entire domain. The specified filter kernel, G, satisfies the normalization condition </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8747;</head><p>Subsequently, any flow field such as velocity can be decomposed into a filtered (resolved) part and SGS (residual) part:</p><p>where u&#8242; is the SGS field. While this appears to be analogous to the Reynolds decomposition, an important distinction should be noted: the filtered residual field may not be strictly zero (u&#8242; &#8800; 0, thus u &#8800; u), depending on the choice of the filter function <ref type="bibr">(Sagaut, 2006)</ref>. Further details about the filters used in this work (Gaussian, box, Gaussian + box, and sharp-spectral) are given in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Two-Dimensional Forced Homogeneous Isotropic Turbulence (2D-FHIT)</head><p>We consider 2D-FHIT as the first test case. This canonical flow has been extensively used for testing novel physics-based and ML-based SGS closures for geophysical turbulence in the past decades <ref type="bibr">(Boffetta &amp; Ecke, 2012;</ref><ref type="bibr">Chandler &amp; Kerswell, 2013;</ref><ref type="bibr">Guan et al., 2022;</ref><ref type="bibr">Tabeling, 2002;</ref><ref type="bibr">Thuburn et al., 2014;</ref><ref type="bibr">Vallis, 2017;</ref><ref type="bibr">Verkley et al., 2019)</ref>. The dimensionless continuity and momentum equations for 2D-FHIT in (x, y) spatial dimensions are:</p><p>where u = (u,v) is the velocity, p is the pressure, F represents a time-constant external forcing, R is the Rayleigh drag, and Re is the Reynolds number. The domain is doubly periodic with length L = 2&#960;.</p><p>The equations for LES are obtained by applying a homogeneous 2D filter (Equation <ref type="formula">1</ref>) to Equations 4 and 5. The filtered continuity and momentum equations are:</p><p>where &#964; is the SGS stress tensor:</p><p>A closure model is needed to represent &#964; xx , &#964; xy (=&#964; yx ), and &#964; yy , in terms of the resolved flow (u,v,p) . However, currently, this is not possible just using the first principles due to the presence of the u 2 ,uv, and v 2 terms.</p><p>We study three cases of 2D-FHIT (Table <ref type="table">1</ref>), creating a variety of flows that differ in dominant length scales and energy/enstrophy cascade regimes. For DNS, as discussed in Appendix B, Equations 4 and 5 are numerically solved at high spatio-temporal resolutions using a Fourier-Fourier pseudo-spectral solver. For the LES, the same solver at lower spatio-temporal resolution is used (Appendix B). </p><p>Note. For Cases R1-R3 and different filter types, the mean and standard deviation of the discovered coefficients over different N LES are reported (see Table <ref type="table">2</ref>). The average CC of the closure for each element of J is shown in parentheses. The last column shows the analytically derived coefficients for the NGM (see Section 3.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Turbulent Rayleigh-B&#233;nard Convection (RBC)</head><p>As our second test case, we use 2D turbulent RBC, a widely used canonical flow for buoyancy-driven turbulence <ref type="bibr">(Chill&#224; &amp; Schumacher, 2012;</ref><ref type="bibr">Dabbagh et al., 2017;</ref><ref type="bibr">Hassanzadeh et al., 2014;</ref><ref type="bibr">Kooloth et al., 2021;</ref><ref type="bibr">Lappa, 2009;</ref><ref type="bibr">Sondak et al., 2015)</ref>, which in addition to the SGS (momentum) stress, requires closure modeling of the SGS heat flux <ref type="bibr">(Pandey et al., 2022;</ref><ref type="bibr">Peng &amp; Davidson, 2002;</ref><ref type="bibr">Wang et al., 2008)</ref>. Under the Oberbeck-Boussinesq approximation, the dimensionless governing equations for the flow between horizontal walls at fixed temperatures (the bottom wall being warmer than the top) in (x, z) spatial dimensions are:</p><p>where v = (u,w) is the velocity, &#952; is the temperature (T ) departure from the conduction state, &#7825; is the unit vector in the vertical direction, and Ra and Pr are the Rayleigh and Prandtl numbers, respectively. The domain is periodic in the horizontal direction with horizontal domain size L = 6&#960; and vertical length of 1. No-slip boundary conditions are applied at the walls. We use three cases of turbulent RBC (Table <ref type="table">2</ref>) in which the Ra and Pr are varied.</p><p>To properly resolve the thin boundary layers in turbulent RBC, a pseudo-spectral solver with (non-uniform) Chebyshev collocation points in the vertical direction is used. However, filtering variables on a non-uniform grid can cause major errors in the diagnosed SGS terms, because the filters will not commute with spatial derivatives <ref type="bibr">(Yalla et al., 2021)</ref>. As a result, following the common practice for LES, we only filter the equations in the horizontal direction, where (uniform) Fourier collocation points are used. The LES equations obtained by applying a 1D filter along the horizontal direction, x, to Equations 9-11 are:</p><p>where &#964; is the SGS (momentum) stress tensor Note. The CC of P &#964; for both forward transfer (&gt;0) and backscatter (&lt;0) of SGS energy is "undefined" since P NGM &#964; = 0 everywhere for 2D-FHIT (in general, P FDNS &#964; &#8800; 0). On the contrary, the forward transfer and backscatter of SGS enstrophy are captured well by the NGM. The Gaussian filter is used in FDNS.</p><p>and J is the SGS heat flux vector</p><p>Here, in addition to &#964;, J needs a closure model too.</p><p>For DNS, as discussed in Appendix C, Equations 9-11 are numerically solved at high spatio-temporal resolutions using a Fourier-Chebyshev pseudospectral solver. For LES, the same solver with lower spatial resolution is used (Appendix C).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Filtered Direct Numerical Simulation (FDNS) Data</head><p>It should be highlighted that in this study with two canonical test cases, we consider DNS data as the "truth", and use filtered DNS (FDNS) data to discover the closures. However, in reality, performing DNS for many geophysical flows is computationally prohibitive. In such cases, highresolution LES that adequately resolves the process of interest (e.g., ocean eddies, gravity waves, etc.) is often used as the truth to train the ML algorithms for SGS modeling <ref type="bibr">(Shen et al., 2022;</ref><ref type="bibr">Sun et al., 2023;</ref><ref type="bibr">Yuval &amp; O'Gorman, 2020;</ref><ref type="bibr">Zanna &amp; Bolton, 2021)</ref>.</p><p>Here, we compute FDNS variables on the LES grids, which are 4-64 times coarser than the DNS grid in both spatial dimensions for 2D-FHIT and one spatial dimension for RBC (see Tables <ref type="table">1</ref> and <ref type="table">2</ref>). More specifically, we first apply the respective filter's transfer function (Tables <ref type="table">A1</ref> and <ref type="table">A2</ref>) to the DNS data, and then coarse-grain the results onto the LES grid. Note that following some of the recent papers <ref type="bibr">(Grooms et al., 2021;</ref><ref type="bibr">Guan et al., 2022)</ref>, we define "filtering" as an operation that removes the small scales but keeps the grid resolution (e.g., DNS), and "coarse-graining" as an operation that changes the grid size, for example, from the DNS resolution to LES resolution. Note that &#964; and J in Equations 7, 13, and 14 need to be on the LES grid.</p><p>The filtering and coarse-graining are performed following <ref type="bibr">Sagaut (2006)</ref> and <ref type="bibr">Guan et al. (2022)</ref>. Briefly, using the velocity u(x DNS ,t) as an example, and denoting the DNS grid and wavenumber as x DNS and k DNS , we first transform the DNS velocity into the spectral space &#251;(k DNS ,t), where (.) means Fourier transformed. This is followed by applying the filter in the spectral space:</p><p>Here, &#284;(k DNS ) can be any of the transfer functions listed in Tables <ref type="table">A1</ref> and <ref type="table">A2</ref>, and &#8857; is the Hadamard (elementwise) multiplication. After the filtering operation, coarse-graining is performed to transform the filtered variable from the DNS to the LES grid. In this study, we perform coarse-graining in spectral space with cutoff, k c = &#960;/ &#916; LES , which for example, in 2D, yields</p><p>It is important to highlight that the filtering operation can be reversible under certain conditions: The DNS data can be recovered (via deconvolution) from the filtered data (still on the DNS grid) if the filter is a Reynolds operation, not a projection <ref type="bibr">(Sagaut, 2006)</ref>. This is the case for Gaussian, box, and Gaussian + box filters, whose transfer function only attenuates the signal. However, for the sharp-spectral cutoff filter (which is a projection</p><p>Table 7 The Average Correlation Coefficient (CC) Between Inter-Scale Kinetic Energy Transfer (P &#964; ) or Enstrophy Transfer (P Z ) or Potential Energy Transfer (P J ) Patterns of the SGS Fluxes From FDNS and From NGM Closure (Equations 25 and 26) for Cases R1-R3 and Different N LES Cases N LES = 128 N LES = 256 N LES = 512 CC for P &#964; (P &#964; &gt; 0, P &#964; &lt; 0) R1 0.94(0.96, 0.85) 0.99(0.99, 0.98) -R2 0.97(0.81, 0.98) 0.98(0.91, 0.98) 0.99(0.97, 1.00) R3 0.79(0.81, 0.74) 0.88(0.92, 0.81) 0.96(0.97, 0.93) CC for P Z (P Z &gt; 0, P Z &lt; 0) R1 1.00(1.00, 0.99) 1.00(1.00, 1.00) -R2 0.99(0.96, 1.00) 0.99(0.98, 1.00) 1.00(1.00, 1.00) R3 0.96(0.95, 0.96) 0.99(0.99, 0.98) 1.00(1.00, 0.99) CC for P J (P J &gt; 0, P J &lt; 0) R1 0.89(0.89, 0.15) 0.97(0.97, 0.46) -R2 0.76(0.75, 0.65) 0.91(0.91, 0.63) 0.98(0.98, 0.76) R3 0.77(0.75, 0.40) 0.87(0.86, 0.39) 0.94(0.94, 0.44)</p><p>Note. Note that for RBC, filtering is conducted in only one direction (x), therefore, P &#964; is not identically zero. Here, the forward transfer and backscatter of SGS kinetic energy and enstrophy are overall captured well, specially as N LES increases. However, the backscatter of SGS potential energy is not well captured, specially at low LES resolutions. The Gaussian filter is used in FDNS. See Appendix F for the definition of P J .</p><p>operator), and for coarse-graining, information is lost beyond the cutoff wavenumber k c and therefore data recovery is limited only up to k c . This is further discussed in Section 3.2.</p><p>Hereafter, for brevity, we use the term "filtered" (still denoted by .</p><p>-) to mean "filtered" and then "coarse-grained."</p><p>The spectrum of a filtered and coarse-grained variable can be found in Figure <ref type="figure">A1b</ref> in Appendix A.</p><p>Figure <ref type="figure">1</ref> shows the effects of filtering on the vorticity and temperature fields for 2D-FHIT and RBC, illustrating that the small-scale structures of &#969; and T are removed due to filtering and the fields are smoothed out.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">The Equation-Discovery Method</head><p>In this study, we employ the RVM <ref type="bibr">(Tipping, 2001)</ref> to discover closed-form closures for each element of the &#964; tensor and J vector from the FDNS data. RVM is a sparsity-promoting Bayesian (linear) regression technique that has shown promise in applications involving dynamical systems <ref type="bibr">(Mojgani et al., 2022;</ref><ref type="bibr">Zanna &amp; Bolton, 2020</ref>; S. <ref type="bibr">Zhang &amp; Lin, 2018)</ref>. RVM relies on a pre-specified library of basis functions &#934;; each column of this matrix is a basis, for example, a linear or nonlinear combination of relevant variables such as velocity and temperature and/or their derivatives. The library should be expressive enough so that s, a vectorized snapshot of a element of any &#964; or J, could be completely represented as</p><p>The vector of regression weights, c, is computed by minimizing the mean-squared error (MSE)</p><p>where vector S consists of n samples of s stacked together. RVM assumes Gaussian prior distributions for each weight, and the width of the Gaussian posterior provides a measure of the weight's uncertainty. Sparsity is enforced via an iterative process: basis functions whose weights' uncertainties exceed a pre-specified hyperparameter (threshold), &#945;, are removed (pruned), and Equation 20 is minimized again. The iterations stop when all the remaining basis functions have uncertainties smaller than &#945;. Larger &#945; results in a lower MSE but more terms in the discovered model (see below).</p><p>A critical step in using RVM (and most equation-discovery methods) is the choice of the library. Here, we have chosen the following libraries. For momentum stress, we use</p><p>where A,B = u or v (2D-FHIT) and C, D = u or w (RBC). Note that experiments with including &#952; in D yield the same results. For heat flux, we use</p><p>where A = u, w, or &#952; (RBC). These libraries are expansive, with integers 0 &#8804; q &#8804; 8 and 0 &#8804; p &#8804; 2, though the total derivative order is limited to 8th (there are a total of 546 and 614 terms in the libraries used for momentum and heat fluxes, respectively). The form of these libraries is motivated by the Galilean-invariant property of the SGS terms, and by past studies. For example, these libraries include Pope's tensors <ref type="bibr">(Pope, 1975)</ref>, which have been used in physics-based <ref type="bibr">(Anstey &amp; Zanna, 2017;</ref><ref type="bibr">Gatski &amp; Speziale, 1993;</ref><ref type="bibr">Jongen &amp; Gatski, 1998;</ref><ref type="bibr">Lund &amp; Novikov, 1993</ref>) and equation-discovery <ref type="bibr">(Li et al., 2021;</ref><ref type="bibr">Reissmann et al., 2021;</ref><ref type="bibr">Ross et al., 2023)</ref> approaches in the past (and include the structure of the Smagorinsky model; see below). Our library also includes the basis functions used by <ref type="bibr">Zanna and Bolton (2020)</ref>.</p><p>Note that all calculations for the libraries (and any computation in this work) is performed using the same spectral methods used for DNS and LES.</p><p>We have found it useful for interpretability of the outcome and improving the robustness of the algorithm to remove redundant terms using the continuity equation (e.g., using &#8706;v/&#8706;y = -&#8706;u/&#8706;x, &#8706; 2 v/&#8706;y&#8706;x = -&#8706; 2 u/&#8706;x 2 , etc.). Also, we have found it essential to normalize each basis in &#934; to have a zero mean and a unit variance, because the amplitude of higher-order derivatives can be much larger than that of the lower-order ones.</p><p>Like any method, equation discovery using RVM has a number of strengths and weaknesses:</p><p>1. It is data efficient <ref type="bibr">(Mojgani et al., 2022;</ref><ref type="bibr">Zanna &amp; Bolton, 2020)</ref>. For example, here, we report the results with n = 100 FDNS samples, but even with n = 1, the results remain practically the same. 2. It is more robust, in terms of convergence, compared to similar sparsity-promoting techniques <ref type="bibr">(Zanna &amp; Bolton, 2020;</ref><ref type="bibr">S. Zhang &amp; Lin, 2018)</ref>. 3. A pre-specified library is needed and it is assumed that the true answer (e.g., the SGS stress) can be represented as a linear combination of the chosen basis functions. 4. The pre-specified hyper-parameter &#945; determines how parsimonious the discovered model is. Decreasing &#945; leads to a smaller (likely, more interpretable) model at the expense of increasing the MSE. Here, we follow the model-selection literature <ref type="bibr">(Mangan et al., 2017;</ref><ref type="bibr">Mojgani et al., 2022)</ref> and objectively choose &#945; using the Lcurve, as shown later. 5. The answer can depend on the choice of the loss function. The RVM's MSE loss (Equation <ref type="formula">20</ref>) is strictly following the principle of structural modeling, matching the flux between the FDNS and the discovered model.</p><p>Note that the above strengths ( <ref type="formula">1</ref>) and ( <ref type="formula">2</ref>) are highly desirable while these weaknesses (3)-( <ref type="formula">5</ref>) are common among many equation-discovery methods, although techniques such as genetic programming and gene expression programming can address (3) and ( <ref type="formula">5</ref>), for example, using an evolving library. We will further discuss (3)-( <ref type="formula">5</ref>) in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>In this section, we present and discuss the discovered closures, and analyze them a priori (offline) and a posteriori (online, coupled with LES). We then uncover the connections between the discovered closure and the NGM. For all results presented here, we use n = 100 FDNS samples from a training set and 20 FDNS samples from an independent testing set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The Discovered Closures for SGS Momentum and Heat Fluxes</head><p>For each of the six cases in Tables <ref type="table">1</ref> and <ref type="table">2</ref>, we separately discover closures for three elements of the SGS stress tensor, that is, &#964; xx , &#964; xy = &#964; yx , and &#964; yy for 2D-FHIT, and &#964; xx , &#964; xz = &#964; zx , and &#964; zz for RBC. Additionally, we discover two elements of the SGS heat flux vector, that is, J x and J z for RBC. We discover individual closures for four filter types: Gaussian, box, sharp-spectral, and Gaussian + box. The first three are common filter types, while the last one is motivated by a few recent studies <ref type="bibr">(Guillaumin &amp; Zanna, 2021;</ref><ref type="bibr">Zanna &amp; Bolton, 2020)</ref>. We also examine several filter sizes, &#916; (see Tables <ref type="table">1</ref> and <ref type="table">2</ref>), and the effect of varying &#945;, which as mentioned earlier, is a key hyperparameter in RVM.</p><p>We analyze the a priori performance of the discovered closures using the most commonly used metric: the average of CCs for testing samples <ref type="bibr">(Guan et al., 2023;</ref><ref type="bibr">Maulik et al., 2019;</ref><ref type="bibr">Sagaut, 2006)</ref>. For each element of &#964; or J, denoted below by &#964; for convenience, the CC for each testing sample is calculated between 2D patterns of &#964; from FDNS and &#964; predicted by the RVM-discovered closure for the corresponding filtered flow variables (e.g., u, v etc.):</p><p>Journal of Advances in Modeling Earth Systems</p><p>where &#9001;&#8901;&#9002; is domain averaging. The same equation is also used for computing CC values of 2D patterns of interscale energy or enstrophy transfer, P (defined later).</p><p>As a representative example of the findings, Figures <ref type="figure">2a</ref> and <ref type="figure">2b</ref> shows the averaged CC for &#964; yy (K1-K3) and J x (R1-R3) as &#945; is increased. Figures <ref type="figure">2c</ref> and <ref type="figure">2d</ref> presents the number of terms in the discovered closures. With small &#945;, the discovery is unsuccessful (CC = 0; zero terms). However, as &#945; is further increased, for all cases, CC abruptly jumps to above 0.8 -0.9 with 1 and 2 discovered terms, and then gradually converges to 1 but with exponentially growing number of terms in the discovered closure. The CC-&#945; relationship forms an "L-curve." The elbow of this curve indicates the &#945; that balances accuracy and model size, and is extensively used in the model-selection and  <ref type="table">2</ref>). The Gaussian filter is applied in both cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874 equation-discovery literature to objectively choose &#945; <ref type="bibr">(Calvetti et al., 2000;</ref><ref type="bibr">Goyal &amp; Benner, 2022;</ref><ref type="bibr">Lawson &amp; Hanson, 1995;</ref><ref type="bibr">Mangan et al., 2017;</ref><ref type="bibr">Mojgani et al., 2022)</ref>. Examining all cases with other filter sizes and filter types reveals the same behavior as shown in Figure <ref type="figure">2</ref>, with the exception of the sharp-spectral filter as shown in Figure <ref type="figure">D1</ref>. For this filter, the discovery is unsuccessful, leading to low CC and non-robust results; we will explain the reason for this failure later in this section. ) is used, but the same behavior is observed with any other N LES and filter type (except for the sharp-spectral, see the text). In general, for small &#945; (&lt;1), no closure is discovered (CC = 0, zero term). With increasing &#945;, the CC converges to &#8764;1 (a more accurate a priori closure) but at the expense of a larger closure with many more terms (note the logarithmic scale of the y axes in panels (c) and (d)). However, the CC-&#945; relationship forms an "L-curve," whose elbow indicates the &#945; that balances accuracy and model size (see the text).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874</p><p>We use the L-curve to determine the optimal &#945;. In 2D-FHIT, there are two kinks in the curve around the elbow, corresponding to the discovery of closures with 1 and 2 terms, respectively (Figures <ref type="figure">2a</ref> and <ref type="figure">2c</ref>).</p><p>Given the robust and asymptotic behavior in &#945; after the second kink, we use the corresponding &#945; to identify the discovered closure (see the black circles). We find that consistently, across Cases K1-K3, filter types, and filter sizes, this closure is of the form &#964; = [ &#964; xx &#964; xy &#964; yx &#964; yy ] = &#916; 2 &#9121; &#9122; &#9122; &#9122; &#9122; &#9122; &#9122; &#9123; a xx ( &#8706;u &#8706;x ) 2 + b xx ( &#8706;u &#8706;y ) 2 a xy &#8706;u &#8706;x &#8706;v &#8706;x + b xy &#8706;u &#8706;y &#8706;v &#8706;y a xy &#8706;u &#8706;x &#8706;v &#8706;x + b xy &#8706;u &#8706;y &#8706;v &#8706;y a yy ( &#8706;v &#8706;x ) 2 + b yy ( &#8706;v &#8706;y</p><p>where a xx , b xx , a xy , b xy , a yy , and b yy are the discovered coefficients (&#916; 2 is factored out to further highlight the independence of these coefficients from the filter size). Table <ref type="table">3</ref> shows that these six coefficients are the same, and the same for Cases K1-K3, although they can depend on the filter type. This table also shows the average CC values of the discovered closure, which are around 0.99, demonstrating the accurate prediction of each element of the stress tensor and the excellent a priori (offline) performance of the discovered closure for a broad range of LES resolutions.</p><p>Following the same approach, we discover basically the same closure for &#964; in RBC Journal of Advances in Modeling Earth Systems</p><p>where, as before, d xx , d xz , and d zz are the coefficients with &#916; 2 factored out. Note that Equation 25 is the same as Equation <ref type="formula">24</ref>, except that here, there is one term rather than two in each element of the tensor, which is a result of filtering (in RBC) performed only in the horizontal, x, direction. As before, Table <ref type="table">4</ref> shows that these d coefficients are the same, and the same for Cases R1-R3, though varying with filter type. Like before, the discovered closure has fairly high CC values.</p><p>Again, following the same approach, we determine the optimal &#945; for discovering the closure of J. In Figure <ref type="figure">2b</ref>, Case R1 has a clear elbow while Cases R2-R3 have two kinks around the elbow. Examining all cases and the number of discovered terms (Figure <ref type="figure">2d</ref>), we find that the single-term closures discovered at the first kink (circled) provide consistent and robust results. The first kink is observed for all the Cases R1-R3. In contrast, the second kink is observed only in Cases R2-R3. The closure at the first kink is</p><p>where d x and d z are the discovered coefficients with &#916; 2 factored out. Table <ref type="table">5</ref> shows that these d coefficients are the same, and the same for Cases R1-R3, but varying with filter type. As before, the discovered closure has a good a priori performance.</p><p>To summarize the findings, Equations 24-26 and Tables 3-5 show that 1. Closures of the same form are robustly discovered for &#964; in two vastly different systems, 2D-FHIT and RBC.</p><p>Even the closure for J overall has the same form, consisting of the products of the first-order derivatives of the variables involved in the nonlinearity of the SGS term. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874</p><p>2. Not just the form, but even the coefficients of the terms in the closures, are consistently the same as parameters such as Re, forcing wavenumber, Ra, or Pr are changed in Cases K1-K3 and R1-R3, leading to different dynamics. The coefficients are independent of the fluid and even the flow properties. 3. The form of the closures is independent of the filter type unless the sharp-spectral filter is used. The coefficients, once normalized by &#916; 2 , are independent of filter size, but depend on filter type. 4. The discovered closures have outstanding a priori performance, often with CC &gt; 0.95 and even as high as 0.99.</p><p>It should be noted that the CCs reported in these tables are averaged over a broad range of N LES . The values of CC are higher for larger N LES , that is, smaller &#916;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">The Nonlinear Gradient Model (NGM): Taylor-Series Expansion of the SGS Term</head><p>A closer examination of Equation 24 reveals that this closure is indeed the NGM (this includes both the form and the coefficients, within the uncertainty range). This connection was already pointed out by Zanna and Bolton (2020), although the implications and findings such as 1-4 mentioned in the previous subsection were not further discussed in their short letter.</p><p>First, let's briefly review the NGM <ref type="bibr">(Clark et al., 1979;</ref><ref type="bibr">Leonard, 1975;</ref><ref type="bibr">Sagaut, 2006)</ref>. As a simple illustration of the idea behind this model, we have reproduced the derivation of NGM in Appendix E using a 1D arbitrary field, a (x), for the reader's convenience. Taylor-series expansion of a(xr x ) around a(x) (Equation <ref type="formula">E2</ref>) simplifies the convolution integral of the filtering operation (Equation <ref type="formula">E1</ref>) such that a(x) can be written in terms of a(x) and its derivatives, with coefficients that depend only on the moments of the filter's kernel, G (Equation <ref type="formula">E4</ref>). Using u 2 and u 2 as a(x), we eventually arrive at an analytically derived closure for &#964; xx with error O(&#916; 4 ) (Equation <ref type="formula">E12</ref>). In 2D with filtering applied in both directions (like our 2D-FHIT), the NGM is <ref type="bibr">(Sagaut, 2006</ref>)</p><p>&#8706;y ) 2 &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y ( &#8706;v &#8706;x ) 2 + ( &#8706;v &#8706;y ) 2</p><p>where c &#964; depends on the filter's kernel. Similarly, for the 2D RBC with filtering only in the x direction, the NGM is</p><p>As emphasized in Appendix E, there is nothing specific to momentum flux or even turbulence (or even physical systems) in the derivation of NGM. In fact, for the filtered quadratic nonlinearity of any two arbitrary variables, one arrives at the same expression with O(&#916; 4 ) accuracy. For example, following this derivation, for the SGS heat flux, we obtain</p><p>where like c &#964; and d &#964; , d J only depends on the filter's kernel. Note that we are referring to NGM as a general class of closure models, representing the leading term of the Taylor-series expansion of the SGS term (whether it is for momentum or heat or any other flux).</p><p>Computing c &#964; , d &#964; , d J for each of the filter types used in this study, we confirm that the discovered closures for the SGS stress are basically the NGM (Equations 27 and 28), and in the case of the SGS heat flux, an NGM (Equation <ref type="formula">29</ref>) closure (see Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874</p><p>Based on the above analyses, we can now explain the findings (1)-( <ref type="formula">4</ref>) in Section 3.1. Closures of the same structure are robustly discovered for both SGS momentum and heat fluxes in two vastly different turbulent flows (and independent of parameters such as Re, Ra, Pr, and forcing) because the first term in the Taylor-series expansion dominates the SGS flux. As a result, in equation-discovery using common loss functions such as MSE and evaluation metrics such as CC, which aim at closely matching &#964; or J, NGM or NGM-like closures are discovered (if the library is expansive enough to include all the relevant terms). We emphasize that this would be the case with discovering the representation of the filtered nonlinearity of any two arbitrary variables. As already observed, the coefficients of the discovered closure become even closer to those of NGM as &#916; is decreased (thus reducing potential contributions from the truncated O(&#916; 4 ) terms).</p><p>The connection to the analytical derivation also explains why the coefficients in the discovered models are independent of the fluid or even the flow properties (Ra, Re, Pr) and only depend on the filter size (&#916;) and filter type.</p><p>For the Gaussian and box filters we obtain c &#964; = d &#964; = d J = 1/12: this is because the parameters of the Gaussian filter are chosen such that Gaussian and box filters' kernels have the same second moment (see Equation <ref type="formula">E5</ref>) for the definition of moment) <ref type="bibr">(Pope, 2000)</ref>. For the Gaussian + box filter, the coefficients are 1/6 because the kernel of this filter is convolution of the Gaussian and box filter kernels. For the sharp-spectral filter, the moments are indefinite, this is why there is no NGM discovery with this filter (and we will discuss later why the equation discovery fails altogether). Note that coarse-graining done here via cutoff in the spectral space does not change c &#964; , d &#964; , and d J ; however, if coarse-graining is done by other techniques such as box averaging, then the coefficients might change (note that the NGM coefficient for Gaussian + box filter is twice the coefficient of either filter, as the coefficients are additive; see Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>).</p><p>In short, one can explain the effects of different filter kernels and coarse-graining strategies on the discovered closures following the analytically derivable NGM (see Appendix E and Sagaut ( <ref type="formula">2006</ref>)).</p><p>Note that as mentioned earlier, reconstructing SGS fluxes from the filtered solution is a solvable problem as long as the filtering operation is reversible. This is the case for filters with compact support, which include Gaussian, box, and Gaussian + box filters, but not sharp-spectral (cutoff) filters. However, once coarse-graining is applied in addition to any of these filters, the full operation is no longer reversible. Still, the part of the spectrum above the Nyquist frequency can be recovered. It should be noted that NGM is an approximation of the deconvolution of SGS fluxes <ref type="bibr">(Sagaut, 2006)</ref>. The deconvolution procedure can be done only for invertible filters, that is, nonprojective filters. Projective filters, such as sharp-spectral cutoffs, induce an irreversible loss of information, rendering the original data unrecoverable. This is another way of understanding why the NGM exists for Gaussian and box filters (and their combinations) but not for sharp-spectral cutoff filters.</p><p>An important implication of the above findings and discussions is that the discovered closure may not be unique and can depend on the filtering and coarse-graining procedure: it depends on the filter type (and up to a factor, on the filter size). This is not a problem of equation-discovery; in fact, the SGS fluxes diagnosed from FDNS are not unique and depend on the filtering and coarse-graining procedure (this is further shown in Figure <ref type="figure">4</ref> and discussed at the end of this section). This has implications not just for equation-discovery, but more broadly, for the ongoing efforts on learning SGS parameterizations for various processes from high-fidelity data using ML. See <ref type="bibr">Sun et al. (2023)</ref> for extensive discussions about this issue focused on the data-driven SGS modeling of atmospheric gravity waves.</p><p>The next key question is about the accuracy and stability of LES of the 2D-FHIT and RBC with the NGM closures, &#964; NGM and J NGM . However, before discussing the a posteriori (online) performance of NGM closures, we address one more issue, and that is about any potential influence from numerical calculations in our equation discovery.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Effects of Numerical Discretization</head><p>The appearance of gradients of velocity (or temperature) in Equations 27-29 might suggest to some that the discovered equations represent the truncated terms of finite difference/volume discretization schemes (the methods used in <ref type="bibr">Zanna and Bolton (2020)</ref>). The discussions in their paper and the comprehensive analyses here should leave no ambiguity that Equations 27-29 represent the physics of the SGS fluxes, rather than any numerical error. Still, we wish to discuss a few more points here, as numerical errors from finite difference/volume discretizations or from aliasing (in spectral calculations) can certainly contaminate equation discovery.</p><p>All numerical calculations in this study are performed using Fourier and Chebyshev spectral methods. Moreover, we have repeated our calculations of the SGS fluxes and of the basis functions in the library after de-aliasing based on the 2/3 rule <ref type="bibr">(Orszag, 1971)</ref>. Furthermore, we have repeated the discovery on fluxes that are only filtered but not coarse-grained (thus they remain on the high-resolution DNS grid). The outcomes of all these experiments are Equations 27-29, demonstrating that the discovered closures do not contain any contributions from numerical errors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">A Posteriori (Online) Tests and Inter-Scale Energy/Enstrophy Transfer</head><p>For all 6 cases and all tested N LES , the LES runs with NGM closures are unstable: High-wave number features appear and the simulations eventually blow up (not shown). This is consistent with the findings of <ref type="bibr">Zanna and Bolton (2020)</ref>, who only found stable LES once the SGS momentum fluxes predicted by the discovered closure were attenuated (also see recent work by <ref type="bibr">Perezhogin et al. (2023)</ref> who showed that further filtering the output of NGM can lead to promising online results in an ocean model). More generally, this is also consistent with extensive studies in the 1990s (though mainly focused on 3D turbulence), which found that LES with the NGM closure is unstable <ref type="bibr">(Borue &amp; Orszag, 1998;</ref><ref type="bibr">S. Chen et al., 2003</ref><ref type="bibr">S. Chen et al., , 2006;;</ref><ref type="bibr">Leonard, 1997;</ref><ref type="bibr">Liu et al., 1994;</ref><ref type="bibr">Meneveau &amp; Katz, 2000;</ref><ref type="bibr">Pope, 2000;</ref><ref type="bibr">Vreman et al., 1997)</ref>. The exact reason(s) for the instabilities remain unclear but these studies found that in general, in 3D turbulence, NGM has insufficient dissipation and/or too much backscattering; see, for example, the discussions in <ref type="bibr">Leonard (1997</ref><ref type="bibr">Leonard ( , 2016) )</ref> and <ref type="bibr">Sagaut (2006)</ref>. Over the years, several methods, including positive clipping (removing backscattering), regularization, modulation, and the dynamic procedure, have been employed to overcome this instability problem and enhance the NGM's performance in a posteriori tests <ref type="bibr">(Balarac et al., 2013;</ref><ref type="bibr">Khani &amp; Dawson, 2023;</ref><ref type="bibr">Khani &amp; Port&#233;-Agel, 2017</ref><ref type="bibr">, 2022;</ref><ref type="bibr">Liu et al., 1994;</ref><ref type="bibr">Prakash et al., 2022)</ref>. Later studies focused more on eddy-viscosity closures, or on NGM combined with eddy-viscosity, the so-called mixed models <ref type="bibr">(Balarac et al., 2013;</ref><ref type="bibr">Cottet, 1996;</ref><ref type="bibr">Winckelmans et al., 1998)</ref>. Such versions of NGM have been used in some geophysical flows, for example, for atmospheric boundary layer <ref type="bibr">(Khani &amp; Port&#233;-Agel, 2017</ref><ref type="bibr">, 2022;</ref><ref type="bibr">Khani &amp; Waite, 2020;</ref><ref type="bibr">Lu &amp; Port&#233;-Agel, 2010</ref><ref type="bibr">, 2014)</ref> and oceanography <ref type="bibr">(Khani &amp; Dawson, 2023)</ref>.</p><p>In 2D turbulence with filtering done in both directions, such as our 2D-FHIT cases, the NGM has a clear major shortcoming: it cannot capture any energy transfer between the subgrid and resolved scales, despite capturing the enstrophy transfer well (S. <ref type="bibr">Chen et al., 2003</ref><ref type="bibr">Chen et al., , 2006;;</ref><ref type="bibr">Nadiga, 2008)</ref>. To further explore this issue, first note that the rate of kinetic energy transfer between the resolved and subgrid scales, P &#964; , is <ref type="bibr">(Pope, 2000</ref>)</p><p>where summation over repeated indices is implied. S and &#964; r are the 2D filtered rate of strain tensor and the anisotropic part of the SGS stress tensor (see Appendix F for details). In homogeneous 2D turbulence, globally (domain averaged), there is no net forward cascade of kinetic energy, which has been used to develop successful parameterizations that are globally energetically consistent <ref type="bibr">(Boffetta &amp; Ecke, 2012;</ref><ref type="bibr">Jansen et al., 2015)</ref>. However, locally (at a given grid point), there can be both forward transfer and backscatter of energy, which can be important to capture by the SGS closures <ref type="bibr">(Thuburn et al., 2014)</ref>. In 2D turbulence with filtering done in both directions, using &#964; NGM in the above equation shows that P NGM &#964; (x,y,t) is identically zero at every grid point (see Appendix F). This is demonstrated numerically in Table <ref type="table">6</ref>, which also shows that NGM captures both forward transfer and backscatter of SGS enstrophy fairly well (CC &gt; 0.95). Therefore, despite the high CC of &#964; NGM with &#964; FDNS , and even a fairly accurate inter-scale enstrophy transfer, NGM cannot capture any inter-scale energy transfer, indicating a major failure from a functional modeling perspective (note that in this context, "inter-scale" means between the resolved and subgrid scales). A physical/mathematical interpretation of this failure is that while NGM reproduces the structure of &#964; remarkably well, it does not at all capture the correlations between the &#964; and S tensors, for example, the angles between their principle directions <ref type="bibr">(Leonard, 2016)</ref>. This inability to represent any inter-scale energy transfer is likely the reason for the instabilities of LES with NGM closure in Cases <ref type="bibr">K1-K3 (and generally, in 2D turbulence)</ref>. But how about in RBC? In Cases R1-R3, filtering is conducted only in the horizontal direction, and as a result, P NGM &#964; is not identically zero. In fact, in these cases, the forward transfer and backscatter of both kinetic energy and enstrophy are captured fairly well by NGM, with CC often above 0.95 (Table <ref type="table">7</ref>). However, a deeper examination shows that the backscatter (anti-diffusion) of interscale SGS potential energy, measured as P J (see Appendix F), may not be captured well, specially at low N LES Journal of Advances in Modeling Earth Systems 10.1029/2023MS003874 (Table <ref type="table">7</ref>). Poor representation of backscattering can certainly lead to instabilities, as for example, shown by <ref type="bibr">Guan et al. (2022)</ref> for 2D turbulence.</p><p>To further explore other potential shortcomings of NGM, we have also examined the spectra of elements of &#964; NGM and J NGM in comparison to those from FDNS (Figure <ref type="figure">3</ref>). This analysis shows that the spectra of SGS momentum and heat fluxes are captured well across wavenumbers, even at high wavenumbers, indicating that NGM performs well in this a priori (offline) metric.</p><p>To summarize the above analyses: we find all LES with NGM closures for 2D-FHIT and RBC cases to become unstable even at high LES resolutions. Understanding the reason(s) for this poor a posteriori (online) performance is essential to make further progress. Examining a few functional and structural metrics beyond CC of SGS fluxes (e.g., inter-scale energy/enstrophy transfers, spectra) point to only one major shortcoming that is relevant to 2D-FHIT (and any 2D turbulent flow): NGM cannot capture any inter-scale kinetic energy transfer, which is likely the reason for the instabilities. This is not an issue in RBC, for which we only identify one shortcoming, and that is the poor representation of backscatter (anti-diffusion) of potential energy, specially at low LES resolution. These findings indicate that the poor a posteriori (online) performance of NGM might have different causes in different flows and requires more extensive investigations.</p><p>Before discussing ideas for addressing these challenges in future work, we present more analyses in two areas: a closer examination of the physics included in the library (Section 3.4) and the decomposition of the SGS fluxes and the sensitivity of the diagnosed fluxes to the filter type/size (Section 3.5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">A Physics-Guided Library: Pope Tensors</head><p>In Section 3.1, we consider an expansive library of basis functions combining the low-and high-order derivatives and polynomials of velocity and temperature. Under certain assumptions, smaller but physics-informed libraries can be devised. For example, <ref type="bibr">Boussinesq (1877)</ref> hypothesized that for a nearly homogeneous, incompressible, high-Re flow, the anisotropic SGS stress &#964; r (Equation <ref type="formula">F2</ref>) is only a function of the filtered rates of strain S (Equation F1) and rotation &#937; (Equation <ref type="formula">32</ref>) tensors:</p><p>Note that in Equations 7 and 13 and in general, only &#8711; &#8901; &#964; r has to be parameterized as the rest of &#8711; &#8901; &#964; can be absorbed into &#8711;p <ref type="bibr">(Sagaut, 2006)</ref>. Owing to Cayley-Hamilton theorem <ref type="bibr">(Gantmakher, 2000)</ref>, &#964; r can be represented as a linear combination of a finite number of tensors, the so-called Pope tensors <ref type="bibr">(Pope, 1975)</ref>. In 2D, there are only three Pope tensors Z, thus</p><p>The three Pope's tensors are Z (0) = I, Z (1) = S, and</p><p>&#8706;x ) 2 2( &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y ) 2( &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y ) -( &#8706;u &#8706;y ) 2</p><p>Journal of Advances in Modeling Earth Systems 10.1029/2023MS003874</p><p>which is related to the anisotropic part of the NGM stress. In fact, &#964; NGM-r = -&#916; 2 Z (2) &#8260;12 (see Equation <ref type="formula">F3</ref>). Note that this is also the physics-based closure derived in <ref type="bibr">Anstey and Zanna (2017)</ref>. Coefficients &#950; (n) are functions of invariants I 1 = tr(S 2 ) and I 2 = tr(&#937; 2 ). The standard Smagorinsky model is &#950; (1) (I 1 )Z 1 .</p><p>Our expansive library, described in Equations 21 and 22, includes the individual terms to discover Z (n) (n = 0, 1, 2); however, we have always found the NGM stress, &#964; NGM . To see whether the results will change with a discovery only done on the anisotropic SGS stress, &#964; r , and with a smaller library that only has the terms relevant to the Pope tensors, we have conducted more experiments with three libraries for Cases K1-K3. The first library only includes the three Pope tensors {Z (0) , Z (1) , Z (2) }, the second library only includes the six non-zero elements of these tensors, and the third library only includes the eight terms that compromise these six:</p><p>( &#8706;v &#8706;x ) 2 , &#8706;u &#8706;x &#8706;v &#8706;x , &#8706;u &#8706;y &#8706;v &#8706;y }.</p><p>The RVM with any of these libraries robustly discovers &#964; NGM-r = -&#916; 2 Z (2) &#8260;12, without Z (1) (or Z (0) ) showing up (thus, no Smagorinsky/eddy viscosity-like term). Needless to say, LES with this closure is unstable.</p><p>The above analyses show the prevalence of NGM: it emerges whether the full or just the anisotropic part of the SGS stress tensor is discovered, and whether an expansive or a small physics-guided library is used.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Decomposition of SGS Fluxes: Leonard, Cross, and Reynolds Stresses</head><p>As discussed earlier, whether a closure could be successfully discovered from FDNS data or if the NGM could be derived depend on the choice of the filter. The latter was explained based on the dependence of the derived closure on the moments' of the filter kernel. Furthermore, the coefficients of the discovered closure and the analytically derived coefficients of the NGM depend on the choice of the filters (Tables <ref type="table">3</ref><ref type="table">4</ref><ref type="table">5</ref>). Here, we further demonstrate the sensitivity of the diagnosed FDNS SGS flux (which is treated as truth in offline/supervised learning data-driven modeling approaches) to the choice of the filter, and then decompose the flux into its three components to gain further insight.</p><p>The top row of Figure <ref type="figure">4</ref> shows examples of SGS &#964; in 2D-FHIT diagnosed from the FDNS data using different filter types. It is clear that the diagnosed fluxes are not unique and particularly different between Gaussian/box filters and sharp-spectral filter (similar differences can be seen in SGS momentum and heat fluxes in RBC). This sensitivity, which has important implications for data-driven SGS modeling efforts <ref type="bibr">(Sun et al., 2023)</ref>, has been known for a long time in the LES community <ref type="bibr">(Leonard, 1975;</ref><ref type="bibr">Sagaut, 2006)</ref>. The Gaussian and box filters extract fairly similar features, even of almost the same amplitude (which is due to their matched kernels' second moments). The Gaussian + box filter captures similar features but with a factor of &#8764;2 difference in amplitude (related to the factor of two difference in NGM coefficients). However, the sharp-spectral filter extracts very different features that have much smaller length scales and amplitudes. We speculate that the failure of the discovery with the cutoff filter is a result of the inability of the current library in representing these features (at least in sparsely representing these features, if it exists). We also point out that in <ref type="bibr">Guan et al. (2022</ref><ref type="bibr">Guan et al. ( , 2023))</ref>, deep convolutional neural networks (CNNs) could not be successfully trained on FDNS data obtained using sharp-spectral cutoff filters, while high CC and stable/accurate LES runs in different systems were achieved using CNNs trained on FDNS data obtained through the Gaussian filter. Note that <ref type="bibr">Ross et al. (2023)</ref> successfully trained CNNs (and performed equation-discovery) using a "smoothed" sharp-spectral filter that had exponential decay at high wavenumbers (rather than a cutoff). These findings further show the importance of how the "true" SGS fluxes are diagnosed for offline/supervised learning.</p><p>To see the reason for this difference, we decompose the SGS tensor using u = u + u&#8242;. <ref type="bibr">Leonard (1975)</ref> introduced a decomposition of the SGS tensor into three components. However, since two of these components were not Galilean-invariant <ref type="bibr">(Speziale, 1985)</ref>, a Galilean-invariant decomposition was later proposed by <ref type="bibr">Germano (1986)</ref>: </p><p>&#964; and J of RBC can be decomposed in the same fashion. The most familiar component, the Reynolds stress, represents interactions in the unresolved scales that project onto the resolved scale. The cross stress represents the direct interactions between the unresolved and resolved scales that project onto the resolved scale. The Leonard stress includes interactions between the resolved scales not captured by the low-resolution LES grid. See <ref type="bibr">Leonard (1975</ref><ref type="bibr">), McDonough (2007)</ref>, and <ref type="bibr">Perezhogin and Glazunov (2023)</ref> for more discussions.</p><p>The relative importance of these three components in &#964; and J depends on the filter type and size (and even the flow characteristics). Rows 2-4 of Figure <ref type="figure">4</ref> show examples of the Leonard, cross, and Reynolds stress components of &#964; xy . For Gaussian, box, and Gaussian + box filters, the Leonard stress dominates, followed by cross and then Reynolds stress. However, for sharp-spectral, only the Reynolds stress has coherent structures that more or less resemble the Reynolds stress from Gaussian/box filters. The strong dependence on filter type comes from the fact that for Gaussian box filters, u&#8242; &#8800; 0 and u &#8800; u, leading to non-zero Leonard and cross stresses.</p><p>However, for the sharp-spectral filters, u&#8242; = 0 and u = u. As a result, with de-aliasing applied, the (coarsegrained) Leonard stress may or may not be zero, depending on details of the calculations as discussed next (the cross stress is non-zero). This is because of the nuances in how the cutoff wavenumber for the sharp-spectral kernel is defined for both filtering and coarse-graining; for example, we find the coarse-grained Leonard stress to be non-zero in Figure <ref type="figure">4</ref>. Here, following the common approach in the literature <ref type="bibr">(Pope, 2000)</ref>, for filtering (Table <ref type="table">A2</ref>), we compare the total wavenumber</p><p>) with the cutoff wavenumber (k c ). However, for coarse-graining, we have to compare k x and k y with k c (Equation <ref type="formula">18</ref>). This subtle difference results in the Leonard stress not being zero here. Note that our conclusion about the failure of discovering a robust closure model with the sharp-spectral filter (at least with the current library and algorithm) is not affected by these choices. We have found that we still cannot discover a closure when we repeat the equation-discovery process on the total stress or the Reynolds stress calculated with filtering that also uses Equation <ref type="formula">18</ref>.</p><p>As for the dependency on filter size, as &#916; increases, the relative importance of Reynolds stress increases: See Figure <ref type="figure">5</ref> for examples from Cases K3 (&#964; xx ) and R3 (J z ). Finally, note that the relative importance of these three components might depend on the flow itself. For example, in 3 km-resolution regional simulations of the tropics, <ref type="bibr">Sun et al. (2023)</ref> found that the vertical (horizontal) flux of the SGS zonal momentum is dominated by the Reynolds (Leonard) stress, which was attributed to the substantial differences of the filtered vertical wind and the filtered zonal or meridional winds.</p><p>The above analyses further explain the strong dependency of the diagnosed "true" SGS flux and the discovered closures on the filter type and size. These analyses also show that depending on the filter type/size, the Reynolds stress may not be the only component of the SGS flux that needs to be parameterized. In fact, the Leonard and cross stresses might be even larger and have to be included in the calculation of the total SGS flux and in the closure. Thus, there is no unique way to quantify the SGS fluxes, and these sensitivities have major implications for the "true" SGS flux that is fed into the RVM or any equation-discovery algorithm (and more broadly, any ML algorithm). Needless to say, these sensitivities have major impacts on the choices to make for developing datadriven closures in real-world applications (see item (f) in the next section).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Summary and Discussion</head><p>In this work, we have used relevance vector machine (RVM) to discover subgrid-scale (SGS) closures from filtered direct numerical simulation (DNS) data for both the SGS momentum flux tensor (in 2D forced homogeneous isotropic turbulence, 2D-FHIT, and Rayleigh-B&#233;nard convection, RBC) and the SGS heat flux vector (in RBC). The expansive library includes derivatives of velocity (and temperature) up to 8th order (calculated using spectral methods) and their quadratic combinations. We have conducted extensive robustness analysis of the discovered closures across a variety of flow configurations (changing Re, Ra, Pr, and the forcing wavenumber), filter types (Gaussian, box, Gaussian + box, and sharp-spectral cutoff), and filter sizes.</p><p>Based on these analyses, except for when the sharp-spectral filter is used (see below), we have robustly discovered the same closure for the SGS stress in 2D-FHIT and RBC. We have further shown that this closure model is in fact the NGM, which can be derived analytically from the first term of the Taylor-series expansion of the convolution integral. The discovered SGS heat flux in RBC is also consistent with the truncated Taylor-series expansion. We have demonstrated a few important points about these discovered closures:</p><p>1. They all have high CC (often &gt;0.9 -0.95) with the true SGS terms obtained from filtered DNS data, that is, excellent performance based on this commonly used a priori test metric. The same closure is discovered regardless of the system because the expansion's first term dominates the MSE loss function of RVM. 2. Despite this high CC, all a posteriori (online) tests result in unstable LES. This is consistent with the past findings about the NGM in the LES community (mainly for 3D turbulence) and in the climate community (for geophysical turbulence). Here, we argue that the inability of NGM to capture any inter-scale kinetic energy transfer in 2D-FHIT (or any 2D flow filtered in both directions) is likely the reason for the instability. For RBC, where filtering is done only in one direction, deeper investigations into the spectra of the SGS fluxes and inter-scale enstrophy and potential energy transfer, pointed to another likely reason for the instability: poor representation of the backscatter of SGS potential energy. This suggests that the poor a posteriori (online) performance of NGM might have different reasons in different flows. 3. The exact form of the discovered closure depends on the filter type and the filter size, &#916;. For filters with compact support (i.e., all filters used here except for sharp-spectral cutoff), the structure of the closures is the same, but the coefficients are different (still, consistent with the Taylor-series expansion, as shown in the appendices). For the sharp-spectral cutoff filter, the equation discovery fails, again, consistent with the fact that the Taylor-series expansion cannot be conducted, a known issue in the LES literature <ref type="bibr">(Sagaut, 2006)</ref>. Finally, we would like to emphasize that what is reported here, for example, based on low CC, is the "failure" of the "equation discovery process," whose goal was to minimize the MSE loss to match the stress (i.e., structural modeling). As discussed earlier in the paper as well as further below, functional closures with low CC (such as <ref type="bibr">Smagorinsky (1963)</ref>) can produce stable and relatively reasonable LES, thus low CC does not necessarily mean a failed closure. That said, we could not discover any parsimonious closure with the sharp-spectral filtered data that was consistent across several cases and resolutions, and thus, we do not perform any a posteriori LES with these low-CC closures. Note that as mentioned before, with a "smoothed" sharp-spectral filter, <ref type="bibr">Ross et al. (2023)</ref> successfully performed equation-discovery.</p><p>As a side note, while the terms of the discovered closures might look like truncation error of finite difference/ volume discretization, in our work, all calculations (DNS solver, SGS terms, library) are done using spectral methods. This further shows, along with the Taylor-series expansion results, that the discovered closures are indeed representing the physics of the SGS terms, rather than any numerical error.</p><p>As an additional piece of analysis, we also present the decomposition of the SGS terms to the Leonard, cross, and Reynolds terms. We show that the Leonard and then cross terms often dominate the total SGS term, though the relative amplitude of these terms decreases as the filter size increases. However, this analysis shows that only computing the Reynolds momentum stress or heat flux can lead to discovering an inaccurate closure (and in general, in data-driven SGS modeling, in too-small SGS fluxes). That said, the relative importance of these three terms depends on the filter type and size, and likely, on the flow's spatial spectrum <ref type="bibr">(Sun et al., 2023)</ref>.</p><p>The analyses presented in this paper are aimed at highlighting the promises and challenges of the equationdiscovery approach to SGS modeling. On one hand, it is promising that this approach robustly discovers closures that could be closely connected with those mathematically derived, and could be easily interpreted and analyzed in terms of turbulence physics. On the other hand,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Journal of Advances in Modeling Earth Systems</head><p>10.1029/2023MS003874</p><p>a) The commonly used MSE loss function, or similar loss functions, will be always dominated by the leading term(s) of the Taylor-series expansion. Thus, sparsity-promoting equation-discovery techniques, at least with the common derivative/polynomial-based libraries, will always find the NGM (if all the relevant terms exists in the library). Note that this is true for the closure of any SGS process, as the Taylor-series expansion applies to any compact filter. Thus, this point and many of the main points of this paper are relevant beyond just SGS modeling for turbulence, but also SGS modeling of other nonlinear, multi-scale processes in the Earth system. b) Given that our diagnoses show shortcomings of the NGM with functional modeling metrics (e.g., inter-scale energy transfer), one idea is to include such physics constraints in the loss function. For example, <ref type="bibr">Guan et al. (2023)</ref> demonstrated that a loss function that combines structural and functional modeling constraints can enhance the a priori and a posteriori performance of the data-driven closure model in the small-data regime. More functional-modeling physics constraints (as domain averaged or wavenumber-dependent quantities) can be included in the loss function, which can potentially close the gap between a priori (offline) and a posteriori (online) performance. While the loss function of some techniques such as RVM may not be flexible to change beyond MSE, other methods such as genetic programming, gene expression programming or symbolic regression provide such flexibility <ref type="bibr">(Cranmer, 2023;</ref><ref type="bibr">Ross et al., 2023)</ref>. Also, equation-discovery using neural network-based algorithms has gained popularity recently, as for example, their loss functions can be very flexible given the use of backpropagation for training (Z. <ref type="bibr">Chen et al., 2021)</ref>. That said, "spectral bias" <ref type="bibr">(Xu et al., 2019)</ref>, the fundamental challenge of neural networks in learning high wavenumbers, can become an issue when equation-discovery is the goal; see <ref type="bibr">Mojgani et al. (2023)</ref> for an example and a solution in a quasigeostrophic turbulence test case. c) The fault may not entirely (or at all) lie with the MSE loss function. <ref type="bibr">Guan et al. (2022)</ref> showed that a deep CNN with basically the same MSE loss function as the one used here (which only accounts for structural modeling) can learn a closure for 2D turbulence that has CC &gt; 0.95 and leads to stable and accurate LES (and accurate inter-scale transfers; see <ref type="bibr">Guan et al. (2023)</ref>). But a major difference between the CNN and RVM is that the former does not use a pre-defined set of basis functions, but rather, learns them. Recent work by <ref type="bibr">Subel et al. (2023)</ref> has shown that the CNN of <ref type="bibr">Guan et al. (2022)</ref> learned a set of low-pass, high-pass, and band-pass Gabor filters. As another major difference, the CNN's sparsity is not user-defined, but rather, comes from overparameterization. d) Related to (c), the discovered closures can depend on the choice of the library. This issue can be addressed by trying more expansive libraries (though this can lead to non-robust discoveries) or as mentioned earlier, by using methods such as genetic programming or gene expression programming, which allow the library to evolve (see <ref type="bibr">Schmidt and Lipson (2009)</ref>, <ref type="bibr">Udrescu and</ref><ref type="bibr">Tegmark (2020), and</ref><ref type="bibr">Ross et al. (2023)</ref>). Libraries inspired by the CNNs' basis functions or distilled from other deep neural networks could be explored as well <ref type="bibr">(Cranmer et al., 2020;</ref><ref type="bibr">Subel et al., 2023)</ref>. Furthermore, there are studies, for example, based on the Mori-Zwanzig formalism, suggesting that memory has to be included in closures <ref type="bibr">(Parish &amp; Duraisamy, 2017;</ref><ref type="bibr">Wouters &amp; Lucarini, 2013)</ref>. Hence, basis functions that include temporal information (as already used in <ref type="bibr">Ross et al. (2023)</ref>) should be further explored in future work. e) Choosing the hyper-parameter(s) that determine the level of sparsity might require more thought too. While the L-curve criterion has shown success in many problems, the metrics for which the curve is calculated for should be further investigated. The common a priori metrics such as CC of SGS fluxes are completely incapable of identifying shortcomings from a functional modeling perspective, such as lack of inter-scale energy transfer or poor representation of backscattering, which can be diagnosed using additional metrics. Note that a high CC of SGS fluxes has been found as the necessary but not sufficient condition for a successful closure <ref type="bibr">(Meneveau, 1994)</ref>. f) Aside from all of the above issues related to the discovery algorithm, what needs to be discovered (the "truth") should be further examined. The discovered closures can depend on the filter type/size and the methodology (e.g., calculating Reynolds stress or the full SGS stress), because what is diagnosed as the "true" SGS flux from DNS has such dependencies. This has important implications for any data-driven SGS modeling approach, including those using deep neural networks or any other statistical learning method <ref type="bibr">(Fatkullin &amp; Vanden-Eijnden, 2004;</ref><ref type="bibr">Grooms et al., 2021;</ref><ref type="bibr">Guan et al., 2022;</ref><ref type="bibr">Sun et al., 2023;</ref><ref type="bibr">Zanna &amp; Bolton, 2021)</ref>. In fact, this non-uniqueness and uncertainty of the true SGS term is a major shortcoming of the supervised/offline learning approach to data-driven closure modeling in real-world applications, and is one of the main motivations to pursue online or at least offline-online learning approaches <ref type="bibr">(Pahlavan et al., 2024;</ref><ref type="bibr">Schneider, Stuart, &amp; Wu, 2021;</ref><ref type="bibr">Schneider et al., 2023)</ref>.</p><p>&#8706;&#969; &#8706;t + N(&#969;, &#968;) = 1 Re &#8711; 2 &#969;&#967;&#969;f , (B3) where N(&#969;, &#968;) is N(&#969;, &#968;) = &#8706;&#968; &#8706;y &#8706;&#969; &#8706;x -&#8706;&#968; &#8706;x &#8706;&#969; &#8706;y . (B4)</p><p>f is a deterministic forcing <ref type="bibr">(Chandler &amp; Kerswell, 2013;</ref><ref type="bibr">Kochkov et al., 2021)</ref>:</p><p>where f k x and f k y are the forcing wavenumbers and &#967; = 0.1 represents the Rayleigh drag coefficient. Comparing Equation <ref type="formula">5</ref>with Equation B3, it is evident that &#8711; &#215; R =&#967;&#969; and &#8711; &#215; F =f .</p><p>In DNS, Equations B2 and B3 are solved in a doubly periodic domain using a Fourier-Fourier pseudo-spectral solver with second-order Adams-Bashforth and Crank Nicholson time-integration schemes for the advection and viscous terms, respectively (time step &#916;t DNS ). For LES, we use the same solver with lower spatio-temporal resolution: We use N LES that is 8-64 times smaller than N DNS , and &#916;t LES = 10&#916;t DNS .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix C: The RBC Numerical Solver</head><p>We solve Equations 9-11 using a Fourier-Chebyshev pseudo-spectral solver <ref type="bibr">(Khodkar &amp; Hassanzadeh, 2018;</ref><ref type="bibr">Khodkar et al., 2019)</ref>. Briefly, using the &#969;&#968; formulation, the dimensionless governing equations become For DNS, we solve Equations C1-C3 in domain (6&#960;,1) . Periodic boundary conditions are imposed in the horizontal direction and no-slip and fixed temperature boundary conditions are imposed on the horizontal walls. Second-order Adams-Bashforth and Crank Nicholson time integration schemes are used for the advection and viscous terms, respectively. Table <ref type="table">2</ref> presents the N DNS and N LES for each case. For LES, we use the same solver but with lower resolution in the horizontal direction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix E: Taylor-Series Expansion of the SGS Flux for a 1D Field</head><p>Let's focus on a spatially 1D field a(x) (dependence on t is not explicitly written for brevity). The filtering operation's convolution integral (Equation <ref type="formula">1</ref>) becomes</p><p>G(r x ) a(xr x ) dr x , (E1)</p><p>The Taylor-series expansion of a(xr x ) around a(x) gives a(xr x ) = a(x) -</p><p>Substituting this into Equation E1 and using a = a(x), a = a(x) for brevity yields</p><p>The second line follows the first line considering that a and its derivatives do not depend on the variable of integration, r x . In Equation E4, a depends on a and its derivatives, with coefficients that only depend on the filter type and size through moments of the kernel, G. For example, for a Gaussian filter (Table <ref type="table">A1</ref>)</p><p>Note that all the odd moments are 0, resulting in O(&#916; 4 ) as the order of the truncated terms once the moments in Equation E5 are substituted in Equation E4: Journal of Advances in Modeling Earth Systems 10.1029/2023MS003874 a = a + 1 2! &#916; 2 12 &#8706; 2 a &#8706;x 2 + O(&#916; 4 ). (E6) To calculate a term like &#964; xx = u 2u 2 , we first use a = u in Equation E6 and then square it to arrive at u 2 = u 2 + 2u( 1 2! &#916; 2 12 &#8706; 2 u &#8706;x 2 ) + O(&#916; 4 ). (E7) Next, we use a = u 2 in Equation E6 to obtain u 2 = u 2 + 1 2! &#916; 2 12 &#8706; 2 u 2 &#8706;x 2 + O(&#916; 4 ), (E8) = u 2 + 2 2! &#916; 2 12 (( &#8706;u &#8706;x ) 2 + u &#8706; 2 u &#8706;x 2 ) + O(&#916; 4 ). (E9) Using Equations E7 and E9 we find &#964; xx = u 2u 2 = &#916; 2 12 ( &#8706;u &#8706;x ) 2 + O(&#916; 4 ). (E10) Note that this expression depends on u rather than u, which is what we desire. Next, we use a = &#8706;u/&#8706;x in Equation E6 to obtain &#8706;u &#8706;x = &#8706;u &#8706;x + 1 2! &#916; 2 12 &#8706; 3 u &#8706;x 3 + O(&#916; 4 ). (E11) Using this expression in Equation E10 yields an analytically derived closure for &#964; xx with error O(&#916; 4 ) &#964; NGM xx = u 2u 2 = &#916; 2 12 ( &#8706;u &#8706;x ) 2 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(E12)</head><p>This is the NGM <ref type="bibr">(Leonard, 1975;</ref><ref type="bibr">Sagaut, 2006)</ref>. Four issues should be emphasized here i. This procedure can be followed for any filter type. However, the Taylor series does not exist for filters such as sharp-spectral (cutoff), whose kernel's second-order moment is indefinite. Thus, for such filters, NGM does not exist <ref type="bibr">(Meneveau &amp; Katz, 2000;</ref><ref type="bibr">Sagaut, 2006)</ref>. ii. The same procedure can be followed to derive NGM for higher dimensions, for example, &#964; NGM xx , &#964; NGM xy , and &#964; NGM yy in 2D; see <ref type="bibr">Sagaut (2006)</ref>. iii. The coefficients in NGM depend on the filter's kernel and its moments (Equation E5). For Gaussian and tophat, the parameters of the kernels are chosen to match their first moment, leading to &#916; 2 /12 coefficient for both. However, the coefficients differ for higher-order terms <ref type="bibr">(Sagaut, 2006)</ref>. iv. The procedure presented above is not specific to turbulence or even dynamical systems. The procedure and its outcome are valid for the filtered quadratic nonlinearity of any two variables, even random variables.</p><p>S = &#9121; &#9122; &#9122; &#9122; &#9122; &#9122; &#9123; &#8706;u &#8706;x 1 2 ( &#8706;u &#8706;y + &#8706;v &#8706;x ) 1 2 ( &#8706;u &#8706;y + &#8706;v &#8706;x ) &#8706;v &#8706;y &#9124; &#9125; &#9125; &#9125; &#9125; &#9125; &#9126; , (F1) &#964; r = &#964; -1 2 tr(&#964;)I, (F2) where I is the identity matrix. In 2D with filtering in both directions, the anisotropic part of the SGS stress tensor from the NGM is &#964; NGM-r = &#964; NGM -1 2 tr(&#964; NGM ) I. (F3) &#964; NGM-r 2D = &#916; 2 12 &#9121; &#9122; &#9122; &#9122; &#9122; &#9122; &#9122; &#9122; &#9123; 1 2 (( &#8706;u &#8706;y ) 2 -( &#8706;v &#8706;x ) 2 ) &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y &#8706;u &#8706;x &#8706;v &#8706;x + &#8706;u &#8706;y &#8706;v &#8706;y -1 2 (( &#8706;u &#8706;y ) 2 -( &#8706;v &#8706;x )</p><p>2</p><p>Inserting this and S (Equation <ref type="formula">F1</ref>) into Equation <ref type="formula">30</ref>shows zero point-wise inter-scale (kinetic) energy transfer in NGM: P NGM &#964; (x, y, t) = 0.</p><p>In buoyancy-driven turbulence such as RBC, the total inter-scale energy transfer rate P E is the sum of the rate of transfer of kinetic energy (P &#964; ) due to SGS momentum fluxes and potential energy (P J ) due to SGS heat fluxes <ref type="bibr">(Eidson, 1985;</ref><ref type="bibr">Peng &amp; Davidson, 2002)</ref>: </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>19422466, 2024, 7, Downloaded from https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023MS003874 by University Of Chicago Library, Wiley Online Library on [03/12/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License</p></note>
		</body>
		</text>
</TEI>
