Abstract There are different strategies for training neural networks (NNs) as subgrid‐scale parameterizations. Here, we use a 1D model of the quasi‐biennial oscillation (QBO) and gravity wave (GW) parameterizations as testbeds. A 12‐layer convolutional NN that predicts GW forcings for given wind profiles, when trained offline in a big‐data regime (100 years), produces realistic QBOs once coupled to the 1D model. In contrast, offline training of this NN in a small‐data regime (18 months) yields unrealistic QBOs. However, online re‐training of just two layers of this NN using ensemble Kalman inversion and only time‐averaged QBO statistics leads to parameterizations that yield realistic QBOs. Fourier analysis of these three NNs' kernels suggests why and how re‐training works and reveals that these NNs primarily learn low‐pass, high‐pass, and a combination of band‐pass filters, potentially related to the local and non‐local dynamics in GW propagation and dissipation. These findings and strategies apply generally to data‐driven parameterizations of other climate processes.
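The online re-training step relies on ensemble Kalman inversion (EKI), which updates an ensemble of parameter vectors using only forward-model runs, so no gradients through the coupled model are required. The following is a minimal sketch of one common EKI variant (with perturbed observations). The linear `forward` function is a hypothetical placeholder for "run the coupled 1D model and compute time-averaged QBO statistics"; in the abstract's setting, `theta` would hold the weights of the two re-trained layers. All names and numbers here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def eki_update(theta, g, y, gamma, rng):
    """One ensemble Kalman inversion (EKI) step with perturbed observations.

    theta: (J, p) ensemble of trainable parameters (e.g., weights of re-trained layers)
    g:     (J, d) forward-model outputs G(theta_j), e.g., time-averaged statistics
    y:     (d,)   target statistics
    gamma: (d, d) observation-noise covariance
    """
    n_ens = theta.shape[0]
    dtheta = theta - theta.mean(axis=0)
    dg = g - g.mean(axis=0)
    c_tg = dtheta.T @ dg / (n_ens - 1)  # (p, d) parameter-output cross-covariance
    c_gg = dg.T @ dg / (n_ens - 1)      # (d, d) output covariance
    # Each member is nudged toward its own noisy copy of the observations
    y_pert = y + rng.multivariate_normal(np.zeros_like(y), gamma, size=n_ens)
    # Kalman gain K = C_tg (C_gg + Gamma)^{-1}, computed with a linear solve
    gain = np.linalg.solve(c_gg + gamma, c_tg.T).T  # (p, d)
    return theta + (y_pert - g) @ gain.T


def forward(theta):
    """Hypothetical stand-in for running the coupled model and returning statistics."""
    a = np.array([[2.0, 1.0], [1.0, -1.0]])
    return theta @ a.T


rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
y_obs = forward(theta_true[None, :])[0]
gamma = 1e-4 * np.eye(2)
theta = rng.normal(size=(50, 2))  # initial (prior) ensemble
for _ in range(10):
    theta = eki_update(theta, forward(theta), y_obs, gamma, rng)
print(theta.mean(axis=0))  # ensemble mean should approach theta_true
```

Because the update uses only ensemble covariances of inputs and outputs, the forward map can be any black box, including a long coupled simulation summarized by time averages.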
Explaining the physics of transfer learning in data-driven turbulence modeling
Abstract Transfer learning (TL), which enables neural networks (NNs) to generalize out-of-distribution via targeted re-training, is becoming a powerful tool in scientific machine learning (ML) applications such as weather/climate prediction and turbulence modeling. Effective TL requires knowing (1) how to re-train NNs and (2) what physics is learned during TL. Here, we present novel analyses and a framework addressing (1) and (2) for a broad range of multi-scale, nonlinear dynamical systems. Our approach combines spectral (e.g., Fourier) analyses of such systems with spectral analyses of convolutional NNs, revealing physical connections between the systems and what the NN learns (a combination of low-, high-, and band-pass filters and Gabor filters). Integrating these analyses, we introduce a general framework that identifies the best re-training procedure for a given problem based on physics and NN theory. As a test case, we explain the physics of TL in subgrid-scale modeling of several setups of 2D turbulence. Furthermore, these analyses show that in these cases, the shallowest convolution layers are the best to re-train, which is consistent with our physics-guided framework but contradicts the common wisdom guiding TL in the ML literature. Our work provides a new avenue for optimal and explainable TL, and a step toward fully explainable NNs, for wide-ranging applications in science and engineering, such as climate change modeling.
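The spectral analysis of convolutional kernels can be illustrated by zero-padding a small kernel, taking its 2D discrete Fourier transform, and locating where the magnitude response peaks. The sketch below labels a kernel low-, high-, or band-pass from that peak location alone. The thresholds (0.25 and 0.75, in units of the Nyquist wavenumber) and the function names are hypothetical choices for illustration; a Gabor filter, being an oriented band-pass filter, would be labeled band-pass by this crude rule. This is not the authors' analysis code.

```python
import numpy as np


def kernel_peak_wavenumber(kernel, n=32):
    """Radial wavenumber (units of pi rad/sample) at which |FFT| of a 2D kernel peaks."""
    padded = np.zeros((n, n))
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel  # zero-pad to sample the transfer function finely
    mag = np.abs(np.fft.fft2(padded))
    # Round so numerically equal maxima tie; argmax then returns the first one
    iy, ix = np.unravel_index(np.argmax(np.round(mag, 8)), mag.shape)
    freqs = 2.0 * np.pi * np.fft.fftfreq(n)  # rad/sample in [-pi, pi)
    return np.hypot(freqs[ix], freqs[iy]) / np.pi


def classify_kernel(kernel, low=0.25, high=0.75):
    """Crude label from the peak location: near 0 -> low-pass, near Nyquist -> high-pass."""
    r = kernel_peak_wavenumber(kernel)
    if r < low:
        return "low-pass"
    if r >= high:
        return "high-pass"
    return "band-pass"


box = np.ones((3, 3)) / 9.0                                      # local averaging
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)  # second derivative
central_diff = np.array([[1.0, 0.0, -1.0]])                      # first derivative in x
for name, kern in [("box", box), ("laplacian", laplacian), ("central_diff", central_diff)]:
    print(name, classify_kernel(kern))
```

Applied to trained weights, the same transform shows which scales each layer passes or suppresses, which is the kind of information used to relate kernels to the physics of the system.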
- Award ID(s):
- 2005123
- PAR ID:
- 10472803
- Editor(s):
- Yortsos, Yannis
- Publisher / Repository:
- Oxford University
- Date Published:
- Journal Name:
- PNAS Nexus
- Volume:
- 2
- Issue:
- 3
- ISSN:
- 2752-6542
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Optical network failure management (ONFM) is a promising application of machine learning (ML) to optical networking. Typical ML-based ONFM approaches exploit historical monitoring data, retrieved in a specific domain (e.g., a link or a network), to train supervised ML models that learn failure characteristics (a signature) useful when future failures occur in that domain. Unfortunately, in operational networks, data availability often limits the deployment of ML-based ONFM solutions, because labeled data comprehensively covering all possible failure types are scarce. One could purposely inject failures to collect training data, but this is time consuming and undesirable for operators. A possible solution is transfer learning (TL), i.e., training ML models on a source domain (SD), e.g., a laboratory testbed, and then deploying the trained models on a target domain (TD), e.g., an operator network, possibly fine-tuning them by re-training with a small amount of TD data. Moreover, when TL re-training is not successful (e.g., due to intrinsic differences between SD and TD), another solution is domain adaptation, which combines unlabeled SD and TD data before model training. We investigate domain adaptation and TL for failure detection and failure-cause identification across different lightpaths, leveraging real optical signal-to-noise ratio (OSNR) data. We find that, for the considered scenarios, domain adaptation increases failure-detection accuracy by up to 20 percentage points, while for failure-cause identification only combining domain adaptation with model re-training provides a significant benefit, increasing accuracy by 4–5 percentage points in the considered cases.
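One of the simplest forms of unsupervised domain adaptation is to standardize each domain's features with that domain's own unlabeled statistics before training, so that a lightpath-specific offset no longer shifts the decision boundary. The sketch below compares this with no adaptation on synthetic data. The OSNR levels, the 3 dB offset, the nearest-centroid classifier, and all function names are hypothetical; the paper's own adaptation method may differ, and this trick assumes similar class proportions in both domains.

```python
import numpy as np


def standardize_per_domain(x):
    """Z-score each feature using only this domain's (unlabeled) statistics."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)


def fit_centroids(x, y):
    """Nearest-centroid classifier: one mean feature vector per class."""
    return {c: x[y == c].mean(axis=0) for c in np.unique(y)}


def predict(centroids, x):
    classes = np.array(list(centroids))
    dist = np.stack([np.linalg.norm(x - centroids[c], axis=1) for c in classes], axis=1)
    return classes[dist.argmin(axis=1)]


def make_domain(rng, offset_db, n=200):
    """Hypothetical OSNR feature (dB): normal ~20 dB, failure ~3 dB lower, shifted per lightpath."""
    normal = rng.normal(20.0 + offset_db, 0.5, size=(n, 1))
    failure = rng.normal(17.0 + offset_db, 0.5, size=(n, 1))
    x = np.vstack([normal, failure])
    y = np.array([0] * n + [1] * n)  # 0 = normal, 1 = failure
    return x, y


rng = np.random.default_rng(1)
x_sd, y_sd = make_domain(rng, offset_db=0.0)   # source domain, labeled
x_td, y_td = make_domain(rng, offset_db=-3.0)  # target domain, labels used only for scoring

# No adaptation: train on raw SD features, apply to raw TD features
naive = fit_centroids(x_sd, y_sd)
acc_naive = np.mean(predict(naive, x_td) == y_td)

# Domain adaptation: align each domain with its own unlabeled statistics first
adapted = fit_centroids(standardize_per_domain(x_sd), y_sd)
acc_adapted = np.mean(predict(adapted, standardize_per_domain(x_td)) == y_td)
print(f"naive: {acc_naive:.2f}, adapted: {acc_adapted:.2f}")
```

In this toy case the unadapted model mistakes healthy target-domain samples for failures because their absolute OSNR resembles the source-domain failure level; per-domain standardization removes that offset without needing any target labels.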
-
Abstract Turbulence quantities in geophysical fluids roughly follow a lognormal distribution. Consequently, extreme values dominate arithmetic means, and it is challenging for regression algorithms to accurately predict point-to-point and mean values. We train neural networks (NNs) to predict logarithmic values of the turbulent diffusivity K_T from multiyear observational records of turbulence in the deep-cycle layer (DCL) of the upper ocean at Earth’s equator, with the objective of accurately predicting long-term statistics of K_T. Depending on the prescribed input variables, temporal averaging, and the number of internal parameters N_nn, NNs can predict instantaneous values of log10 K_T with correlation coefficients R as high as 0.6–0.65, root-mean-square errors of less than an order of magnitude, and long-term averages of K_T within a factor of 2 of the observations. Prescribing small N_nn compared to the training dataset size results in poor representation of the distribution’s tails. Conversely, prescribing large N_nn causes overfitting, which degrades instantaneous predictability. NNs reproduce the observed spread in K_T of multiple orders of magnitude at a given gradient Richardson number Ri, unlike commonly used physics-based parameterizations, which are single-valued functions of Ri. Predictions of log10 of the vertical turbulent heat flux J_q are qualitatively similar to those of log10 K_T but have poorer correlation because of differences between the observed distributions. Tests of spatial generalizability show that when training on two of three equatorial locations with multiyear records (140°, 23°, and 10°W), each having a DCL, predictions at the third location are less accurate than when training on data from the same site.
Significance Statement: By enhancing thermodynamic mixing, ocean turbulence transports heat from the surface to the ocean interior. Directly quantifying this transport requires careful, small-scale, long-term observations. Since such observations are rare in both space and time, it is necessary to infer turbulence parameters from conventional measurements such as temperature or current velocity. Machine learning predictive algorithms, trained using existing long-term observations, show promise as a means to achieve this goal. A challenge to overcome is how to characterize the lognormal-like distribution of turbulence levels, in which values vary over orders of magnitude and a few extreme values dominate the arithmetic mean. Reproducing this distribution with neural networks is not trivial, and identifying how to do so is a focus of this paper.
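Why extreme values dominate the mean, and why averaging predictions made in log space needs care, can be shown with synthetic lognormal samples. If ln K is normal with standard deviation σ, then E[K] = exp(μ + σ²/2), so back-transforming the mean of the logarithms (the geometric mean) underestimates the arithmetic mean by the factor exp(σ²/2). For a spread of one decade in log10 K this factor is about 14. The distribution parameters below are hypothetical and chosen only to illustrate this; the sketch does not claim that the paper applies this correction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical log10 K_T samples: median 1e-5 m^2/s, one order of magnitude of spread
log10_k = rng.normal(loc=-5.0, scale=1.0, size=200_000)
k = 10.0 ** log10_k

arith_mean = k.mean()                     # the long-term average one usually wants
geo_mean = 10.0 ** log10_k.mean()         # averaging in log space, then back-transforming
sigma_ln = np.log(10.0) * log10_k.std()   # standard deviation of ln K_T
lognormal_mean = geo_mean * np.exp(0.5 * sigma_ln**2)  # E[K] = exp(mu + sigma^2 / 2)

n_top = len(k) // 100
share_top1 = np.sort(k)[-n_top:].sum() / k.sum()  # fraction of the total from the top 1%

print(f"arithmetic/geometric mean ratio: {arith_mean / geo_mean:.1f}")
print(f"top 1% of samples carry {share_top1:.0%} of the total")
```

With this spread, roughly half of the total comes from the top 1% of samples, which is why a model that is accurate for typical values can still miss the long-term mean if it underpredicts the tails.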
-
Metal–organic frameworks (MOFs) are promising materials with various applications, and machine learning (ML) techniques can enable their design and an understanding of their structure–property relationships. In this paper, we use ML to cluster MOFs using two different approaches. For the first set of clusters, we decompose the data using textural properties and cluster the resulting components. Separately, we cluster the MOF space with respect to topology. The feature data from each cluster were then fed into separate neural networks (NNs) for direct learning on an adsorption task (methane or hydrogen). The resulting NNs were then used for transfer learning (TL), in which only the last NN layer was retrained. The results show significant differences in TL performance depending on which cluster is chosen for direct learning. We find that TL performance depends on the Euclidean distance, in the decomposed feature space, between the clusters used for direct learning and for TL. Similar results were found when TL was performed simultaneously across both types of clusters and adsorption tasks. We note that methane adsorption was a better source task than hydrogen adsorption. Overall, the approach was able to identify the MOFs with the most transferable information, yielding valuable insights and a more comprehensive understanding of the MOF landscape. This highlights the method's potential to generate a deeper understanding of complex systems and provides an opportunity to apply it to other datasets.
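The cluster-distance idea can be sketched as: decompose the features, cluster in the reduced space, and compute Euclidean distances between cluster centroids, which then rank candidate source clusters for a given target cluster. The abstract does not name the decomposition or clustering algorithm, so the use of PCA (via SVD) and k-means here is an assumption, as are the synthetic descriptors and all function names.

```python
import numpy as np


def pca(x, n_components=2):
    """Project centered data onto its leading principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T


def kmeans(z, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(z - c, axis=1) for c in centers], axis=0)
        centers.append(z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.array([z[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers


def centroid_distances(centers):
    """Pairwise Euclidean distances between cluster centroids."""
    return np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)


rng = np.random.default_rng(2)
# Hypothetical: 6 textural descriptors for 300 MOFs drawn from three groups
means = np.array([[0.0] * 6, [4.0, 4.0, 0, 0, 0, 0], [0, 0, 0, 8.0, 8.0, 0]])
x = np.vstack([m + rng.normal(0.0, 0.5, size=(100, 6)) for m in means])

z = pca(x, n_components=2)
labels, centers = kmeans(z, k=3)
dist = centroid_distances(centers)

target = 0
# Candidate source clusters for the target, nearest first
ranked_sources = [j for j in np.argsort(dist[target]) if j != target]
print(dist.round(2), ranked_sources)
```

Under the abstract's finding, the nearest clusters in this ranking would be the more promising sources for TL to the target cluster.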
-
Abstract Neural networks (NNs) are increasingly used for data‐driven subgrid‐scale parameterizations in weather and climate models. While NNs are powerful tools for learning complex non‐linear relationships from data, there are several challenges in using them for parameterizations. Three of these challenges are (a) data imbalance related to learning rare, often large‐amplitude, samples; (b) uncertainty quantification (UQ) of the predictions to provide an accuracy indicator; and (c) generalization to other climates, for example, those with different radiative forcings. Here, we examine the performance of methods for addressing these challenges using NN‐based emulators of the Whole Atmosphere Community Climate Model (WACCM) physics‐based gravity wave (GW) parameterizations as a test case. WACCM has complex, state‐of‐the‐art parameterizations for orography‐, convection‐, and front‐driven GWs. Convection‐ and orography‐driven GWs have significant data imbalance due to the absence of convection or orography at most grid points. We address data imbalance using resampling and/or weighted loss functions, enabling the successful emulation of parameterizations for all three sources. We demonstrate that three UQ methods (Bayesian NNs, variational auto‐encoders, and dropout) provide ensemble spreads that correspond to accuracy during testing, offering criteria for identifying when an NN gives inaccurate predictions. Finally, we show that the accuracy of these NNs decreases for a warmer climate (4 × CO2). However, their performance is significantly improved by applying transfer learning, for example, re‐training only one layer using ∼1% new data from the warmer climate. The findings of this study offer insights for developing reliable and generalizable data‐driven parameterizations for various processes, including (but not limited to) GWs.
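The two imbalance remedies mentioned above can be sketched in a few lines: inverse-frequency weights in the loss, so the rare columns with non-zero forcing contribute as much as the many zero columns, and oversampling of those rare columns. The batch below is synthetic and the function names are hypothetical; the specific weighting and resampling schemes used for the WACCM emulators may differ.

```python
import numpy as np


def active_mask(y, tol=1e-12):
    """True for samples whose target profile is non-zero (e.g., a GW source is present)."""
    return np.any(np.abs(y) > tol, axis=1)


def imbalance_weights(y):
    """Inverse-frequency weights so active and inactive samples contribute equally.

    Assumes both kinds of samples are present; the weights average to 1.
    """
    active = active_mask(y)
    frac = active.mean()
    return np.where(active, 0.5 / frac, 0.5 / (1.0 - frac))


def weighted_mse(pred, y, w):
    per_sample = np.mean((pred - y) ** 2, axis=1)
    return np.sum(w * per_sample) / np.sum(w)


def oversample_indices(y, rng):
    """Resample active (minority) samples with replacement until both groups match in size."""
    active = np.flatnonzero(active_mask(y))
    inactive = np.flatnonzero(~active_mask(y))
    extra = rng.choice(active, size=len(inactive) - len(active), replace=True)
    return np.concatenate([inactive, active, extra])


rng = np.random.default_rng(3)
# Hypothetical batch: 100 columns of 10-level GW forcing, only 5 with non-zero forcing
y = np.zeros((100, 10))
y[:5] = rng.normal(0.0, 1.0, size=(5, 10))
pred_zero = np.zeros_like(y)  # a trivial model that always predicts "no forcing"

w = imbalance_weights(y)
print(weighted_mse(pred_zero, y, np.ones(len(y))), weighted_mse(pred_zero, y, w))
idx = oversample_indices(y, rng)
```

The trivial all-zero predictor looks good under the unweighted loss but is penalized about ten times more under the weighted loss here, which is the behavior that pushes an NN to learn the rare, large-amplitude samples.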

