skip to main content


This content will become publicly available on December 10, 2024

Title: ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state.The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.  more » « less
Award ID(s):
2218197
NSF-PAR ID:
10521152
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; « less
Publisher / Repository:
Neurips
Date Published:
ISSN:
1049-5258
ISBN:
9781713829546
Format(s):
Medium: X
Location:
New Orleans
Sponsoring Org:
National Science Foundation
More Like this
  1. Quantum Computing has attracted much research attention because of its potential to achieve fundamental speed and efficiency improvements in various domains. Among different quantum algorithms, Parameterized Quantum Circuits (PQC) for Quantum Machine Learning (QML) show promises to realize quantum advantages on the current Noisy Intermediate-Scale Quantum (NISQ) Machines. Therefore, to facilitate the QML and PQC research, a recent python library called TorchQuantum has been released. It can construct, simulate, and train PQC for machine learning tasks with high speed and convenient debugging supports. Besides quantum for ML, we want to raise the community's attention on the reversed direction: ML for quantum. Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency. This paper presents a case study of the ML for quantum part in TorchQuantum. Since estimating the noise impact on circuit reliability is an essential step toward understanding and mitigating noise, we propose to leverage classical ML to predict noise impact on circuit fidelity. Inspired by the natural graph representation of quantum circuits, we propose to leverage a graph transformer model to predict the noisy circuit fidelity. We firstly collect a large dataset with a variety of quantum circuits and obtain their fidelity on noisy simulators and real machines. Then we embed each circuit into a graph with gate and noise properties as node features, and adopt a graph transformer to predict the fidelity. We can avoid exponential classical simulation cost and efficiently estimate fidelity with polynomial complexity. Evaluated on 5 thousand random and algorithm circuits, the graph transformer predictor can provide accurate fidelity estimation with RMSE error 0.04 and outperform a simple neural network-based model by 0.02 on average. It can achieve 0.99 and 0.95 R2 scores for random and algorithm circuits, respectively. Compared with circuit simulators, the predictor has over 200× speedup for estimating the fidelity. The datasets and predictors can be accessed in the TorchQuantum library. 
    more » « less
  2. We introduce a hybrid model that synergistically combines machine learning (ML) with semiconductor device physics to simulate nanoscale transistors. This approach integrates a physics-based ballistic transistor model with an ML model that predicts ballisticity, enabling flexibility to interface the model with device data. The inclusion of device physics not only enhances the interpretability of the ML model but also streamlines its training process, reducing the necessity for extensive training data. The model's effectiveness is validated on both silicon nanotransistors and carbon nanotube FETs, demonstrating high model accuracy with a simplified ML component. We assess the impacts of various ML models—Multilayer Perceptron (MLP), Recurrent Neural Network (RNN), and RandomForestRegressor (RFR)—on predictive accuracy and training data requirements. Notably, hybrid models incorporating these components can maintain high accuracy with a small training dataset, with the RNN-based model exhibiting better accuracy compared to the MLP and RFR models. The trained hybrid model provides significant speedup compared to device simulations, and can be applied to predict circuit characteristics based on the modeled nanotransistors. 
    more » « less
  3. Abstract Climate emulators are a powerful instrument for climate modeling, especially in terms of reducing the computational load for simulating spatiotemporal processes associated with climate systems. The most important type of emulators are statistical emulators trained on the output of an ensemble of simulations from various climate models. However, such emulators oftentimes fail to capture the “physics” of a system that can be detrimental for unveiling critical processes that lead to climate tipping points. Historically, statistical mechanics emerged as a tool to resolve the constraints on physics using statistics. We discuss how climate emulators rooted in statistical mechanics and machine learning can give rise to new climate models that are more reliable and require less observational and computational resources. Our goal is to stimulate discussion on how statistical climate emulators can further be improved with the help of statistical mechanics which, in turn, may reignite the interest of statistical community in statistical mechanics of complex systems. 
    more » « less
  4. Abstract

    This paper describes an implementation of the combined hybrid‐parallel prediction (CHyPP) approach of Wikner et al. (2020),https://doi.org/10.1063/5.0005541on a low‐resolution atmospheric global circulation model (AGCM). The CHyPP approach combines a physics‐based numerical model of a dynamical system (e.g., the atmosphere) with a computationally efficient type of machine learning (ML) called reservoir computing to construct a hybrid model. This hybrid atmospheric model produces more accurate forecasts of most atmospheric state variables than the host AGCM for the first 7–8 forecast days, and for even longer times for the temperature and humidity near the earth's surface. It also produces more accurate forecasts than a model based only on ML, or a model that combines linear regression, rather than ML, with the AGCM. The potential of the CHyPP approach for climate research is demonstrated by a 10‐year long hybrid model simulation of the atmospheric general circulation, which shows that the hybrid model can simulate the general circulation with substantially smaller systematic errors and more realistic variability than the host AGCM.

     
    more » « less
  5. Accurate and computationally-viable representations of clouds and turbulence are a long-standing challenge for climate model development. Traditional parameterizations that crudely but efficiently approximate these processes are a leading source of uncertainty in long-term projected warming and precipitation patterns. Machine Learning (ML)-based parameterizations have long been hailed as a promising alternative with the potential to yield higher accuracy at a fraction of the cost of more explicit simulations. However, these ML variants are often unpredictably unstable and inaccurate in \textit{coupled} testing (i.e. in a downstream hybrid simulation task where they are dynamically interacting with the large-scale climate model). These issues are exacerbated in out-of-distribution climates. Certain design decisions such as ``climate-invariant" feature transformation for moisture inputs, input vector expansion, and temporal history incorporation have been shown to improve coupled performance, but they may be insufficient for coupled out-of-distribution generalization. If feature selection and transformations can inoculate hybrid physics-ML climate models from non-physical, out-of-distribution extrapolation in a changing climate, there is far greater potential in extrapolating from observational data. Otherwise, training on multiple simulated climates becomes an inevitable necessity. While our results show generalization benefits from these design decisions, the obtained improvment does not sufficiently preclude the necessity of using multi-climate simulated training data. 
    more » « less