- Award ID(s):
- 1822221
- NSF-PAR ID:
- 10431069
- Date Published:
- Journal Name:
- Environmental Data Science
- Volume:
- 2
- ISSN:
- 2634-4602
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper shows that skillful week 3–4 predictions of a large-scale pattern of 2 m temperature over the US can be made based on the Nino3.4 index alone, where "skillful" is defined as better than climatology. To find more skillful regression models, this paper explores various machine learning strategies (e.g., ridge regression and lasso), including those trained on observations and on climate model output. It is found that regression models trained on climate model output yield more skillful predictions than regression models trained on observations, presumably because of the larger training sample. Nevertheless, the skill of the best machine learning models is only modestly better than that of ordinary least squares based on the Nino3.4 index. Importantly, this fact is difficult to infer from the parameters of the machine learning model because very different parameter sets can produce virtually identical predictions. For this reason, attempts to interpret the source of predictability from the machine learning model can be very misleading. The skill of the machine learning models is also compared to that of a fully coupled dynamical model, CFSv2. The results depend on the skill measure: for mean square error, the dynamical model is slightly worse than the machine learning models; for correlation skill, the dynamical model is only modestly better than the machine learning models or the Nino3.4 index. In summary, the best predictions of the large-scale pattern come from machine learning models trained on long climate simulations, but the skill is only modestly better than predictions based on the Nino3.4 index alone.
-
This study explores the feasibility of predicting subdaily variations and the climatological spatial patterns of rain in the tropical Pacific from atmospheric profiles using a set of generalized linear models: logistic regression for rain occurrence and gamma regression for rain amount. The prediction is separated into different rain types from TRMM satellite radar observations (stratiform, deep convective, and shallow convective) and CAM5 simulations (large-scale and convective). Environmental variables from MERRA-2 and CAM5 are used as predictors for TRMM and CAM5 rainfall, respectively. The statistical models are trained using environmental fields at 0000 UTC and rainfall from 0000 to 0600 UTC during 2003. The results are used to predict 2004 rain occurrence and rate for MERRA-2/TRMM and CAM5 separately. The first EOF profile of humidity and the second EOF profile of temperature contribute most to the prediction for both statistical models in each case. The logistic regression generally performs well for all rain types, but performs better in the east Pacific than in the west Pacific. The gamma regression produces reasonable geographical distributions of rain amount, but rain rate probability distributions are not predicted as well, suggesting the need for a different, higher-order model to predict rain rates. The results of this study suggest that statistical models applied to TRMM radar observations and MERRA-2 environmental parameters can predict the spatial patterns and amplitudes of tropical rainfall in the time-averaged sense. Comparing the observationally trained models to models that are trained using CAM5 simulations points to possible deficiencies in the convection parameterization used in CAM5.
-
Despite major improvements in weather and climate modelling and substantial increases in remotely sensed observations, drought prediction remains a major challenge. After a review of the existing methods, we discuss major research gaps and opportunities to improve drought prediction. We argue that current approaches are top-down, assuming that the process(es) and/or driver(s) are known, i.e. starting with a model and then imposing it on the observed events (reality). With the help of an experiment, we show that there are opportunities to develop bottom-up drought prediction models, i.e. starting from the reality (here, observed events) and searching for model(s) and driver(s) that work. Recent advances in artificial intelligence and machine learning provide significant opportunities for developing bottom-up drought forecasting models. Regardless of the type of drought forecasting model (e.g. machine learning, dynamical simulations, analogue based), we need to shift our attention to robustness of theories and outputs rather than event-based verification. A shift in our focus towards quantifying the stability of uncertainty in drought prediction models, rather than the goodness of fit or reproducing the past, could be the first step towards this goal. Finally, we highlight the advantages of hybrid dynamical and statistical models for improving current drought prediction models. This article is part of the Royal Society Science+ meeting issue ‘Drought risk in the Anthropocene’.
-
A deep neural network is trained to predict sea surface temperature variations in two important regions of the Atlantic Ocean, using 800 years of simulated climate dynamics based on first-principles physics models. This model is then tested against 60 years of historical data. Our statistical model learns to approximate the physical laws governing the simulation, providing a significant improvement over simple statistical forecasts and performance comparable to most state-of-the-art dynamical/conventional forecast models at a fraction of the computational cost.
-
Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data. Users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying lossy compressibility of scientific datasets, we provide a statistical framework to predict compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossyness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8× speedup when searching for a specific compression ratio and a 7.8× speedup when determining the best compressor out of a collection.