

Title: Statistical and machine learning methods applied to the prediction of different tropical rainfall types
Abstract

Predicting rain from large-scale environmental variables remains a challenging problem for climate models, and it is unclear how well numerical methods can predict the true characteristics of rainfall without smaller (storm) scale information. This study explores the ability of three statistical and machine learning methods to predict 3-hourly rain occurrence and intensity at 0.5° resolution over the tropical Pacific Ocean using rain observations from the Global Precipitation Measurement (GPM) satellite radar and large-scale environmental profiles of temperature and moisture from the MERRA-2 reanalysis. We also separated the rain into different types (deep convective, stratiform, and shallow convective) because their varying kinematic and thermodynamic structures might respond to the large-scale environment in different ways. Our expectation was that the popular machine learning methods (i.e., the neural network and random forest) would outperform a standard statistical method (a generalized linear model) because of their more flexible structures, especially in predicting the highly skewed distribution of rain rates for each rain type. However, none of the methods clearly distinguishes itself from the others, and each method still has issues with predicting rain too often and not fully capturing the high end of the rain rate distributions, both of which are common problems in climate models. One implication of this study is that machine learning tools must be carefully assessed and are not necessarily applicable to solving all big data problems. Another implication is that traditional climate model approaches are not sufficient to predict extreme rain events and that other avenues need to be pursued.
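
The two shortcomings named in the abstract (rain predicted too often, and the high end of the rain rate distribution not fully captured) can each be expressed as a simple diagnostic. The sketch below is illustrative only: the function names, the zero rain threshold, the nearest-rank quantile, and the sample values are assumptions made for this example, not the metrics or data used in the study.

```python
import math


def frequency_bias(observed, predicted, threshold=0.0):
    """Ratio of predicted to observed rain occurrences.

    A value above 1 means rain is predicted more often than it is observed.
    """
    n_obs = sum(1 for r in observed if r > threshold)
    n_pred = sum(1 for r in predicted if r > threshold)
    if n_obs == 0:
        raise ValueError("no observed rain above the threshold")
    return n_pred / n_obs


def upper_quantile(rates, q=0.95, threshold=0.0):
    """Nearest-rank q-quantile of the rain rates that exceed the threshold."""
    raining = sorted(r for r in rates if r > threshold)
    if not raining:
        raise ValueError("no rain above the threshold")
    index = max(0, math.ceil(q * len(raining)) - 1)
    return raining[index]


# Hypothetical 3-hourly rain rates (mm/h) for one grid cell; not real data.
observed = [0.0, 0.0, 1.2, 0.0, 8.5, 0.0, 0.3, 0.0]
predicted = [0.4, 0.0, 1.0, 0.6, 3.1, 0.0, 0.5, 0.2]

bias = frequency_bias(observed, predicted)   # 6 predicted / 3 observed events
obs_tail = upper_quantile(observed)          # heaviest observed rates
pred_tail = upper_quantile(predicted)        # heaviest predicted rates
```

In this made-up sample the frequency bias exceeds 1 and the predicted upper quantile falls below the observed one, which is the pattern the abstract describes for all three methods.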

 
Award ID(s):
1806063
NSF-PAR ID:
10304556
Publisher / Repository:
IOP Publishing
Journal Name:
Environmental Research Communications
Volume:
3
Issue:
11
ISSN:
2515-7620
Page Range / eLocation ID:
Article No. 111001
Sponsoring Org:
National Science Foundation
More Like this
  1. This study explores the feasibility of predicting subdaily variations and the climatological spatial patterns of rain in the tropical Pacific from atmospheric profiles using a set of generalized linear models: logistic regression for rain occurrence and gamma regression for rain amount. The prediction is separated into different rain types from TRMM satellite radar observations (stratiform, deep convective, and shallow convective) and CAM5 simulations (large-scale and convective). Environmental variables from MERRA-2 and CAM5 are used as predictors for TRMM and CAM5 rainfall, respectively. The statistical models are trained using environmental fields at 0000 UTC and rainfall from 0000 to 0600 UTC during 2003. The results are used to predict 2004 rain occurrence and rate for MERRA-2/TRMM and CAM5 separately. The first EOF profile of humidity and the second EOF profile of temperature contribute most to the prediction for both statistical models in each case. The logistic regression generally performs well for all rain types, but does better in the east Pacific compared to the west Pacific. The gamma regression produces reasonable geographical rain amount distributions but rain rate probability distributions are not predicted as well, suggesting the need for a different, higher-order model to predict rain rates. The results of this study suggest that statistical models applied to TRMM radar observations and MERRA-2 environmental parameters can predict the spatial patterns and amplitudes of tropical rainfall in the time-averaged sense. Comparing the observationally trained models to models that are trained using CAM5 simulations points to possible deficiencies in the convection parameterization used in CAM5.

     
  2. Abstract

    By utilizing functional relationships based on observations at plot or field scales, water quality models first compute surface runoff and then use it as the primary governing variable to estimate sediment and nutrient transport. When these models are applied at watershed scales, this serial model structure, coupling a surface runoff sub‐model with a water quality sub‐model, may be inappropriate because dominant hydrological processes differ among scales. A parallel modeling approach is proposed to evaluate how best to combine dominant hydrological processes for predicting water quality at watershed scales. In the parallel scheme, dominant variables of water quality models are identified based entirely on their statistical significance using time series analysis. Four surface runoff models of different complexity were assessed using both the serial and parallel approaches to quantify the uncertainty in the forcing variables used to predict water quality. The eight alternative model structures were tested against a 25‐year high‐resolution data set of streamflow, suspended sediment discharge, and phosphorus discharge at weekly time steps. Models using the parallel approach consistently performed better than serial‐based models, with less error in predictions of watershed-scale streamflow, sediment, and phosphorus, which suggests that the structures of water quantity and quality models at watershed scales should be reformulated by incorporating the dominant variables. The implication is that hydrological models should be constructed in a way that avoids stacking one sub‐model with one set of scale assumptions onto the front end of another sub‐model with a different set of scale assumptions.

     
  3. Abstract

    “Supermodeling” climate by allowing different models to assimilate data from one another in run time has been shown to give results superior to those of any one model and superior to any weighted average of model outputs. The only free parameters, connection strengths between corresponding variables in each pair of models, are determined using some form of machine learning. It is demonstrated that supermodeling succeeds because near critical states, interscale interactions are important but unresolved processes cannot be effectively represented diagnostically in any single parameterization scheme. In two examples, a pair of toy quasigeostrophic (QG) channel models of the midlatitudes and a pair of ECHAM5 models of the tropical Pacific atmosphere with a common ocean, supermodels dynamically combine parameterization schemes so as to capture criticality, associated critical structures, and the supporting scale interactions. The QG supermodeling scheme extends a previous configuration in which two such models synchronize with intermodel connections only between medium-scale components of the flow; here the connections are trained against a third “real” model. Intermittent blocking patterns characterize the critical behavior thus obtained, even where such patterns are missing in the constituent models. In the ECHAM-based climate supermodel, the corresponding critical structure is the single ITCZ pattern, a pattern that occurs in neither of the constituent models. For supermodels of both types, power spectra indicate enhanced interscale interactions in frequency or energy ranges of physical interest, in agreement with observed data, and supporting a generalized form of the self-organized criticality hypothesis.

    Significance Statement

    In a “supermodel” of Earth’s climate, alternative models (climate simulations), which differ in the way they represent processes on the smallest scales, are trained to exchange information as they run, adjusting to one another much as weather prediction models adjust to new observations. They form a consensus, capturing atmospheric behaviors that have eluded all the separate models. We demonstrate that simplified supermodels succeed, where no single approach can, by correctly representing critical phenomena involving sudden qualitative transitions, such as occur in El Niño events, that depend on interactions among atmospheric processes on many different scales in space and time. The correct reproduction of critical phenomena is vital both for predicting weather and for projecting the effects of climate change.

     
  4. Accurate prediction of precipitation intensity is crucial for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet, climate models fail to accurately predict precipitation intensity, particularly extremes. One missing piece of information in traditional climate model parameterizations is subgrid-scale cloud structure and organization, which affects precipitation intensity and stochasticity at coarse resolution. Here, using global storm-resolving simulations and machine learning, we show that, by implicitly learning subgrid organization, we can accurately predict precipitation variability and stochasticity with a low-dimensional set of latent variables. Using a neural network to parameterize coarse-grained precipitation, we find that the overall behavior of precipitation is reasonably predictable using large-scale quantities only; however, the neural network cannot predict the variability of precipitation (R² ∼ 0.45) and underestimates precipitation extremes. The performance is significantly improved when the network is informed by our organization metric, correctly predicting precipitation extremes and spatial variability (R² ∼ 0.9). The organization metric is implicitly learned by training the algorithm on a high-resolution precipitable water field, encoding the degree of subgrid organization. The organization metric shows large hysteresis, emphasizing the role of memory created by subgrid-scale structures. We demonstrate that this organization metric can be predicted as a simple memory process from information available at the previous time steps. These findings stress the role of organization and memory in accurate prediction of precipitation intensity and extremes and the necessity of parameterizing subgrid-scale convective organization in climate models to better project future changes of water cycle and extremes.
  5. Abstract

    Existing precipitation-type algorithms have difficulty discerning the occurrence of freezing rain and ice pellets. These inherent biases are not only problematic in operational forecasting but also complicate the development of model-based precipitation-type climatologies. To address these issues, this paper introduces a novel light gradient-boosting machine (LightGBM)-based machine learning precipitation-type algorithm that utilizes reanalysis and surface observations. By comparing it with the Bourgouin precipitation-type algorithm as a baseline, we demonstrate that our algorithm improves the critical success index (CSI) for all examined precipitation types. Moreover, when compared with the precipitation-type diagnosis in reanalysis, our algorithm exhibits increased F1 scores for snow, freezing rain, and ice pellets. Subsequently, we utilize the algorithm to compute a freezing-rain climatology over the eastern United States. The resulting climatology pattern aligns well with observations; however, a significant mean bias is observed. We interpret this bias to be influenced by both the algorithm itself and assumptions regarding precipitation processes, which include biases associated with freezing drizzle, precipitation occurrence, and regional synoptic weather patterns. To mitigate the overall bias, we propose increasing the precipitation cutoff from 0.04 to 0.25 mm h⁻¹, as it better reflects the precision of precipitation observations. This adjustment yields a substantial reduction in the overall bias. Finally, given the strong performance of LightGBM in predicting mixed precipitation episodes, we anticipate that the algorithm can be effectively utilized in operational settings and for diagnosing precipitation types in climate model outputs.

    Significance Statement

    Freezing rain can have significant impacts on transportation and infrastructure, making accurate prediction of precipitation types crucial. In this study, we use a machine learning method known as LightGBM to predict precipitation types. We show that the new algorithm performs better than the existing methods for all precipitation types examined. Additionally, we compute a freezing-rain climatology over the eastern United States. Although the resulting climatology pattern corresponds well to observations, the algorithm overpredicts freezing-rain occurrence. We argue that this bias can be substantially reduced by increasing the precipitation cutoff from 0.04 to 0.25 mm h⁻¹. Overall, this work highlights the potential of the LightGBM algorithm for both weather forecasting and diagnosing precipitation types in climate models.

     
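
The last abstract above reports its skill with the critical success index (CSI) and the F1 score. As a reminder of how these two scores are usually defined from hits, misses, and false alarms, here is a minimal sketch. The function names, the label strings, and the six sample cases are hypothetical and are not taken from that paper.

```python
def contingency(observed, predicted, event):
    """Count hits, misses, and false alarms for one precipitation type."""
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o == event and p == event)
    misses = sum(1 for o, p in pairs if o == event and p != event)
    false_alarms = sum(1 for o, p in pairs if o != event and p == event)
    return hits, misses, false_alarms


def csi(hits, misses, false_alarms):
    """Critical success index: hits / (hits + misses + false alarms)."""
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


def f1_score(hits, misses, false_alarms):
    """F1 score: 2 * hits / (2 * hits + misses + false alarms)."""
    total = 2 * hits + misses + false_alarms
    return 2 * hits / total if total else float("nan")


# Hypothetical observed and diagnosed precipitation types; not real data.
observed = ["snow", "rain", "freezing rain", "rain", "freezing rain", "ice pellets"]
predicted = ["snow", "rain", "rain", "freezing rain", "freezing rain", "freezing rain"]

counts = contingency(observed, predicted, "freezing rain")  # (1, 1, 2)
freezing_rain_csi = csi(*counts)
freezing_rain_f1 = f1_score(*counts)
```

Both scores ignore correct negatives, which is why they are commonly preferred for rare events such as freezing rain; F1 weights hits twice, so it is never lower than CSI for the same counts.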