Title: Statistical and machine learning methods applied to the prediction of different tropical rainfall types
Abstract: Predicting rain from large-scale environmental variables remains a challenging problem for climate models, and it is unclear how well numerical methods can predict the true characteristics of rainfall without smaller (storm) scale information. This study explores the ability of three statistical and machine learning methods to predict 3-hourly rain occurrence and intensity at 0.5° resolution over the tropical Pacific Ocean using rain observations from the Global Precipitation Measurement (GPM) satellite radar and large-scale environmental profiles of temperature and moisture from the MERRA-2 reanalysis. We also separated the rain into different types (deep convective, stratiform, and shallow convective) because their varying kinematic and thermodynamic structures might respond to the large-scale environment in different ways. Our expectation was that the popular machine learning methods (i.e., the neural network and random forest) would outperform a standard statistical method (a generalized linear model) because of their more flexible structures, especially in predicting the highly skewed distribution of rain rates for each rain type. However, none of the methods obviously distinguishes itself from the others, and each method still has issues with predicting rain too often and not fully capturing the high end of the rain rate distributions, both of which are common problems in climate models. One implication of this study is that machine learning tools must be carefully assessed and are not necessarily applicable to solving all big data problems. Another implication is that traditional climate model approaches are not sufficient to predict extreme rain events and that other avenues need to be pursued.
Award ID(s):
1806063
PAR ID:
10304556
Publisher / Repository:
IOP Publishing
Journal Name:
Environmental Research Communications
Volume:
3
Issue:
11
ISSN:
2515-7620
Page Range / eLocation ID:
Article No. 111001
Sponsoring Org:
National Science Foundation
More Like this
  1. Accurate prediction of precipitation intensity is crucial for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet, climate models fail to accurately predict precipitation intensity, particularly extremes. One missing piece of information in traditional climate model parameterizations is subgrid-scale cloud structure and organization, which affects precipitation intensity and stochasticity at coarse resolution. Here, using global storm-resolving simulations and machine learning, we show that, by implicitly learning subgrid organization, we can accurately predict precipitation variability and stochasticity with a low-dimensional set of latent variables. Using a neural network to parameterize coarse-grained precipitation, we find that the overall behavior of precipitation is reasonably predictable using large-scale quantities only; however, the neural network cannot predict the variability of precipitation (R² ∼ 0.45) and underestimates precipitation extremes. The performance is significantly improved when the network is informed by our organization metric, correctly predicting precipitation extremes and spatial variability (R² ∼ 0.9). The organization metric is implicitly learned by training the algorithm on a high-resolution precipitable water field, encoding the degree of subgrid organization. The organization metric shows large hysteresis, emphasizing the role of memory created by subgrid-scale structures. We demonstrate that this organization metric can be predicted as a simple memory process from information available at the previous time steps. These findings stress the role of organization and memory in accurate prediction of precipitation intensity and extremes and the necessity of parameterizing subgrid-scale convective organization in climate models to better project future changes of water cycle and extremes.
  2. While guided wave structural health monitoring (SHM) is widely researched for ensuring safety, estimating performance deterioration, and detecting damage in structures, it experiences setbacks in accuracy due to varying environmental, sensor, and material factors. To combat these challenges, environmentally variable guided wave data is often stretched with temperature compensation methods, such as the scale transform and optimal signal stretch, to match a baseline signal and enable accurate damage detection. Yet, these methods fail for large environmental changes. This paper addresses this challenge by demonstrating a machine learning method to predict stretch factors. This is accomplished with feed-forward neural networks that approximate the complex velocity change function. We demonstrate that our machine learning approach outperforms the prior art on simulated Lamb wave data and is robust to extreme velocity variations. While our machine learning models do not conduct temperature compensation, their accurate stretch factor predictions serve as a proof of concept that a better model is plausible.
  3. Convective organization has a large impact on precipitation and feeds back on larger‐scale circulations in the tropics. The degree of this convective organization changes with modes of climate variability like the El Niño–Southern Oscillation (ENSO), but because organization is not represented in current climate models, a quantitative assessment of these shifts has not been possible. Here, we construct multidecade satellite climatologies of occurrence of tropical convective organization and its properties and assess changes with ENSO phase. The occurrence of organized deep convection becomes more concentrated, increasing threefold in the eastern and central Pacific during El Niño and decreasing twofold outside of these regions. Both horizontal extent of the cold cloud shield and convective depth increase in regions of positive sea surface temperature anomaly (SSTa); however, the regions of greatest convective deepening are those of large‐scale ascent, rather than those of warmest SSTa. Extent increases with SSTa at a rate of about 20 km/K, while the SSTa dependence of depth is only about 0.2 K/K. We introduce two values to describe convective changes with ENSO more succinctly: (1) an information entropy metric to quantify the clustering of convective system occurrences and (2) a growth metric to quantify deepening relative to spreading over the system lifetime. Finally, with collocated precipitation data, we see that rainfall attributable to convective organization jumps up to 5% with warming. Rain intensity and amount increase for a given system size during El Niño, but a given rain amount may actually fall with higher intensity during La Niña.
  4. Proteins perform their biological functions through motion. Although high-throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein, adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to two other proteins, KaiB and ribose-binding protein, which involve large-amplitude conformational changes, also successfully generates the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.
  5. We train five models using two machine learning (ML) regression algorithms (i.e., linear regression and XGBoost) to predict hydrothermal upflow in the Great Basin. Feature data are extracted from datasets supporting the INnovative Geothermal Exploration through Novel Investigations Of Undiscovered Systems project (INGENIOUS). The label data (the reported convective signals) are extracted from measured thermal gradients in wells by comparing the total estimated heat flow at the wells to the modeled background conductive heat flow. That is, the reported convective signal is the difference between the background conductive heat flow and the well heat flow. The reported convective signals contain outliers that may affect upflow prediction, so the influence of outliers is tested by constructing models for two cases: 1) using all the data (i.e., -91 to 11,105 mW/m²), and 2) truncating the range of labels to include only reported convective signals between -25 and 200 mW/m². Because hydrothermal systems are sparse, models that predict high convective signal in smaller areas better match the natural frequency of hydrothermal systems. Early results demonstrate that XGBoost outperforms linear regression. For XGBoost using the truncated range of labels, half of the high reported signals are within <3% of the highest predictions. For XGBoost using the entire range of labels, half of the high reported signals are in <13% of the highest predictions. While this implies that the truncated regression is superior, the all-data model better predicts the locations of power-producing systems (i.e., the operating power plants are in a smaller fraction of the study area given by the highest predictions). Even though the models generally predict greater hydrothermal upflow for higher reported convective signals than for lower reported convective signals, both XGBoost models consistently underpredict the magnitude of higher signals.
This behavior is attributed to low resolution/granularity of input features compared with the scale of a hydrothermal upflow zone (a few km or less across). Trouble estimating exact values while still reliably predicting high versus low convective signals suggests that a future strategy such as ranked ordinal regression (e.g., classifying into ordered bins for low, medium, high, and very high convective signal) might fit better models, since doing so reduces problems introduced by outliers while preserving the property of larger versus smaller signals. 
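The label handling described in item 5 can be illustrated with a minimal sketch. It is not the authors' code. The sign convention (well heat flow minus background, chosen so that strong upflow gives large positive values, in line with the reported -91 to 11,105 mW/m² range), the bin edges, and all function names are assumptions made for illustration. Only the -25 to 200 mW/m² truncation range and the low/medium/high/very high bins come from the abstract.

```python
from bisect import bisect_right

def convective_signal(well_heat_flow, background_heat_flow):
    """Convective signal in mW/m^2.

    Assumed sign convention: well minus background, so upflow is positive.
    """
    return well_heat_flow - background_heat_flow

def truncate_labels(signals, low=-25.0, high=200.0):
    """Keep only labels inside [low, high]; the range is quoted in the abstract."""
    return [s for s in signals if low <= s <= high]

# Hypothetical bin edges in mW/m^2; the abstract names the bins, not the edges.
BIN_EDGES = (0.0, 50.0, 200.0)
BIN_NAMES = ("low", "medium", "high", "very high")

def ordinal_bin(signal, edges=BIN_EDGES):
    """Map a signal to an ordered bin index (0 = low ... 3 = very high)."""
    return bisect_right(edges, signal)

# Made-up (well, background) heat-flow pairs, for illustration only.
pairs = [(60.0, 85.0), (90.0, 80.0), (400.0, 85.0), (11190.0, 85.0), (0.0, 91.0)]
signals = [convective_signal(w, b) for w, b in pairs]

truncated = truncate_labels(signals)
bins = [ordinal_bin(s) for s in signals]
```

In this sketch, binning keeps every sample and preserves the ordering of larger versus smaller signals, while an extreme outlier lands in the same top bin as any other very high value. Truncation, by contrast, discards the out-of-range samples.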