
Title: Understanding Predictability of Daily Southeast U.S. Precipitation Using Explainable Machine Learning

We investigate the predictability of the sign of daily southeastern U.S. (SEUS) precipitation anomalies associated with simultaneous predictors of large-scale climate variability using machine learning models. We use two kinds of models: those with index-based climate predictors and those with gridded fields of large-scale circulation as predictors. Logistic regression (LR) and fully connected neural networks using indices of climate phenomena as predictors produce neither accurate nor reliable predictions, indicating that the indices themselves are not good predictors. Using gridded fields as predictors, an LR and a convolutional neural network (CNN) are more accurate than the index-based models. However, only the CNN can produce reliable predictions that can be used to identify forecasts of opportunity. Using explainable machine learning, we identify which variables and grid points of the input fields are most relevant for confident and correct predictions in the CNN. Our results show that the local circulation is most important, as represented by the maximum relevance of 850-hPa geopotential heights and zonal winds to making skillful, high-probability predictions. Corresponding composite anomalies identify connections with El Niño–Southern Oscillation during winter and with the Atlantic multidecadal oscillation and North Atlantic subtropical high during summer.
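
The abstract distinguishes accurate predictions from reliable ones and ties reliability to identifying forecasts of opportunity. The following is a minimal sketch of those two ideas for a binary (sign of anomaly) forecast, not the authors' code or data. The function names `reliability_table` and `accuracy_of_most_confident`, the bin count, and the synthetic forecasts are all assumptions made for illustration. A forecast is treated as reliable when the observed frequency of the event in each probability bin is close to the mean forecast probability in that bin.

```python
import random


def reliability_table(probs, outcomes, n_bins=5):
    """Bin forecast probabilities and compare, per bin, the mean forecast
    probability with the observed event frequency.
    Returns a list of (mean_forecast, observed_frequency, count) tuples."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_p = sum(p for p, _ in members) / len(members)
        obs_freq = sum(y for _, y in members) / len(members)
        table.append((mean_p, obs_freq, len(members)))
    return table


def accuracy_of_most_confident(probs, outcomes, keep_fraction):
    """Accuracy over the keep_fraction of forecasts whose probability is
    farthest from 0.5 (a simple stand-in for 'forecasts of opportunity')."""
    ranked = sorted(zip(probs, outcomes),
                    key=lambda py: abs(py[0] - 0.5), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]
    hits = sum((p >= 0.5) == (y == 1) for p, y in kept)
    return hits / n_keep


# Synthetic, reliable-by-construction forecasts (illustrative only):
# each outcome is drawn with exactly the forecast probability.
rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

for mean_p, obs_freq, count in reliability_table(probs, outcomes):
    print(f"forecast {mean_p:.2f}  observed {obs_freq:.2f}  n={count}")
print("accuracy, all forecasts:", accuracy_of_most_confident(probs, outcomes, 1.0))
print("accuracy, 20% most confident:", accuracy_of_most_confident(probs, outcomes, 0.2))
```

Because the synthetic forecasts are reliable by construction, the most confident subset is also the most accurate one. For an unreliable model, high stated confidence would not translate into higher accuracy, so confidence could not be used to pick out forecasts of opportunity.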

Journal Name:
Artificial Intelligence for the Earth Systems
American Meteorological Society
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Heatwaves are extreme near-surface temperature events that can have substantial impacts on ecosystems and society. Early warning systems help to reduce these impacts by helping communities prepare for hazardous climate-related events. However, state-of-the-art prediction systems often cannot make accurate forecasts of heatwaves more than two weeks in advance, which are required for advance warnings. We therefore investigate the potential of statistical and machine learning methods to understand and predict central European summer heatwaves on time scales of several weeks. As a first step, we identify the most important regional atmospheric and surface predictors based on previous studies and supported by a correlation analysis: 2-m air temperature, 500-hPa geopotential, precipitation, and soil moisture in central Europe, as well as Mediterranean and North Atlantic sea surface temperatures, and the North Atlantic jet stream. Based on these predictors, we apply machine learning methods to forecast two targets: summer temperature anomalies and the probability of heatwaves for 1–6 weeks lead time at weekly resolution. For each of these two target variables, we use both a linear and a random forest model. The performance of these statistical models decays with lead time, as expected, but outperforms persistence and climatology at all lead times. For lead times longer than two weeks, our machine learning models compete with the ensemble mean of the European Centre for Medium-Range Weather Forecasts’ hindcast system. We thus show that machine learning can help improve subseasonal forecasts of summer temperature anomalies and heatwaves.

    Significance Statement

    Heatwaves (prolonged extremely warm temperatures) cause thousands of fatalities worldwide each year. These damaging events are becoming even more severe with climate change. This study aims to improve advance predictions of summer heatwaves in central Europe by using statistical and machine learning methods. Machine learning models are shown to compete with conventional physics-based models for forecasting heatwaves more than two weeks in advance. These early warnings can be used to activate effective and timely response plans targeting vulnerable communities and regions, thereby reducing the damage caused by heatwaves.

  2. Abstract

    Precipitation is one of the most difficult variables to estimate using large-scale predictors. Over South America (SA), this task is even more challenging, given the complex topography of the Andes. Empirical–statistical downscaling (ESD) models can be used for this purpose, but such models, applicable for all of SA, have not yet been developed. To address this issue, we construct an ESD model using multiple-linear-regression techniques for the period 1982–2016 that is based on large-scale circulation indices representing tropical Pacific Ocean, Atlantic Ocean, and South American climate variability, to estimate austral summer [December–February (DJF)] precipitation over SA. Statistical analyses show that the ESD model can reproduce observed precipitation anomalies over the tropical Andes (Ecuador, Colombia, Peru, and Bolivia), the eastern equatorial Amazon basin, and the central part of the western Argentinian Andes. On a smaller scale, the ESD model also shows good results over the Western Cordillera of the Peruvian Andes. The ESD model reproduces anomalously dry conditions over the eastern equatorial Amazon and the wet conditions over southeastern South America (SESA) during the three extreme El Niños: 1982/83, 1997/98, and 2015/16. However, it overestimates the observed intensities over SESA. For the central Peruvian Andes as a case study, results further show that the ESD model can correctly reproduce DJF precipitation anomalies over the entire Mantaro basin during the three extreme El Niño episodes. Moreover, multiple experiments with varying predictor combinations of the ESD model corroborate the hypothesis that the interaction between the South Atlantic convergence zone and the equatorial Atlantic Ocean provoked the Amazon drought in 2015/16.
  3. Unmanned aerial vehicles (UAVs) equipped with multispectral sensors offer high spatial and temporal resolution imagery for monitoring crop stress at early stages of development. Analysis of UAV-derived data with advanced machine learning models could improve real-time management in agricultural systems, but guidance for this integration is currently limited. Here we compare two deep learning-based strategies for early warning detection of crop stress, using multitemporal imagery throughout the growing season to predict field-scale yield in irrigated rice in eastern Arkansas. Both deep learning strategies showed improvements upon traditional statistical learning approaches including linear regression and gradient boosted decision trees. First, we explicitly accounted for variation across developmental stages using a 3D convolutional neural network (CNN) architecture that captures both spatial and temporal dimensions of UAV images from multiple time points throughout one growing season. 3D-CNNs achieved low prediction error on the test set, with a Root Mean Squared Error (RMSE) of 8.8% of the mean yield. For the second strategy, a 2D-CNN, we considered only spatial relationships among pixels for image features acquired during a single flyover. 2D-CNNs trained on images from a single day were most accurate when images were taken during booting stage or later, with RMSE ranging from 7.4 to 8.2% of the mean yield. A primary benefit of convolutional autoencoder-like models (based on analyses of prediction maps and feature importance) is the spatial denoising effect that corrects yield predictions for individual pixels based on the values of vegetation index and thermal features for nearby pixels. Our results highlight the promise of convolutional autoencoders for UAV-based yield prediction in rice.
  4. Abstract

    Tropical cyclone (TC) landfalls over the U.S. mid-Atlantic region, which include the so-called Sandy-like, or westward-curving, tracks, are among the most infrequent landfalls along the U.S. East Coast. However, when these events do occur, the resulting economic and societal consequences can be devastating. A recent example is Hurricane Sandy in 2012. Multimodel ensemble seasonal hindcasts conducted with a high-atmospheric-resolution coupled prediction system based on the ECMWF operational model (Project Minerva) are used here to compile the statistics of these rare events. Minerva hindcasts are found to exhibit skill in reproducing climatological characteristics of the mid-Atlantic TC landfalls, particularly at the highest atmospheric horizontal spectral resolution of T1279 (16-km grid spacing). Historical forecasts are further interrogated to identify regional and large-scale environmental conditions associated with these rare TC tracks to better quantify their predictability on synoptic time scales, and their dependence on model resolution. Evolution of the large-scale atmospheric flow patterns leading to mid-Atlantic TC landfalls is analyzed using local finite-amplitude wave activity (LWA). We have identified large-amplitude quasi-stationary features in the LWA and sea surface temperature (SST) anomaly distributions that persist up to about a week leading to these land-falling events. A statistical model utilizing indices based on the LWA and SST anomalies as predictors is developed that exhibits skill (mostly at T1279) in predicting mid-Atlantic TC landfalls several days in advance. Implications of these results for longer time-scale predictions of mid-Atlantic TC landfalls including climate change projections are discussed.

  5. Tropical cyclones (TCs) are an important source of precipitation for much of the eastern United States. However, our understanding of the spatiotemporal variability of tropical cyclone precipitation (TCP) and the connections to large-scale atmospheric circulation is limited by irregularly distributed rain gauges and short records of satellite measurements. To address this, we developed a new gridded (0.25° × 0.25°) publicly available dataset of TCP (1948–2015; Tropical Cyclone Precipitation Dataset, or TCPDat) using TC tracks to identify TCP within an existing gridded precipitation dataset. TCPDat was used to characterize total June–November TCP and percentage contribution to total June–November precipitation. TCP totals and contributions had maxima on the Louisiana, North Carolina, and Texas coasts, substantially decreasing farther inland at rates of approximately 6.2–6.7 mm km⁻¹. Few statistically significant trends were discovered in either TCP totals or percentage contribution. TCP is positively related to an index of the position and strength of the western flank of the North Atlantic subtropical high (NASH), with the strongest correlations concentrated in the southeastern United States. Weaker inverse correlations between TCP and El Niño–Southern Oscillation are seen throughout the study site. Ultimately, spatial variations of TCP are more closely linked to variations in the NASH flank position or strength than to the ENSO index. The TCP dataset developed in this study is an important step in understanding hurricane–climate interactions and the impacts of TCs on communities, water resources, and ecosystems in the eastern United States.
