skip to main content


Title: A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity
In this study, nine different statistical models are constructed using different combinations of predictors, including models with and without projected predictors. Multiple machine learning (ML) techniques are employed to optimize the ensemble predictions by selecting the top performing ensemble members and determining the weights for each ensemble member. The ML-Optimized Ensemble (ML-OE) forecasts are evaluated against the Simple-Averaging Ensemble (SAE) forecasts. The results show that for the response variables that are predicted with significant skill by individual ensemble members and SAE, such as Atlantic tropical cyclone counts, the performance of SAE is comparable to the best ML-OE results. However, for response variables that are poorly modeled by individual ensemble members, such as Atlantic and Gulf of Mexico major hurricane counts, ML-OE predictions often show higher skill score than individual model forecasts and the SAE predictions. However, neither SAE nor ML-OE was able to improve the forecasts of the response variables when all models show consistent bias. The results also show that increasing the number of ensemble members does not necessarily lead to better ensemble forecasts. The best ensemble forecasts are from the optimally combined subset of models.  more » « less
Award ID(s):
1747555
NSF-PAR ID:
10325803
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Atmosphere
Volume:
12
Issue:
4
ISSN:
2073-4433
Page Range / eLocation ID:
522
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract A primary goal of the National Oceanic and Atmospheric Administration Warn-on-Forecast (WoF) project is to provide rapidly updating probabilistic guidance to human forecasters for short-term (e.g., 0–3 h) severe weather forecasts. Postprocessing is required to maximize the usefulness of probabilistic guidance from an ensemble of convection-allowing model forecasts. Machine learning (ML) models have become popular methods for postprocessing severe weather guidance since they can leverage numerous variables to discover useful patterns in complex datasets. In this study, we develop and evaluate a series of ML models to produce calibrated, probabilistic severe weather guidance from WoF System (WoFS) output. Our dataset includes WoFS ensemble forecasts available every 5 min out to 150 min of lead time from the 2017–19 NOAA Hazardous Weather Testbed Spring Forecasting Experiments (81 dates). Using a novel ensemble storm-track identification method, we extracted three sets of predictors from the WoFS forecasts: intrastorm state variables, near-storm environment variables, and morphological attributes of the ensemble storm tracks. We then trained random forests, gradient-boosted trees, and logistic regression algorithms to predict which WoFS 30-min ensemble storm tracks will overlap a tornado, severe hail, and/or severe wind report. To provide rigorous baselines against which to evaluate the skill of the ML models, we extracted the ensemble probabilities of hazard-relevant WoFS variables exceeding tuned thresholds from each ensemble storm track. The three ML algorithms discriminated well for all three hazards and produced more reliable probabilities than the baseline predictions. Overall, the results suggest that ML-based postprocessing of dynamical ensemble output can improve short-term, storm-scale severe weather probabilistic guidance. 
    more » « less
  2. Abstract

    Heatwaves are extreme near-surface temperature events that can have substantial impacts on ecosystems and society. Early warning systems help to reduce these impacts by helping communities prepare for hazardous climate-related events. However, state-of-the-art prediction systems can often not make accurate forecasts of heatwaves more than two weeks in advance, which are required for advance warnings. We therefore investigate the potential of statistical and machine learning methods to understand and predict central European summer heatwaves on time scales of several weeks. As a first step, we identify the most important regional atmospheric and surface predictors based on previous studies and supported by a correlation analysis: 2-m air temperature, 500-hPa geopotential, precipitation, and soil moisture in central Europe, as well as Mediterranean and North Atlantic sea surface temperatures, and the North Atlantic jet stream. Based on these predictors, we apply machine learning methods to forecast two targets: summer temperature anomalies and the probability of heatwaves for 1–6 weeks lead time at weekly resolution. For each of these two target variables, we use both a linear and a random forest model. The performance of these statistical models decays with lead time, as expected, but outperforms persistence and climatology at all lead times. For lead times longer than two weeks, our machine learning models compete with the ensemble mean of the European Centre for Medium-Range Weather Forecast’s hindcast system. We thus show that machine learning can help improve subseasonal forecasts of summer temperature anomalies and heatwaves.

    Significance Statement

    Heatwaves (prolonged extremely warm temperatures) cause thousands of fatalities worldwide each year. These damaging events are becoming even more severe with climate change. This study aims to improve advance predictions of summer heatwaves in central Europe by using statistical and machine learning methods. Machine learning models are shown to compete with conventional physics-based models for forecasting heatwaves more than two weeks in advance. These early warnings can be used to activate effective and timely response plans targeting vulnerable communities and regions, thereby reducing the damage caused by heatwaves.

     
    more » « less
  3. Solar flare prediction is a central problem in space weather forecasting and has captivated the attention of a wide spectrum of researchers due to recent advances in both remote sensing as well as machine learning and deep learning approaches. The experimental findings based on both machine and deep learning models reveal significant performance improvements for task specific datasets. Along with building models, the practice of deploying such models to production environments under operational settings is a more complex and often time-consuming process which is often not addressed directly in research settings. We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are performed on full-disk line-of-sight magnetograms using deep learning models whereas in active region-based models, predictions are issued for each active region individually using multivariate time series data instances. The outputs from individual active region forecasts and full-disk predictors are combined to a final full-disk prediction result with a meta-model. We utilized an equal weighted average ensemble of two base learners’ flare probabilities as our baseline meta learner and improved the capabilities of our two base learners by training a logistic regression model. The major findings of this study are: 1) We successfully coupled two heterogeneous flare prediction models trained with different datasets and model architecture to predict a full-disk flare probability for next 24 h, 2) Our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of two base learners and the baseline meta learner measured in terms of two widely used metrics True Skill Statistic (TSS) and Heidke Skill Score (HSS), and 3) Our result analysis suggests that the logistic regression-based ensemble (Meta-FP) improves on the full-disk model (base learner) by ∼9% in terms TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS respectively. Finally, when compared to the baseline meta model, it improves on TSS by ∼10% and HSS by ∼15%. 
    more » « less
  4. Abstract

    Tropical cyclone (TC) landfalls over the U.S. mid-Atlantic region, which include the so-called Sandy-like, or westward-curving, tracks, are among the most infrequent landfalls along the U.S. East Coast. However, when these events do occur, the resulting economic and societal consequences can be devastating. A recent example is Hurricane Sandy in 2012. Multimodel ensemble seasonal hindcasts conducted with a high-atmospheric-resolution coupled prediction system based on the ECMWF operational model (Project Minerva) are used here to compile the statistics of these rare events. Minerva hindcasts are found to exhibit skill in reproducing climatological characteristics of the mid-Atlantic TC landfalls particularly at the highest atmospheric horizontal spectral resolution of T1279 (16-km grid spacing). Historical forecasts are further interrogated to identify regional and large-scale environmental conditions associated with these rare TC tracks to better quantify their predictability on synoptic time scales, and their dependence on model resolution. Evolution of the large-scale atmospheric flow patterns leading to mid-Atlantic TC landfalls is analyzed using local finite-amplitude wave activity (LWA). We have identified large-amplitude quasi-stationary features in the LWA and sea surface temperature (SST) anomaly distributions that persist up to about a week leading to these land-falling events. A statistical model utilizing indices based on the LWA and SST anomalies as predictors is developed that exhibits skill (mostly at T1279) in predicting mid-Atlantic TC landfalls several days in advance. Implications of these results for longer time-scale predictions of mid-Atlantic TC landfalls including climate change projections are discussed.

     
    more » « less
  5. Abstract

    We investigate the predictability of the sign of daily southeastern U.S. (SEUS) precipitation anomalies associated with simultaneous predictors of large-scale climate variability using machine learning models. Models using index-based climate predictors and gridded fields of large-scale circulation as predictors are utilized. Logistic regression (LR) and fully connected neural networks using indices of climate phenomena as predictors produce neither accurate nor reliable predictions, indicating that the indices themselves are not good predictors. Using gridded fields as predictors, an LR and convolutional neural network (CNN) are more accurate than the index-based models. However, only the CNN can produce reliable predictions that can be used to identify forecasts of opportunity. Using explainable machine learning we identify which variables and grid points of the input fields are most relevant for confident and correct predictions in the CNN. Our results show that the local circulation is most important as represented by maximum relevance of 850-hPa geopotential heights and zonal winds to making skillful, high-probability predictions. Corresponding composite anomalies identify connections with El Niño–Southern Oscillation during winter and the Atlantic multidecadal oscillation and North Atlantic subtropical high during summer.

     
    more » « less