Title: Subseasonal Prediction of Central European Summer Heatwaves with Linear and Random Forest Machine Learning Models
Abstract

Heatwaves are extreme near-surface temperature events that can have substantial impacts on ecosystems and society. Early warning systems help to reduce these impacts by helping communities prepare for hazardous climate-related events. However, state-of-the-art prediction systems often cannot make accurate forecasts of heatwaves more than two weeks in advance, which are required for advance warnings. We therefore investigate the potential of statistical and machine learning methods to understand and predict central European summer heatwaves on time scales of several weeks. As a first step, we identify the most important regional atmospheric and surface predictors based on previous studies and supported by a correlation analysis: 2-m air temperature, 500-hPa geopotential, precipitation, and soil moisture in central Europe, as well as Mediterranean and North Atlantic sea surface temperatures, and the North Atlantic jet stream. Based on these predictors, we apply machine learning methods to forecast two targets: summer temperature anomalies and the probability of heatwaves for 1–6 weeks lead time at weekly resolution. For each of these two target variables, we use both a linear and a random forest model. The performance of these statistical models decays with lead time, as expected, but the models outperform persistence and climatology at all lead times. For lead times longer than two weeks, our machine learning models compete with the ensemble mean of the European Centre for Medium-Range Weather Forecasts’ hindcast system. We thus show that machine learning can help improve subseasonal forecasts of summer temperature anomalies and heatwaves.
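
The comparison against persistence and climatology described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a single lagged predictor instead of the predictor set listed in the abstract, omits the random forest model, and runs on a synthetic AR(1) series standing in for weekly temperature anomalies. All function names and parameter values (`synthetic_anomalies`, `skill_by_lead`, `phi=0.7`, the 400/200 split) are assumptions made for illustration.

```python
import math
import random


def synthetic_anomalies(n, phi=0.7, seed=0):
    """Hypothetical stand-in for weekly temperature anomalies: an AR(1) series."""
    rng = random.Random(seed)
    out, value = [], 0.0
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        out.append(value)
    return out


def make_pairs(series, lead):
    """Pair each weekly anomaly with the anomaly `lead` weeks later."""
    return series[:-lead], series[lead:]


def fit_linear(x, y):
    """Ordinary least squares for y ~ a + b * x with one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(pred, obs):
    """Root-mean-square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def skill_by_lead(train, test, max_lead=6):
    """RMSE of a linear model, persistence, and climatology for each lead (weeks)."""
    results = {}
    for lead in range(1, max_lead + 1):
        x_tr, y_tr = make_pairs(train, lead)
        x_te, y_te = make_pairs(test, lead)
        intercept, slope = fit_linear(x_tr, y_tr)
        climatology = sum(y_tr) / len(y_tr)  # training-period mean anomaly
        results[lead] = {
            "linear": rmse([intercept + slope * x for x in x_te], y_te),
            "persistence": rmse(x_te, y_te),  # forecast = latest observed value
            "climatology": rmse([climatology] * len(y_te), y_te),
        }
    return results


if __name__ == "__main__":
    series = synthetic_anomalies(600)
    train, test = series[:400], series[400:]
    for lead, scores in skill_by_lead(train, test).items():
        print(lead, {name: round(score, 3) for name, score in scores.items()})
```

Persistence reuses the most recent observed anomaly as the forecast, and climatology always forecasts the training-period mean. A statistical model adds value at a given lead time only if its error is lower than both.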

Significance Statement

Heatwaves (prolonged extremely warm temperatures) cause thousands of fatalities worldwide each year. These damaging events are becoming even more severe with climate change. This study aims to improve advance predictions of summer heatwaves in central Europe by using statistical and machine learning methods. Machine learning models are shown to compete with conventional physics-based models for forecasting heatwaves more than two weeks in advance. These early warnings can be used to activate effective and timely response plans targeting vulnerable communities and regions, thereby reducing the damage caused by heatwaves.

 
Award ID(s):
2115072
NSF-PAR ID:
10406713
Publisher / Repository:
American Meteorological Society
Journal Name:
Artificial Intelligence for the Earth Systems
Volume:
2
Issue:
2
ISSN:
2769-7525
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Prediction systems to enable Earth system predictability research on the subseasonal time scale have been developed with the Community Earth System Model, version 2 (CESM2) using two configurations that differ in their atmospheric components. One system uses the Community Atmosphere Model, version 6 (CAM6) with its top near 40 km, referred to as CESM2(CAM6). The other employs the Whole Atmosphere Community Climate Model, version 6 (WACCM6), whose top extends to ∼140 km and which includes fully interactive tropospheric and stratospheric chemistry [CESM2(WACCM6)]. Both systems are utilized to carry out subseasonal reforecasts for the 1999–2020 period following the Subseasonal Experiment’s (SubX) protocol. Subseasonal prediction skill from both systems is compared to that of the National Oceanic and Atmospheric Administration CFSv2 and European Centre for Medium-Range Weather Forecasts (ECMWF) operational models. CESM2(CAM6) and CESM2(WACCM6) show subseasonal prediction skill for 2-m temperature, precipitation, the Madden–Julian oscillation, and the North Atlantic Oscillation that is very similar to that of the previous version and of the NOAA CFSv2 model. Overall, skill of CESM2(CAM6) and CESM2(WACCM6) is a little lower than that of the ECMWF system. In addition to typical output provided by subseasonal prediction systems, CESM2 reforecasts provide comprehensive datasets for predictability research of multiple Earth system components, including three-dimensional output for many variables, and output specific to the mesosphere and lower-thermosphere (MLT) region from CESM2(WACCM6). It is shown that sudden stratospheric warming events, and the associated variability in the MLT, can be predicted ∼10 days in advance. Weekly real-time forecasts and reforecasts with CESM2(CAM6) and CESM2(WACCM6) are freely available.

    Significance Statement

    We describe here the design and prediction skill of two subseasonal prediction systems based on two configurations of the Community Earth System Model, version 2 (CESM2): CESM2 with the Community Atmosphere Model, version 6 [CESM2(CAM6)] and CESM2 with the Whole Atmosphere Community Climate Model, version 6 [CESM2(WACCM6)] as its atmospheric component. These two systems provide a foundation for community-model-based subseasonal prediction research. The CESM2(WACCM6) system provides a novel capability to explore the predictability of the stratosphere, mesosphere, and lower thermosphere. Both CESM2(CAM6) and CESM2(WACCM6) demonstrate subseasonal surface prediction skill comparable to that of the NOAA CFSv2 model, and a little lower than that of the ECMWF forecasting system. CESM2 reforecasts provide a comprehensive dataset for predictability research of multiple aspects of the Earth system, including the whole atmosphere up to 140 km, land, and sea ice. Weekly real-time forecasts, reforecasts, and models are publicly available.

     
  2. Abstract

    Summertime heavy rainfall and its resultant floods are among the most harmful natural hazards in the US Midwest, one of the world's primary crop production areas. However, seasonal forecasts of heavy rain, currently based on preseason sea surface temperature anomalies (SSTAs), remain unsatisfactory. Here, we present evidence that sea surface salinity anomalies (SSSAs) over the tropical western Pacific and subtropical North Atlantic are skillful predictors of summertime heavy rainfall one season ahead. A one standard deviation change in tropical western Pacific SSSA is associated with a 1.8 mm day⁻¹ increase in local precipitation, which excites a teleconnection pattern to the extratropical North Pacific. Via extratropical air‐sea interaction and long memory of midlatitude SSTA, a wave train favorable for US Midwest heavy rain is induced. Combined with soil moisture feedbacks bridging the springtime North Atlantic salinity, the SSSA‐based statistical prediction model improves Midwest heavy rainfall forecasts by 92%, complementing existing SSTA‐based frameworks.

     
  3. Background:

    Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022.

    Methods:

    We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported by a standardised source for 32 countries over the next 1–4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models’ predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models’ forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models’ past predictive performance.

    Results:

    Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models’ forecast scores in comparison to the ensemble model. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models’ forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models’ forecasts of deaths (N=763 predictions from 20 models). Across a 1–4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models.

    Conclusions:

    Our results support combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks.

    Funding:

    AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695. AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE., MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institutes of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control. 
IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).

     
  4. Abstract

    We investigate the predictability of the sign of daily southeastern U.S. (SEUS) precipitation anomalies associated with simultaneous predictors of large-scale climate variability using machine learning models. Models using index-based climate predictors and gridded fields of large-scale circulation as predictors are utilized. Logistic regression (LR) and fully connected neural networks using indices of climate phenomena as predictors produce neither accurate nor reliable predictions, indicating that the indices themselves are not good predictors. Using gridded fields as predictors, an LR and convolutional neural network (CNN) are more accurate than the index-based models. However, only the CNN can produce reliable predictions that can be used to identify forecasts of opportunity. Using explainable machine learning we identify which variables and grid points of the input fields are most relevant for confident and correct predictions in the CNN. Our results show that the local circulation is most important as represented by maximum relevance of 850-hPa geopotential heights and zonal winds to making skillful, high-probability predictions. Corresponding composite anomalies identify connections with El Niño–Southern Oscillation during winter and the Atlantic multidecadal oscillation and North Atlantic subtropical high during summer.

     
  5. Abstract

    Composite analysis is used to examine the physical processes that drive the growth and decay of the surface air temperature anomaly pattern associated with the North Atlantic Oscillation (NAO). Using the thermodynamic energy equation that the European Centre for Medium-Range Weather Forecasts implements in their reanalysis model, we show that advection of the climatological temperature field by the anomalous wind drives the surface air temperature anomaly pattern for both NAO phases. Diabatic processes exist in strong opposition to this temperature advection and eventually cause the surface air temperature anomalies to return to their climatological values. Specifically, over Greenland, Europe, and the United States, longwave heating/cooling opposes horizontal temperature advection, while over northern Africa vertical mixing opposes horizontal temperature advection. Despite the pronounced spatial correspondence between the skin temperature and surface air temperature anomaly patterns, the physical processes that drive these two temperature anomalies associated with the NAO are found to be distinct. The skin temperature anomaly pattern is driven by downward longwave radiation whereas, as stated above, the surface air temperature anomaly pattern is driven by horizontal temperature advection. This implies that the surface energy budget, although a useful diagnostic tool for understanding skin temperature changes, should not be used to understand surface air temperature changes.
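
The quantile-wise ensemble and Weighted Interval Score (WIS) mentioned in the Methods of item 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Forecast Hub's code: forecasts are represented as hypothetical dictionaries mapping quantile levels to values, the function names are invented for this example, and the WIS follows the commonly published definition (a weighted sum of the absolute error of the median and the interval scores of the central prediction intervals). The relative WIS used in the study, which compares each model against all others, is not implemented here.

```python
import statistics


def ensemble_quantiles(model_quantiles, method="median"):
    """Combine models quantile by quantile with equal weights.

    model_quantiles: list of dicts {quantile_level: value}; every model is
    assumed to report the same quantile levels.
    """
    combine = statistics.median if method == "median" else statistics.fmean
    levels = sorted(model_quantiles[0])
    return {q: combine([m[q] for m in model_quantiles]) for q in levels}


def interval_score(lower, upper, alpha, y):
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower  # width penalty
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # observation below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # observation above the interval
    return score


def weighted_interval_score(quantiles, y):
    """WIS for a forecast holding the median (0.5) and symmetric quantile pairs."""
    lower_levels = sorted(q for q in quantiles if q < 0.5)
    total = 0.5 * abs(y - quantiles[0.5])
    for q in lower_levels:
        alpha = 2 * q  # the q and 1 - q quantiles bound a (1 - alpha) interval
        upper = quantiles[round(1 - q, 10)]  # rounding guards against float keys
        total += (alpha / 2) * interval_score(quantiles[q], upper, alpha, y)
    return total / (len(lower_levels) + 0.5)


if __name__ == "__main__":
    # Three hypothetical models forecasting weekly incident cases.
    models = [
        {0.25: 8, 0.5: 10, 0.75: 12},
        {0.25: 9, 0.5: 11, 0.75: 15},
        {0.25: 4, 0.5: 12, 0.75: 13},
    ]
    ensemble = ensemble_quantiles(models, method="median")
    print(ensemble, weighted_interval_score(ensemble, y=16))
```

Taking the median at each quantile level limits the influence of a single outlying model, which is one plausible reason a median ensemble can score better than a mean ensemble.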