
Title: Graph-Guided Regularized Regression of Pacific Ocean Climate Variables to Increase Predictive Skill of Southwestern U.S. Winter Precipitation
Abstract Understanding the physical drivers of seasonal hydroclimatic variability and improving predictive skill remains a challenge with important socioeconomic and environmental implications for many regions around the world. Physics-based deterministic models show limited ability to predict precipitation as the lead time increases, due to imperfect representation of physical processes and incomplete knowledge of initial conditions. Similarly, statistical methods drawing upon established climate teleconnections have low prediction skill due to the complex nature of the climate system. Recently, promising data-driven approaches have been proposed, but they often suffer from overparameterization and overfitting due to the short observational record, and they often do not account for spatiotemporal dependencies among covariates (i.e., predictors such as sea surface temperatures). This study addresses these challenges via a predictive model based on a graph-guided regularizer that simultaneously promotes similarity of predictive weights for highly correlated covariates and enforces sparsity in the covariate domain. This approach both decreases the effective dimensionality of the problem and identifies the most predictive features without specifying them a priori. We use large ensemble simulations from a climate model to construct this regularizer, reducing the structural uncertainty in the estimation. We apply the learned model to predict winter precipitation in the southwestern United States using sea surface temperatures over the entire Pacific basin, and demonstrate its superiority compared to other regularization approaches and statistical models informed by known teleconnections. Our results highlight the potential to combine optimally the space–time structure of predictor variables learned from climate models with new graph-based regularizers to improve seasonal prediction.
Award ID(s):
1928724 1839441 1839336
NSF-PAR ID:
10209944
Journal Name:
Journal of Climate
Volume:
34
Issue:
2
Page Range or eLocation-ID:
737 to 754
ISSN:
0894-8755
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Subseasonal-to-seasonal (S2S) precipitation prediction in boreal spring and summer months, which contains a significant number of high-signal events, is scientifically challenging and prediction skill has remained poor for years. Tibetan Plateau (TP) spring observed surface temperatures show a lag correlation with summer precipitation in several remote regions, but current global land–atmosphere coupled models are unable to represent this behavior due to significant errors in producing observed TP surface temperatures. To address these issues, the Global Energy and Water Exchanges (GEWEX) program launched the “Impact of Initialized Land Temperature and Snowpack on Subseasonal-to-Seasonal Prediction” (LS4P) initiative as a community effort to test the impact of land temperature in high-mountain regions on S2S prediction by climate models: more than 40 institutions worldwide are participating in this project. After using an innovative new land state initialization approach based on observed surface 2-m temperature over the TP in the LS4P experiment, results from a multimodel ensemble provide evidence for a causal relationship in the observed association between the Plateau spring land temperature and summer precipitation over several regions across the world through teleconnections. The influence is underscored by an out-of-phase oscillation between the TP and Rocky Mountain surface temperatures. This study reveals for the first time that high-mountain land temperature could be a substantial source of S2S precipitation predictability, and its effect is probably as large as ocean surface temperature over global “hotspot” regions identified here; the ensemble means in some “hotspots” produce more than 40% of the observed anomalies. This LS4P approach should stimulate more follow-on explorations.
  2. Abstract

    While most spatial data can be modeled with the assumption that distant points are uncorrelated, some problems require dependence at both far and short distances. We introduce a model to directly incorporate dependence in phenomena that influence a distant response. Spatial climate problems often have such modeling needs as data are influenced by local factors in addition to remote phenomena, known as teleconnections. Teleconnections arise from complex interactions between the atmosphere and ocean, of which the El Niño–Southern Oscillation teleconnection is a well‐known example. Our model extends the standard geostatistical modeling framework to account for effects of covariates observed on a spatially remote domain. We frame our model as an extension of spatially varying coefficient models. Connections to existing methods are highlighted, and further modeling needs are addressed by additionally drawing on spatial basis functions and predictive processes. Notably, our approach allows users to model teleconnected data without prespecifying teleconnection indices, which other methods often require. We adopt a hierarchical Bayesian framework to conduct inference and make predictions. The method is demonstrated by predicting precipitation in Colorado while accounting for local factors and teleconnection effects with Pacific Ocean sea surface temperatures. We show how the proposed model improves upon standard methods for estimating teleconnection effects and discuss its utility for climate applications.

  3. Abstract We assess to what extent seven state-of-the-art dynamical prediction systems can retrospectively predict winter sea surface temperature (SST) in the subpolar North Atlantic and the Nordic seas in the period 1970–2005. We focus on the region where warm water flows poleward (i.e., the Atlantic water pathway to the Arctic) and on interannual-to-decadal time scales. Observational studies demonstrate predictability several years in advance in this region, but we find that SST skill is low with significant skill only at a lead time of 1–2 years. To better understand why the prediction systems have predictive skill or lack thereof, we assess the skill of the systems to reproduce a spatiotemporal SST pattern based on observations. The physical mechanism underlying this pattern is a propagation of oceanic anomalies from low to high latitudes along the major currents, the North Atlantic Current and the Norwegian Atlantic Current. We find that the prediction systems have difficulties in reproducing this pattern. To identify whether the misrepresentation is due to incorrect model physics, we assess the respective uninitialized historical simulations. These simulations also tend to misrepresent the spatiotemporal SST pattern, indicating that the physical mechanism is not properly simulated. However, the representation of the pattern is slightly degraded in the predictions compared to historical runs, which could be a result of initialization shocks and forecast drift effects. Ways to enhance predictions could include improved initialization and better simulation of poleward circulation of anomalies. This might require model resolutions in which flow over complex bathymetry and the physics of mesoscale ocean eddies and their interactions with the atmosphere are resolved.
Significance Statement In this study, we find that dynamical prediction systems and their respective climate models struggle to realistically represent ocean surface temperature variability in the eastern subpolar North Atlantic and Nordic seas on interannual-to-decadal time scales. In previous studies, ocean advection is proposed as a key mechanism in propagating temperature anomalies along the Atlantic water pathway toward the Arctic Ocean. Our analysis suggests that the predicted temperature anomalies are not properly circulated to the north; this is a result of model errors that seems to be exacerbated by the effect of initialization shocks and forecast drift. Better climate predictions in the study region will thus require improving the initialization step, as well as enhancing process representation in the climate models.
  4. Abstract

    Heatwaves are extreme near-surface temperature events that can have substantial impacts on ecosystems and society. Early warning systems help to reduce these impacts by helping communities prepare for hazardous climate-related events. However, state-of-the-art prediction systems can often not make accurate forecasts of heatwaves more than two weeks in advance, which are required for advance warnings. We therefore investigate the potential of statistical and machine learning methods to understand and predict central European summer heatwaves on time scales of several weeks. As a first step, we identify the most important regional atmospheric and surface predictors based on previous studies and supported by a correlation analysis: 2-m air temperature, 500-hPa geopotential, precipitation, and soil moisture in central Europe, as well as Mediterranean and North Atlantic sea surface temperatures, and the North Atlantic jet stream. Based on these predictors, we apply machine learning methods to forecast two targets: summer temperature anomalies and the probability of heatwaves for 1–6 weeks lead time at weekly resolution. For each of these two target variables, we use both a linear and a random forest model. The performance of these statistical models decays with lead time, as expected, but outperforms persistence and climatology at all lead times. For lead times longer than two weeks, our machine learning models compete with the ensemble mean of the European Centre for Medium-Range Weather Forecast’s hindcast system. We thus show that machine learning can help improve subseasonal forecasts of summer temperature anomalies and heatwaves.

    Significance Statement

    Heatwaves (prolonged extremely warm temperatures) cause thousands of fatalities worldwide each year. These damaging events are becoming even more severe with climate change. This study aims to improve advance predictions of summer heatwaves in central Europe by using statistical and machine learning methods. Machine learning models are shown to compete with conventional physics-based models for forecasting heatwaves more than two weeks in advance. These early warnings can be used to activate effective and timely response plans targeting vulnerable communities and regions, thereby reducing the damage caused by heatwaves.

  5. Abstract

    Forecasting the El Niño-Southern Oscillation (ENSO) has been a subject of vigorous research due to the important role of the phenomenon in climate dynamics and its worldwide socioeconomic impacts. Over the past decades, numerous models for ENSO prediction have been developed, among which statistical models approximating ENSO evolution by linear dynamics have received significant attention owing to their simplicity and comparable forecast skill to first-principles models at short lead times. Yet, due to highly nonlinear and chaotic dynamics (particularly during ENSO initiation), such models have limited skill for longer-term forecasts beyond half a year. To resolve this limitation, here we employ a new nonparametric statistical approach based on analog forecasting, called kernel analog forecasting (KAF), which avoids assumptions on the underlying dynamics through the use of nonlinear kernel methods for machine learning and dimension reduction of high-dimensional datasets. Through a rigorous connection with Koopman operator theory for dynamical systems, KAF yields statistically optimal predictions of future ENSO states as conditional expectations, given noisy and potentially incomplete data at forecast initialization. Here, using industrial-era Indo-Pacific sea surface temperature (SST) as training data, the method is shown to successfully predict the Niño 3.4 index in a 1998–2017 verification period out to a 10-month lead, which corresponds to an increase of 3–8 months (depending on the decade) over a benchmark linear inverse model (LIM), while significantly improving upon the ENSO predictability “spring barrier”. In particular, KAF successfully predicts the historic 2015/16 El Niño at initialization times as early as June 2015, which is comparable to the skill of current dynamical models.
An analysis of a 1300-yr control integration of a comprehensive climate model (CCSM4) further demonstrates that the enhanced predictability afforded by KAF holds over potentially much longer leads, extending to 24 months versus 18 months in the benchmark LIM. Probabilistic forecasts for the occurrence of El Niño/La Niña events are also performed and assessed via information-theoretic metrics, showing an improvement of skill over LIM approaches, thus opening an avenue for environmental risk assessment relevant in a variety of contexts.
