Abstract: The prediction of extreme events in time series is a fundamental problem arising in financial, scientific, engineering, and many other applications. We begin by establishing a general Neyman-Pearson-type characterization of optimal extreme event predictors in terms of density ratios. This yields new insights and several closed-form optimal extreme event predictors for additive models. These results naturally extend to time series, where we study optimal extreme event prediction for both light- and heavy-tailed autoregressive and moving average models. Using a uniform law of large numbers for ergodic time series, we establish the asymptotic optimality of an empirical version of the optimal predictor for autoregressive models. Using multivariate regular variation, we obtain an expression for the optimal extremal precision in heavy-tailed infinite moving averages, which provides theoretical bounds on the ability to predict extremes in this general class of models. We address the important problem of predicting solar flares by applying our theory and methodology to a state-of-the-art time series consisting of solar soft X-ray flux measurements. Our results demonstrate the success and limitations in solar flare forecasting of long-memory autoregressive models and long-range-dependent, heavy-tailed FARIMA models.
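As a hedged illustration of what such a density-ratio characterization looks like (the classical Neyman-Pearson form, written here as a sketch rather than the paper's exact statement), an optimal alarm for an extreme event E given an observation X = x thresholds a conditional density ratio:

```latex
% Sketch of a Neyman--Pearson-type optimal extreme event predictor:
% raise an alarm when the density ratio exceeds a threshold \tau chosen
% to meet a prescribed alarm-rate (false-alarm) budget.
\[
  \widehat{E}(x) \;=\; \mathbf{1}\!\left\{
    \frac{f_{X \mid E}(x)}{f_{X \mid E^{c}}(x)} \;>\; \tau
  \right\},
\]
% where f_{X|E} and f_{X|E^c} denote the conditional densities of the
% observation given that the extreme event does or does not occur.
```

Sweeping the threshold τ traces out the trade-off between false alarms and missed extremes; the closed-form predictors mentioned above would plausibly arise in models where this ratio admits an explicit expression.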
Limits to extreme event forecasting in chaotic systems
Predicting extreme events in chaotic systems, characterized by rare but intensely fluctuating properties, is of great importance due to their impact on the performance and reliability of a wide range of systems. Examples include weather forecasting, traffic management, power grid operations, and financial market analysis. Methods of increasing sophistication have been developed to forecast events in these systems. However, the boundaries that define the maximum accuracy of forecasting tools remain largely unexplored from a theoretical standpoint. Here, we address the question: what is the minimum possible error in the prediction of extreme events in complex, chaotic systems? We derive the minimum probability of error in extreme event forecasting, along with its information-theoretic lower and upper bounds. These bounds are universal for a given problem, in that they hold regardless of the modeling approach for extreme event prediction: from traditional linear regressions to sophisticated neural network models. The limits in predictability are obtained from cost-sensitive versions of Fano's and Hellman's inequalities based on the Rényi entropy. The results are also connected to Takens' embedding theorem via the "information can't hurt" inequality. Finally, the probability of error for a forecasting model is decomposed into three sources: uncertainty in the initial conditions, hidden variables, and suboptimal modeling assumptions. The latter allows us to assess whether prediction models are operating near their maximum theoretical performance or whether further improvements are possible. The bounds are applied to the prediction of extreme events in the Rössler system and the Kolmogorov flow.
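To make the flavor of such bounds concrete, here is a minimal numerical sketch of the classical (Shannon-entropy, binary, cost-insensitive) Fano lower bound on the error probability. The paper's actual bounds are cost-sensitive and use Rényi entropies, so treat this only as an illustration of how residual uncertainty about the extreme/non-extreme label caps the accuracy of any model:

```python
import numpy as np
from scipy.optimize import brentq

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) label."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))

def fano_error_lower_bound(cond_entropy_bits: float) -> float:
    """Smallest error probability consistent with a residual label
    uncertainty of H(Y|X) = cond_entropy_bits for a binary label Y
    (extreme / not extreme). Classical Shannon-entropy Fano bound; the
    paper works with cost-sensitive, Renyi-entropy generalizations."""
    if cond_entropy_bits <= 0.0:
        return 0.0
    h = min(cond_entropy_bits, 1.0 - 1e-9)  # h(p) <= 1 bit for binary Y
    # For binary Y, Fano reads H(Y|X) <= h(P_e); invert h on [0, 1/2].
    return brentq(lambda p: binary_entropy(p) - h, 1e-12, 0.5)

# Example: 0.3 bits of residual uncertainty about the label means the
# error probability of ANY predictor is at least about 5.3%.
print(f"{fano_error_lower_bound(0.3):.3f}")  # -> 0.053
```

No amount of model sophistication can push the error below this floor; only reducing the conditional entropy itself (better observables, fewer hidden variables) can.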
- Award ID(s): 2140775
- PAR ID: 10541403
- Publisher / Repository: Elsevier
- Date Published:
- Journal Name: Physica D: Nonlinear Phenomena
- Volume: 467
- Issue: C
- ISSN: 0167-2789
- Page Range / eLocation ID: 134246
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Aerosols are important modulators of the precipitation-generating process, and their concentrations can affect the precipitation process in extreme events. Existing literature suggests that, through microphysical processes, additional aerosols lead to a larger number of smaller cloud droplets, which ultimately redistributes the latent heat and alters the precipitation process. This research addresses the question of how sensitive the spatial and temporal patterns of heavy precipitation events are to aerosol concentration. National Centers for Environmental Prediction (NCEP) Global Data Assimilation System (GDAS) final (FNL) data were used as input to the Weather Research and Forecasting (WRF) model to simulate the catastrophic 2016 flood in Louisiana, USA, under three aerosol loading scenarios: virtually clean, average, and very dirty, corresponding to 0.1×, 1×, and 10× the climatological aerosol concentration. Overall, for the extreme precipitation event in Baton Rouge, Louisiana, in August 2016, increasing aerosol concentrations were associated with 1) a shifted peak precipitation period; 2) a more intense and extreme precipitation event in a more confined area; and 3) greater maximum precipitation. These results can help improve forecast models of extreme precipitation events, thereby further protecting life and property, and deepen our understanding of the role of aerosols in heavy precipitation events.
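A hypothetical sketch of how the three loading scenarios could be produced by scaling a climatological aerosol field before a model run; the file name aerosol_climatology.nc and the variable name AEROSOL_CONC are placeholders for illustration, not the study's actual WRF inputs:

```python
import shutil
import netCDF4

# The three loading scenarios from the study: 0.1x (virtually clean),
# 1x (average), and 10x (very dirty) the climatological concentration.
SCENARIOS = {"clean": 0.1, "average": 1.0, "dirty": 10.0}

for label, factor in SCENARIOS.items():
    out = f"aerosol_input_{label}.nc"
    shutil.copy("aerosol_climatology.nc", out)  # placeholder input file
    with netCDF4.Dataset(out, mode="r+") as ds:
        # "AEROSOL_CONC" is a placeholder variable name; a real run would
        # scale whichever field the chosen aerosol-aware microphysics
        # scheme actually reads.
        ds.variables["AEROSOL_CONC"][:] *= factor
```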
- Breathing in fine particulate matter of diameter less than 2.5 µm (PM2.5) greatly increases an individual's risk of cardiovascular and respiratory diseases. As climate change progresses, extreme weather events, including wildfires, are expected to increase, exacerbating air pollution. However, models often struggle to capture extreme pollution events because high PM2.5 levels are rare in training datasets. To address this, we implemented cluster-based undersampling and trained Transformer models to improve extreme event prediction using various cutoff thresholds (12.1 µg/m³ and 35.5 µg/m³) and partial sampling ratios (10/90, 20/80, 30/70, 40/60, 50/50). Our results demonstrate that the 35.5 µg/m³ threshold, paired with a 20/80 partial sampling ratio, achieved the best performance, with an RMSE of 2.080, an MAE of 1.386, and an R² of 0.914, excelling in particular at forecasting high-PM2.5 events. Overall, models trained on the resampled data significantly outperformed those trained on the original data, highlighting the importance of resampling techniques in improving air quality forecasting accuracy, especially for high-pollution scenarios. These findings provide critical insights into optimizing air quality forecasting models, enabling more reliable predictions of extreme pollution events. By advancing the ability to forecast high PM2.5 levels, this study contributes to the development of more informed public health and environmental policies to mitigate the impacts of air pollution, and advances the technology for building better air quality digital twins.
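A minimal sketch of cluster-based undersampling under stated assumptions: k-means clusters over the majority class, the 35.5 µg/m³ cutoff, and a reading of "20/80" as the retained minority/majority split. The function name and all details are illustrative rather than the study's exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X, y, threshold=35.5, majority_frac=0.8,
                        n_clusters=50, seed=0):
    """Keep every 'extreme' sample (PM2.5 >= threshold, in ug/m3) and draw
    the majority class proportionally from k-means clusters, so the kept
    subset still spans the majority class's structure. With
    majority_frac=0.8 the result approximates a 20/80 split (one reading
    of the study's 'partial sampling ratio')."""
    rng = np.random.default_rng(seed)
    extreme = y >= threshold
    X_min, y_min = X[extreme], y[extreme]
    X_maj, y_maj = X[~extreme], y[~extreme]

    # Majority count such that the minority ends up as (1 - majority_frac).
    n_keep = min(int(len(y_min) * majority_frac / (1.0 - majority_frac)),
                 len(y_maj))

    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        # Sample each cluster in proportion to its share of the majority.
        take = min(len(members),
                   max(1, round(n_keep * len(members) / len(y_maj))))
        kept.extend(rng.choice(members, size=take, replace=False))
    kept = np.asarray(kept, dtype=int)
    return np.vstack([X_min, X_maj[kept]]), np.concatenate([y_min, y_maj[kept]])
```

Sampling per cluster, rather than uniformly, is what distinguishes this from naive undersampling: it avoids discarding entire regions of the feature space when shrinking the majority class.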
- Solar energetic particle (SEP) events, in particular high-energy SEP events, pose significant risks to space missions, astronauts, and technological infrastructure. Accurate prediction of these high-impact events is crucial for mitigating potential hazards. In this study, we present an end-to-end ensemble machine learning (ML) framework for the prediction of high-impact ∼100 MeV SEP events. Our approach leverages diverse data modalities sourced from the Solar and Heliospheric Observatory and the Geostationary Operational Environmental Satellite, integrating extracted active region polygons from solar extreme ultraviolet (EUV) imagery, time-series proton flux measurements, sunspot activity data, and detailed active region characteristics. To quantify the predictive contribution of each data modality (e.g., EUV imagery or time series), we evaluate each independently using a range of ML models, assessing its performance in forecasting SEP events. Finally, to enhance SEP predictive performance, we train an ensemble learning model that combines the models trained on the individual data modalities, leveraging the strengths of each. The proposed ensemble approach shows promising performance, achieving a recall of 0.80 and 0.75 in balanced and imbalanced settings, respectively, underscoring the effectiveness of multimodal data integration for robust SEP event prediction and enhanced forecasting capabilities.
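A schematic of the per-modality-then-ensemble pattern described above; the ModalityEnsemble class, the modality names, and the soft-vote averaging are illustrative assumptions, with any scikit-learn-style classifiers pluggable as the per-modality models:

```python
import numpy as np

class ModalityEnsemble:
    """Minimal sketch of a multimodal ensemble: one model per data
    modality (e.g., EUV imagery features, proton-flux time series,
    sunspot data, active-region properties), combined by averaging the
    predicted SEP-event probabilities. Not the paper's implementation."""

    def __init__(self, models_by_modality: dict):
        # e.g., {"euv": clf1, "flux": clf2, "sunspot": clf3, "ar": clf4}
        self.models = models_by_modality

    def fit(self, features_by_modality: dict, y):
        for name, model in self.models.items():
            model.fit(features_by_modality[name], y)
        return self

    def predict_proba(self, features_by_modality: dict):
        # Soft vote: average the positive-class probability across modalities.
        probs = [model.predict_proba(features_by_modality[name])[:, 1]
                 for name, model in self.models.items()]
        return np.mean(probs, axis=0)

    def predict(self, features_by_modality: dict, threshold=0.5):
        return (self.predict_proba(features_by_modality) >= threshold).astype(int)
```

Evaluating each per-modality model on its own, before combining them, is also what lets one attribute predictive skill to individual data sources, as the abstract describes.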
- Extreme weather events have significant consequences, dominating the impact of climate on society. While high-resolution weather models can forecast many types of extreme events on synoptic timescales, long-term climatological risk assessment is an altogether different problem. A once-in-a-century event takes, on average, 100 years of simulation time to appear just once, far beyond the typical integration length of a weather forecast model. This task is therefore left to cheaper, but less accurate, low-resolution or statistical models. But there is untapped potential in weather model output: despite being short in duration, weather forecast ensembles are produced multiple times a week. Integrations are launched with independent perturbations, causing them to spread apart over time and broadly sample phase space. Collectively, these integrations add up to thousands of years of data. We establish methods to extract climatological information from these short weather simulations. Using ensemble hindcasts from the European Centre for Medium-Range Weather Forecasts archived in the subseasonal-to-seasonal (S2S) database, we characterize sudden stratospheric warming (SSW) events with multi-centennial return times. Consistent results are found between alternative methods, including basic counting strategies and Markov state modeling. By carefully combining trajectories, we obtain estimates of SSW frequencies and their seasonal distributions that are consistent with reanalysis-derived estimates for moderately rare events, but with much tighter uncertainty bounds, and that can be extended to events of unprecedented severity that have not yet been observed historically. These methods hold potential for assessing extreme events throughout the climate system, beyond this example of stratospheric extremes.
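The basic counting strategy can be sketched as follows; the function below, its exact Poisson interval, and the independence of ensemble members are assumptions made for illustration, not the study's full methodology (the Markov state modeling is not reproduced here):

```python
import numpy as np
from scipy.stats import chi2

def return_time_from_hindcasts(event_counts, years_per_member, alpha=0.05):
    """Estimate a return time by pooling many short ensemble integrations.

    event_counts     : events observed in each member (e.g., SSWs)
    years_per_member : simulated years covered by each member

    Assumes members are independent and event counts are Poisson; this is
    a simplification of the counting strategy mentioned in the abstract.
    """
    total_years = float(np.sum(years_per_member))
    total_events = int(np.sum(event_counts))
    rate = total_events / total_years            # events per simulated year
    mean_rt = 1.0 / rate if rate > 0 else np.inf

    # Exact Poisson confidence interval on the rate via the chi-square link.
    lo = (chi2.ppf(alpha / 2, 2 * total_events) / (2 * total_years)
          if total_events > 0 else 0.0)
    hi = chi2.ppf(1 - alpha / 2, 2 * total_events + 2) / (2 * total_years)
    return mean_rt, (1.0 / hi, 1.0 / lo if lo > 0 else np.inf)

# Illustrative numbers: 4000 members of ~46 days each total ~500 simulated
# years; 2 pooled events then imply a ~250-year return time, with the
# Poisson interval quantifying the (still substantial) sampling uncertainty.
```

Pooling is the key idea: no single integration is long enough to see a multi-centennial event, but the aggregate simulated time is.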