Abstract Solar energetic particles (SEPs) are associated with extreme solar events that can cause major damage to space- and ground-based life and infrastructure. High-intensity SEP events, particularly ∼100 MeV SEP events, can pose severe health risks for astronauts owing to radiation exposure and affect Earth’s orbiting satellites (e.g., Landsat and the International Space Station). A major challenge in the SEP event prediction task is the lack of adequate SEP data because of the rarity of these events. In this work, we aim to improve the prediction of ∼30, ∼60, and ∼100 MeV SEP events by synthetically increasing the number of SEP samples. We explore the use of a univariate and multivariate time series of proton flux data as input to machine-learning-based prediction methods, such as time series forest (TSF). Our study covers solar cycles 22, 23, and 24. Our findings show that using data augmentation methods, such as the synthetic minority oversampling technique, remarkably increases the accuracy and F1-score of the classifiers used in this research, especially for TSF, where the average accuracy increased by 20%, reaching around 90% accuracy in the ∼100 MeV SEP prediction task. We also achieved higher prediction accuracy when using the multivariate time series data of the proton flux. Finally, we build a pipeline framework for our best-performing model, TSF, and provide a comprehensive hierarchical classification of the ∼100, ∼60, and ∼30 MeV and non-SEP prediction scenarios.
This content will become publicly available on February 7, 2026
Predicting Solar Energetic Particle Events with Time Series Shapelets
Abstract Solar energetic particle (SEP) events pose significant risks to both space and ground-level infrastructure, as well as to human health in space. Understanding and predicting these events are critical for mitigating their potential impacts. In this paper, we address the challenge of predicting SEP events using proton flux data. We leverage some of the most recent advances in time series data mining, such as shapelets and the matrix profile, to propose a simple and easily understandable prediction approach. Our objective is to mitigate the interpretability challenges inherent to most machine learning models and to show that other methods exist that can not only yield accurate forecasts but also facilitate exploration and insight generation within the data domain. For this purpose, we construct a multivariate time series data set consisting of proton flux data recorded by the National Oceanic and Atmospheric Administration's geosynchronous orbit Earth-observing satellite. Then, we use our proposed approach to mine shapelets and make predictions using a random forest classifier. We demonstrate that our approach rivals state-of-the-art SEP prediction, offering superior interpretability and the ability to predict SEP events before their parent eruptive flares.
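As a rough illustration of how a shapelet can become a classifier feature, the sketch below computes the smallest Euclidean distance between a candidate shapelet and every equal-length window of a proton flux series. It is a minimal, pure-Python sketch and not the authors' pipeline: the function names, the toy flux values, and the use of unnormalized Euclidean distance are all assumptions made for illustration, and the matrix-profile-based shapelet mining and the random forest step mentioned in the abstract are omitted.

```python
import math


def min_subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    equal-length window of `series` (hypothetical helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def shapelet_features(series, shapelets):
    """One feature per shapelet: how closely the series matches it anywhere."""
    return [min_subsequence_distance(series, s) for s in shapelets]


# Toy data (invented values, not real GOES measurements).
rising = [1.0, 2.0, 4.0]                    # candidate "flux onset" shapelet
quiet = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]      # flat background series
onset = [1.0, 1.0, 1.0, 2.0, 4.0, 8.0]      # series containing the onset shape

quiet_features = shapelet_features(quiet, [rising])
onset_features = shapelet_features(onset, [rising])
```

A small distance means the series contains a segment resembling the shapelet, which is what makes such features easy to inspect: each one points back to a concrete pattern in the flux data. In a full workflow these feature vectors would be passed to a classifier.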
- PAR ID: 10582814
- Publisher / Repository: American Astronomical Society (AAS)
- Date Published:
- Journal Name: The Astrophysical Journal
- Volume: 980
- Issue: 1
- ISSN: 0004-637X
- Page Range / eLocation ID: 128
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Solar energetic particle (SEP) events, in particular high-energy-range SEP events, pose significant risks to space missions, astronauts, and technological infrastructure. Accurate prediction of these high-impact events is crucial for mitigating potential hazards. In this study, we present an end-to-end ensemble machine learning (ML) framework for the prediction of high-impact ∼100 MeV SEP events. Our approach leverages diverse data modalities sourced from the Solar and Heliospheric Observatory and the Geostationary Operational Environmental Satellite, integrating extracted active region polygons from solar extreme ultraviolet (EUV) imagery, time-series proton flux measurements, sunspot activity data, and detailed active region characteristics. To quantify the predictive contribution of each data modality (e.g., EUV or time series), we independently evaluate them using a range of ML models to assess their performance in forecasting SEP events. Finally, to enhance the SEP predictive performance, we train an ensemble learning model that combines all the models trained on individual data modalities, leveraging the strengths of each data modality. Our proposed ensemble approach shows promising performance, achieving a recall of 0.80 and 0.75 in balanced and imbalanced settings, respectively, underscoring the effectiveness of multimodal data integration for robust SEP event prediction and enhanced forecasting capabilities.
Abstract Solar energetic particle (SEP) events, originating from solar flares and coronal mass ejections, present significant hazards to space exploration and technology on Earth. Accurate prediction of these high‐energy events is essential for safeguarding astronauts, spacecraft, and electronic systems. In this study, we conduct an in‐depth investigation into the application of multimodal data fusion techniques for the prediction of high‐energy SEP events, particularly ∼100 MeV events. Our research utilizes six machine learning (ML) models, each finely tuned for time series analysis, including Univariate Time Series (UTS), Image‐based model (Image), Univariate Feature Concatenation (UFC), Univariate Deep Concatenation (UDC), Univariate Deep Merge (UDM), and Univariate Score Concatenation (USC). By combining time series proton flux data with solar X‐ray images, we exploit complementary insights into the underlying solar phenomena responsible for SEP events. Rigorous evaluation metrics, including accuracy, F1‐score, and other established measures, are applied, along with K‐fold cross‐validation, to ensure the robustness and generalization of our models. Additionally, we explore the influence of observation window sizes on classification accuracy.
Abstract Solar energetic particle (SEP) events and their major subclass, solar proton events (SPEs), can have unfavorable consequences on numerous aspects of life and technology, making them one of the most harmful effects of solar activity. Garnering knowledge preceding such events by studying operational data flows is essential for their forecasting. Considering only solar cycle (SC) 24 in our previous study, we found that it may be sufficient to only utilize proton and soft X-ray (SXR) parameters for SPE forecasts. Here, we report a catalog recording ≥10 MeV ≥10 particle flux unit SPEs with their properties, spanning SCs 22–24, using NOAA’s Geostationary Operational Environmental Satellite flux data. We report an additional catalog of daily proton and SXR flux statistics for this period, employing it to test the application of machine learning (ML) on the prediction of SPEs using a support vector machine (SVM) and extreme gradient boosting (XGBoost). We explore the effects of training models with data from one and two SCs, evaluating how transferable a model might be across different time periods. XGBoost proved to be more accurate than SVMs for almost every test considered, while also outperforming operational SWPC NOAA predictions and a persistence forecast. Interestingly, training done with SC 24 produces weaker true skill statistic and Heidke skill scores2, even when paired with SC 22 or SC 23, indicating transferability issues. This work contributes toward validating forecasts using long-spanning data—an understudied area in SEP research that should be considered to verify the cross-cycle robustness of ML-driven forecasts.
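For readers unfamiliar with the two skill scores named above, the following sketch computes the true skill statistic (TSS) and the Heidke skill score (HSS) from a binary confusion matrix, using their commonly cited two-class definitions. The function names and the example counts are assumptions chosen for illustration; they are not taken from the catalog or results described in the abstract.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = recall - false alarm rate; ranges from -1 to 1."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score(tp, fn, fp, tn):
    """HSS: improvement over a random forecast (two-class form)."""
    denom = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    if denom == 0:
        raise ValueError("degenerate confusion matrix")
    return 2.0 * (tp * tn - fn * fp) / denom


# Invented counts: 10 event days, 90 quiet days.
tss = true_skill_statistic(tp=8, fn=2, fp=5, tn=85)
hss = heidke_skill_score(tp=8, fn=2, fp=5, tn=85)
```

TSS does not depend on the ratio of events to non-events, which is one reason it is often reported alongside HSS for rare-event forecasts such as SPEs.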
Abstract The high energy particles originating from the Sun, known as solar energetic particles (SEPs), contribute significantly to the space radiation environment, posing serious threats to astronauts and scientific instruments on board spacecraft. The mechanism that accelerates the SEPs to the observed energy ranges, their transport in the inner heliosphere, and the influence of suprathermal seed particle spectrum are open questions in heliophysics. Accurate predictions of the occurrences of SEP events well in advance are necessary to mitigate their adverse effects but prediction based on first principle models still remains a challenge. In this scenario, adopting a machine learning approach to SEP modeling and prediction is desirable. However, the lack of a balanced database of SEP events restrains this approach. We addressed this limitation by generating large data sets of synthetic SEP events sampled from the physics‐based model, Energetic Particle Radiation Environment Module (EPREM). Using this data, we developed neural networks‐based surrogate models to study the seed population parameter space. Our models, EPREM‐S, run thousands to millions of times faster (depending on computer hardware), making simulation‐based inference workflows practicable in SEP studies while providing predictive uncertainty estimates using a deep ensemble approach.
