Title: Improving Solar Energetic Particle Event Prediction through Multivariate Time Series Data Augmentation

Solar energetic particles (SEPs) are associated with extreme solar events that can cause major damage to space- and ground-based life and infrastructure. High-intensity SEP events, particularly ∼100 MeV SEP events, can pose severe health risks for astronauts owing to radiation exposure and affect Earth-orbiting satellites (e.g., Landsat and the International Space Station). A major challenge in the SEP event prediction task is the lack of adequate SEP data because of the rarity of these events. In this work, we aim to improve the prediction of ∼30, ∼60, and ∼100 MeV SEP events by synthetically increasing the number of SEP samples. We explore the use of univariate and multivariate time series of proton flux data as input to machine-learning-based prediction methods, such as time series forest (TSF). Our study covers solar cycles 22, 23, and 24. Our findings show that using data augmentation methods, such as the synthetic minority oversampling technique, remarkably increases the accuracy and F1-score of the classifiers used in this research, especially for TSF, where the average accuracy increased by 20%, reaching around 90% accuracy in the ∼100 MeV SEP prediction task. We also achieved higher prediction accuracy when using the multivariate time series data of the proton flux. Finally, we build a pipeline framework for our best-performing model, TSF, and provide a comprehensive hierarchical classification of the ∼100, ∼60, and ∼30 MeV and non-SEP prediction scenarios.
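
The synthetic minority oversampling idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, the array shapes, and the random toy data are all assumptions made for illustration. Each multivariate series is assumed to be flattened into one feature vector before interpolation, which is only one of several possible ways to apply the technique to time series.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class
    rows and their k nearest minority-class neighbours (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new row at a random point on the segment joining the pair.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])


# Toy stand-in for rare-event samples: 12 series, 3 channels, 20 time steps.
# These are random numbers, not proton flux measurements.
toy_rng = np.random.default_rng(1)
sep_events = toy_rng.normal(size=(12, 3, 20))

flat = sep_events.reshape(len(sep_events), -1)        # one vector per series
synthetic = smote(flat, n_new=30).reshape(-1, 3, 20)  # back to series shape
```

Because every synthetic row is a convex combination of two real rows, the augmented samples stay inside the per-feature range of the original minority class. In practice, oversampling would be applied to the training split only, so that synthetic samples do not leak into evaluation.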

Award ID(s):
2240022 2301397 2305781
Publisher / Repository:
DOI PREFIX: 10.3847
Journal Name:
The Astrophysical Journal Supplement Series
Medium: X Size: Article No. 31
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Solar energetic particle (SEP) events and their major subclass, solar proton events (SPEs), can have unfavorable consequences on numerous aspects of life and technology, making them one of the most harmful effects of solar activity. Garnering knowledge preceding such events by studying operational data flows is essential for their forecasting. Considering only solar cycle (SC) 24 in our previous study, we found that it may be sufficient to only utilize proton and soft X-ray (SXR) parameters for SPE forecasts. Here, we report a catalog recording ≥10 MeV ≥10 particle flux unit SPEs with their properties, spanning SCs 22–24, using NOAA’s Geostationary Operational Environmental Satellite flux data. We report an additional catalog of daily proton and SXR flux statistics for this period, employing it to test the application of machine learning (ML) on the prediction of SPEs using a support vector machine (SVM) and extreme gradient boosting (XGBoost). We explore the effects of training models with data from one and two SCs, evaluating how transferable a model might be across different time periods. XGBoost proved to be more accurate than SVMs for almost every test considered, while also outperforming operational SWPC NOAA predictions and a persistence forecast. Interestingly, training done with SC 24 produces weaker true skill statistic and Heidke skill scores₂, even when paired with SC 22 or SC 23, indicating transferability issues. This work contributes toward validating forecasts using long-spanning data, an understudied area in SEP research that should be considered to verify the cross-cycle robustness of ML-driven forecasts.

  2. Abstract

    The sustained gamma-ray emission (SGRE) from the Sun is a prolonged enhancement of >100 MeV gamma-ray emission that extends beyond the flare impulsive phase. The origin of the >300 MeV protons resulting in SGRE is debated, with both flares and shocks driven by coronal mass ejections (CMEs) being the suggested sites of proton acceleration. We compared the near-Sun acceleration and space speed of CMEs with “Prompt” and “Delayed” (SGRE) gamma-ray components. We found that “Delayed”-component-associated CMEs have higher initial accelerations and space speeds than “Prompt Only”-component-associated CMEs. We selected halo CMEs (HCMEs) associated with type II radio bursts (shock-driving HCMEs) and compared the average acceleration and space speed between HCME populations with or without SGRE events, major solar energetic particle (SEP) events, metric, or decameter-hectometric (DH) type II radio bursts. We found that the SGRE-producing HCMEs associated with a DH type II radio burst and/or a major SEP event have higher space speeds and especially initial accelerations than those without an SGRE event. We estimated the radial distances and speeds of the CME-driven shocks at the end time of the 2012 January 23 and March 7 SGRE events using white-light images of STEREO Heliospheric Imagers and radio dynamic spectra of Wind WAVES. The shocks were at the radial distances of 0.6–0.8 au and their speeds were high enough (≈975 km s⁻¹ and ≈750 km s⁻¹, respectively) for high-energy particle acceleration. Therefore, we conclude that our findings support the CME-driven shock as the source of >300 MeV protons.

  3. Abstract

    The flux of energetic particles originating from the Sun fluctuates during the solar cycles. It depends on the number and properties of active regions (ARs) present in a single day and associated solar activities, such as solar flares and coronal mass ejections. Observational records of the Space Weather Prediction Center NOAA enable the creation of time-indexed databases containing information about ARs and particle flux enhancements, most widely known as solar energetic particle (SEP) events. In this work, we utilize the data available for solar cycles 21–24 and the initial phase of cycle 25 to perform a statistical analysis of the correlation between SEPs and properties of ARs inferred from the McIntosh and Hale classifications. We find that the complexity of the magnetic field, longitudinal location, area, and penumbra type of the largest sunspot of ARs are most correlated with the production of SEPs. It is found that most SEPs (≈60%, or 108 out of 181 considered events) were generated from an AR classified with the “k” McIntosh subclass as the second component, and these ARs are more likely to produce SEPs if they fall in a Hale class containing a δ component. The resulting database containing information about SEP events and ARs is publicly available and can be used for the development of machine learning models to predict the occurrence of SEPs.

  4. Abstract

    Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observation of these parameters over time, i.e., representing solar events by multivariate time-series (MVTS) data, can determine relationships between magnetic field states in active regions and extreme solar events, e.g., solar flares. We can improve our understanding of these events by selecting the most relevant parameters that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier utilizing an LSTM network. Then, the top performing features are combined to produce input for an LSTM-based multivariate sequence classifier. Finally, we tested the discrimination ability of the selected features by training downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and support vector machine. We performed our experiments using a benchmark data set for flare prediction known as Space Weather Analytics for Solar Flares. We compared our proposed method with three other baseline feature selection methods and demonstrated that our method selects more discriminatory features compared to other methods. Due to the imbalanced nature of the data, primarily caused by the rarity of minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Finally, we reported the set of photospheric magnetic field parameters that give the highest discrimination performance in predicting flare classes.

  5. Aims: This paper presents a H2020 project aimed at developing an advanced space weather forecasting tool, combining the MagnetoHydroDynamic (MHD) solar wind and coronal mass ejection (CME) evolution modelling with solar energetic particle (SEP) transport and acceleration model(s). The EUHFORIA 2.0 project will address the geoeffectiveness of impacts and mitigation to avoid (part of the) damage, including that of extreme events, related to solar eruptions, solar wind streams, and SEPs, with particular emphasis on its application to forecast geomagnetically induced currents (GICs) and radiation on geospace. Methods: We will apply innovative methods and state-of-the-art numerical techniques to extend the recent heliospheric solar wind and CME propagation model EUHFORIA with two integrated key facilities that are crucial for improving its predictive power and reliability, namely (1) data-driven flux-rope CME models, and (2) physics-based, self-consistent SEP models for the acceleration and transport of particles along and across the magnetic field lines. This involves the novel coupling of advanced space weather models. In addition, after validating the upgraded EUHFORIA/SEP model, it will be coupled to existing models for GICs and atmospheric radiation transport models. This will result in a reliable prediction tool for radiation hazards from SEP events, affecting astronauts, passengers and crew in high-flying aircraft, and the impact of space weather events on power grid infrastructure, telecommunication, and navigation satellites. Finally, this innovative tool will be integrated into both the Virtual Space Weather Modeling Centre (VSWMC, ESA) and the space weather forecasting procedures at the ESA SSCC in Ukkel (Belgium), so that it will be available to the space weather community and effectively used for improved predictions and forecasts of the evolution of CME magnetic structures and their impact on Earth.
    Results: The results of the first six months of the EU H2020 project are presented here. These concern alternative coronal models, the application of adaptive mesh refinement techniques in the heliospheric part of EUHFORIA, alternative flux-rope CME models, evaluation of data-assimilation based on Kalman filtering for the solar wind modelling, and a feasibility study of the integration of SEP models.