This content will become publicly available on March 10, 2026

Title: The Impacts of Magnetogram Projection Effects on Solar Flare Forecasting
Abstract This work explores the impacts of magnetogram projection effects on machine-learning-based solar flare forecasting models. Utilizing a methodology proposed by D. A. Falconer et al., we correct for projection effects present in Georgia State University’s Space Weather Analytics for Solar Flares benchmark data set. We then train and test a support vector machine classifier on the corrected and uncorrected data, comparing differences in performance. Additionally, we provide insight into several other methodologies that mitigate projection effects, such as stacking ensemble classifiers and active region location-informed models. Our analysis shows that data corrections slightly increase both the true-positive (correctly predicted flaring samples) and false-positive (nonflaring samples predicted as flaring) prediction rates, averaging a few percent. Similarly, changes in performance metrics are minimal for the stacking ensemble and location-based model. This suggests that a more complicated correction methodology may be needed to see improvements. It may also indicate inherent limitations when using magnetogram data for flare forecasting.
Award ID(s):
1936361
PAR ID:
10635367
Author(s) / Creator(s):
; ;
Publisher / Repository:
The Astrophysical Journal
Date Published:
Journal Name:
The Astrophysical Journal
Volume:
981
Issue:
2
ISSN:
0004-637X
Page Range / eLocation ID:
200
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This study explores the behavior of machine-learning-based flare forecasting models deployed in a simulated operational environment. Using Georgia State University’s Space Weather Analytics for Solar Flares benchmark data set, we examine the impacts of training methodology and the solar cycle on decision tree, support vector machine, and multilayer perceptron performance. We implement our classifiers using three temporal training windows: stationary, rolling, and expanding. The stationary window trains models using a single set of data available before the first forecasting instance, which remains constant throughout the solar cycle. The rolling window trains models using data from a constant time interval before the forecasting instance, which moves with the solar cycle. Finally, the expanding window trains models using all available data before the forecasting instance. For each window, a number of input features (1, 5, 10, 25, 50, and 120) and temporal sizes (5, 8, 11, 14, 17, and 20 months) were tested. To our surprise, we found that, for a window of 20 months, skill scores were comparable regardless of the window type, feature count, and classifier selected. Furthermore, reducing the size of this window only marginally decreased stationary and rolling window performance. This implies that, given enough data, a stationary window can be chosen over other window types, eliminating the need for model retraining. Finally, a moderately strong positive correlation was found to exist between a model’s false-positive rate and the solar X-ray background flux. This suggests that the solar cycle phase has a considerable influence on forecasting. 
  2. Abstract Timely and accurate prediction of solar flares is a crucial task due to the danger they pose to human life and infrastructure beyond Earth’s atmosphere. Although various machine learning algorithms have been employed to improve solar flare prediction, there has been limited focus on improving performance using outlier detection. In this study, we propose the use of a tree-based outlier detection algorithm, Isolation Forest (iForest), to identify multivariate time-series instances within the flare-forecasting benchmark data set, Space Weather Analytics for Solar Flares (SWAN-SF). By removing anomalous samples from the nonflaring class (N-class) data, we observe a significant improvement in both the true skill score and the updated Heidke skill score in two separate experiments. We focus on analyzing outliers detected by iForest at a 2.4% contamination rate, considered the most effective overall. Our analysis reveals a co-occurrence between the outliers we discovered and strong flares. Additionally, we investigated the similarity between the outliers and the strong-flare data and quantified it using Kullback–Leibler divergence. This analysis demonstrates a higher similarity between our outliers and strong-flare data when compared to the similarity between the outliers and the rest of the N-class data, supporting our rationale for using outlier detection to enhance SWAN-SF data for flare prediction. Furthermore, we explore a novel approach by treating our outliers as if they belong to flaring-class data in the training phase of our machine learning, resulting in further enhancements to our models’ performance. 
  3. Solar flares are transient space weather events that pose a significant threat to space and ground-based technological systems, making their precise and reliable prediction crucial for mitigating potential impacts. This paper contributes to the growing body of research on deep learning methods for solar flare prediction, primarily focusing on often-overlooked near-limb flares and using attribution methods to provide a post hoc qualitative explanation of the model’s predictions. We present a solar flare prediction model, which is trained using hourly full-disk line-of-sight magnetogram images and employs a binary prediction mode to forecast ≥M-class flares that may occur within the following 24-hour period. To address the class imbalance, we employ a fusion of data augmentation and class weighting techniques and evaluate the overall performance of our model using the true skill statistic (TSS) and Heidke skill score (HSS). Moreover, we applied three attribution methods, namely Guided Gradient-weighted Class Activation Mapping, Integrated Gradients, and Deep Shapley Additive Explanations, to interpret and cross-validate our model’s predictions with the explanations. Our analysis revealed that full-disk prediction of solar flares aligns with characteristics related to active regions (ARs). In particular, the key findings of this study are: (1) our deep learning models achieved an average TSS∼0.51 and HSS∼0.35, and the results further demonstrate a competent capability to predict near-limb solar flares, and (2) the qualitative analysis of the model’s explanation indicates that our model identifies and uses features associated with ARs in central and near-limb locations from full-disk magnetograms to make corresponding predictions. In other words, our models learn the shape and texture-based characteristics of flaring ARs even when they are at near-limb areas, which is a novel and critical capability that has significant implications for operational forecasting.
  4. Over the past two decades, machine learning and deep learning techniques for forecasting solar flares have had a great impact due to their ability to learn from a high-dimensional data space. However, a lack of high-quality data from flaring phenomena becomes a constraining factor for such tasks. One of the methods to tackle this complex problem is utilizing trained classifiers with multivariate time series of magnetic field parameters. In this work, we compare the exceedingly popular multivariate time series classifiers applying deep learning techniques with commonly used machine learning classifiers (i.e., SVM). We intend to explore the role of data augmentation on time-series-oriented flare prediction techniques, specifically the deep learning-based ones. We utilize four time series data augmentation techniques and couple them with selected multivariate time series classifiers to understand how each of them affects the outcome. In the end, we show that the deep learning algorithms as well as augmentation techniques improve our classifiers’ performance. The resulting classifiers’ performance after augmentation outperformed the traditional flare forecasting techniques.
  5. Current operational forecasts of solar eruptions are made by human experts using a combination of qualitative shape-based classification systems and historical data about flaring frequencies. In the past decade, there has been a great deal of interest in crafting machine-learning (ML) flare-prediction methods to extract underlying patterns from a training set – e.g. a set of solar magnetogram images, each characterized by features derived from the magnetic field and labeled as to whether it was an eruption precursor. These patterns, captured by various methods (neural nets, support vector machines, etc.), can then be used to classify new images. A major challenge with any ML method is the featurization of the data: pre-processing the raw images to extract higher-level properties, such as characteristics of the magnetic field, that can streamline the training and use of these methods. It is key to choose features that are informative, from the standpoint of the task at hand. To date, the majority of ML-based solar eruption methods have used physics-based magnetic and electric field features such as the total unsigned magnetic flux, the gradients of the fields, the vertical current density, etc. In this paper, we extend the relevant feature set to include characteristics of the magnetic field that are based purely on the geometry and topology of 2D magnetogram images and show that this improves the prediction accuracy of a neural-net based flare-prediction method.
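
The three temporal training windows described in item 1 of the list above (stationary, rolling, and expanding) can be illustrated with a minimal sketch. It is not taken from any of the listed papers: the function name `training_months`, its parameters, and the use of integer month indices are assumptions made purely for illustration.

```python
def training_months(window, forecast_month, size, first_forecast_month):
    """Return the month indices used to train a model forecasting `forecast_month`.

    Hypothetical helper: months are integers counted from the start of the
    data set, and `size` is the window length in months.
    """
    if forecast_month < first_forecast_month:
        raise ValueError("forecast_month precedes the first forecasting instance")

    if window == "stationary":
        # One fixed block of data before the FIRST forecast; never updated.
        start, stop = first_forecast_month - size, first_forecast_month
    elif window == "rolling":
        # A constant-length block that slides along with the forecast month.
        start, stop = forecast_month - size, forecast_month
    elif window == "expanding":
        # Everything available before the forecast month.
        start, stop = 0, forecast_month
    else:
        raise ValueError(f"unknown window type: {window!r}")

    # Clip at the start of the data set.
    return list(range(max(start, 0), stop))


if __name__ == "__main__":
    for kind in ("stationary", "rolling", "expanding"):
        months = training_months(kind, forecast_month=26, size=20,
                                 first_forecast_month=20)
        print(kind, months[0], months[-1], len(months))
```

In this sketch only the rolling and expanding windows change as the forecast month advances, which is why they would require retraining while the stationary window would not.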