Title: Time-Series Feature Selection for Solar Flare Forecasting
Solar flares are significant events in solar physics, affecting space weather and terrestrial technologies. Accurate classification of solar flares is essential for predicting space weather and minimizing potential disruptions to communication, navigation, and power systems. This study addresses the challenge of selecting the most relevant features from multivariate time-series data describing solar flares. We employ Mutual Information (MI), Minimum Redundancy Maximum Relevance (mRMR), and Euclidean Distance to identify key features for classification. Because different feature selection techniques perform inconsistently, we introduce an ensemble approach that computes feature weights by combining the outputs of multiple methods, giving a more comprehensive view of feature importance. Our results show that the ensemble approach substantially improves classification performance, achieving True Skill Statistic (TSS) values 0.15 higher than those obtained with individual feature selection methods. Additionally, our method offers insights into the underlying physical processes of solar flares, which can support more effective space weather forecasting and better mitigation of communication, navigation, and power system disruptions.
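The abstract does not specify how the three per-method scores are combined into ensemble weights. The following is a minimal, pure-Python sketch under explicit assumptions: each feature's time series is summarized by its mean before MI and mRMR are computed, the Euclidean-distance score is the distance between the two classes' mean time series, and the ensemble weight is an equal-weight average of min-max-normalized scores. All function names (`ensemble_weights`, `class_mean_distance`, `mrmr_order`, etc.) and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

def summarize(series):
    # Assumption: collapse one feature's time series to a scalar (its mean)
    return sum(series) / len(series)

def discretize(values, bins=5):
    # Equal-width binning so that MI can be estimated from counts
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column -> everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(a, b):
    # MI (in nats) between two equal-length discrete sequences
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[z]))
               for (x, z), c in pab.items())

def mrmr_order(cols, labels):
    # Greedy mRMR (difference form): relevance minus mean redundancy
    n_feat = len(cols)
    relevance = [mutual_information(c, labels) for c in cols]
    order = [max(range(n_feat), key=lambda f: relevance[f])]
    while len(order) < n_feat:
        def score(f):
            redundancy = sum(mutual_information(cols[f], cols[s]) for s in order)
            return relevance[f] - redundancy / len(order)
        remaining = [f for f in range(n_feat) if f not in order]
        order.append(max(remaining, key=score))
    return order

def class_mean_distance(X, y, f):
    # Hypothetical distance score: Euclidean distance between the
    # class-mean time series of feature f (class 0 vs. class 1)
    means = []
    for label in (0, 1):
        rows = [x[f] for x, lab in zip(X, y) if lab == label]
        means.append([sum(r[k] for r in rows) / len(rows)
                      for k in range(len(rows[0]))])
    return math.dist(means[0], means[1])

def minmax(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def ensemble_weights(X, y, bins=5):
    # Assumption: equal-weight average of min-max-normalized per-method scores
    n_feat = len(X[0])
    cols = [discretize([summarize(x[f]) for x in X], bins) for f in range(n_feat)]
    mi = [mutual_information(c, y) for c in cols]
    mrmr = [0.0] * n_feat
    for rank, f in enumerate(mrmr_order(cols, y)):
        mrmr[f] = (n_feat - rank) / n_feat  # earlier mRMR pick -> higher score
    dist = [class_mean_distance(X, y, f) for f in range(n_feat)]
    normed = [minmax(mi), minmax(mrmr), minmax(dist)]
    return [sum(m[f] for m in normed) / len(normed) for f in range(n_feat)]

# Toy MVTS data: 6 samples x 3 features x 3 time steps; label 1 = "flaring"
X = [
    [[0, 0, 1], [1, 0, 1], [5, 5, 5]],
    [[0, 1, 0], [0, 1, 1], [5, 5, 5]],
    [[1, 0, 0], [1, 1, 0], [5, 5, 5]],
    [[4, 5, 4], [0, 1, 1], [5, 5, 5]],
    [[5, 4, 5], [1, 0, 1], [5, 5, 5]],
    [[4, 4, 5], [1, 1, 0], [5, 5, 5]],
]
y = [0, 0, 0, 1, 1, 1]

weights = ensemble_weights(X, y)
print([round(w, 3) for w in weights])  # -> [1.0, 0.167, 0.0]
```

In the toy data only feature 0 separates the classes, so all three methods rank it first and it receives the top ensemble weight; the constant feature 2 gets zero. Averaging normalized scores is just one simple way to reduce the variability of any single selector, which is the motivation the abstract gives for an ensemble.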
Award ID(s):
2240022, 2204363, 2301397
PAR ID:
10573356
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Universe
Volume:
10
Issue:
9
ISSN:
2218-1997
Page Range / eLocation ID:
373
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observing these parameters over time, i.e., representing solar events as multivariate time-series (MVTS) data, can reveal relationships between magnetic field states in active regions and extreme solar events such as solar flares. We can improve our understanding of these events by selecting the most relevant parameters, those that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier built on an LSTM network. Second, the top-performing features are combined to produce input for an LSTM-based multivariate sequence classifier. To test the discrimination ability of the selected features, we trained downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and a support vector machine. We performed our experiments on a benchmark data set for flare prediction, Space Weather Analytics for Solar Flares. We compared our proposed method with three baseline feature selection methods and showed that it selects more discriminatory features than they do. Because the data are imbalanced, primarily due to the rarity of the minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Finally, we report the set of photospheric magnetic field parameters that gives the highest discrimination performance in predicting flare classes.
  2. The accurate prediction of solar flares is crucial because of the risks flares pose to astronauts, space equipment, and satellite communication systems. Our research enhances solar flare prediction by applying sophisticated data preprocessing and sampling techniques to the Space Weather Analytics for Solar Flares (SWAN-SF) data set, a rich source of multivariate time series data of solar active regions. Our study adopts a multifaceted approach encompassing four key methodologies. First, we address over 10 million missing values in the SWAN-SF data set through our imputation technique, fast Pearson correlation-based k-nearest neighbors imputation. Next, we propose a normalization technique tailored to time series data, called LSBZM normalization, which combines several strategies (log, square root, Box–Cox, Z-score, and min–max) to uniformly scale the data set's 24 attributes (photospheric magnetic field parameters) and address issues such as skewness. We also explore a “near decision boundary sample removal” technique that improves classification performance on the data set by resolving the challenge of class overlap. Finally, we thoroughly evaluate diverse oversampling and undersampling methods, including SMOTE, ADASYN, Gaussian noise injection, TimeGAN, Tomek links, and random undersampling, to counter the severe class imbalance in the SWAN-SF data set, notably a 60:1 ratio of minor (C, B, and FQ) to major (X and M) flaring events in binary classification. To demonstrate the effectiveness of our methods, we use eight classification algorithms, including advanced deep-learning-based architectures. Our analysis yields high true skill statistic scores, underscoring the importance of data preprocessing and sampling in time-series-based solar flare prediction.
  3. To guide the selection of probabilistic solar power forecasting methods for day-ahead power grid operations, this paper compares the performance of four methods: Bayesian model averaging (BMA), analog ensemble (AnEn), ensemble learning method (ELM), and persistence ensemble (PerEn). A real-world hourly solar generation dataset from a rooftop solar plant is used to train and validate the methods under clear, partially cloudy, and overcast weather conditions. The methods are compared on a one-year testing set using popular performance metrics for probabilistic forecasts. The ELM method outperforms the other methods, offering better reliability, higher resolution, and narrower prediction intervals under all weather conditions at a slight cost in accuracy. The BMA method performs well under overcast and partially cloudy conditions, although the ELM method outperforms it under clear conditions.
  4. The purpose of this study is to provide a comprehensive resource for selecting data representations for machine learning-oriented models and components in solar flare prediction tasks. Major solar flares occurring in the solar corona and heliosphere can have potentially destructive consequences, posing significant risks to astronauts, space stations, electronics, communication systems, and numerous technological infrastructures. For this reason, the accurate detection of major flares is essential for mitigating these hazards and ensuring the safety of our technology-dependent society. In response, using machine learning techniques to predict solar flares has emerged as a significant application within data science, relying on sensor data collected from solar active region photospheric magnetic fields by space- and ground-based observatories. In this research, three distinct solar flare prediction strategies are evaluated on a multivariate time series dataset of photospheric magnetic field parameters, with a focus on data representation techniques. Specifically, we examine vector-based, time series-based, and graph-based approaches to identify the most effective representation for capturing key characteristics of the dataset. The vector-based approach condenses multivariate time series into a compressed vector form, the time series representation leverages temporal patterns, and the graph-based method models interdependencies between magnetic field parameters. The results show that the vector representation is exceptionally robust in predicting solar flares: when coupled with appropriate downstream machine learning classifiers, it consistently yields strong and reliable classification outcomes by effectively encapsulating the intricate relationships within photospheric magnetic field data.
  5. Barambones, Oscar (Ed.)
    Accurate quantification of uncertainty in solar photovoltaic (PV) generation forecasts is imperative for the efficient and reliable operation of the power grid. In this paper, a data-driven non-parametric probabilistic method based on the Naïve Bayes (NB) classification algorithm and Dempster–Shafer theory (DST) of evidence is proposed for day-ahead probabilistic PV power forecasting. This NB-DST method extends traditional deterministic solar PV forecasting methods by quantifying the uncertainty of their forecasts, estimating the cumulative distribution functions (CDFs) of their forecast errors and forecast variables. The statistical performance of this method is compared with the analog ensemble method and the persistence ensemble method under three different weather conditions using real-world data. The results reveal that the proposed NB-DST method coupled with an artificial neural network model outperforms the other methods: its estimated CDFs have lower spread and higher reliability, yielding sharper and more accurate probabilistic forecasts.
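Items 1 and 2 above, like the main abstract, evaluate classifiers with the true skill statistic because flare data sets are heavily imbalanced. TSS is the hit rate minus the false-alarm rate, TSS = TP/(TP+FN) - FP/(FP+TN), so a classifier that never predicts a flare scores 0 no matter how lopsided the class ratio is, while its accuracy can still look excellent. The sketch below computes both metrics in plain Python; the function names, the 60:1 toy labels, and the two hypothetical classifiers' outputs are illustrative assumptions chosen to echo the ratio mentioned in item 2.

```python
def true_skill_statistic(y_true, y_pred, positive=1):
    """TSS = TP/(TP+FN) - FP/(FP+TN); ranges from -1 to 1, 0 means no skill."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs at least one positive and one negative sample")
    return tp / (tp + fn) - fp / (fp + tn)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 60:1 imbalance: 2 major-flare (1) vs. 120 minor/non-flare (0) samples
y_true = [1] * 2 + [0] * 120

always_quiet = [0] * len(y_true)            # never predicts a major flare
alarmist = [1] * 2 + [1] * 12 + [0] * 108   # catches both flares, 12 false alarms

print(round(accuracy(y_true, always_quiet), 3), true_skill_statistic(y_true, always_quiet))
# -> 0.984 0.0
print(round(accuracy(y_true, alarmist), 3), round(true_skill_statistic(y_true, alarmist), 3))
# -> 0.902 0.9
```

The trivial "always quiet" predictor wins on accuracy yet has zero skill, whereas the predictor that catches every major flare at the cost of a few false alarms scores a TSS of 0.9. This insensitivity to the class ratio is why TSS is preferred over accuracy for imbalanced flare forecasting.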