Supervised Machine Learning (ML) models for solar flare prediction rely on accurate labels for a given input data set, commonly obtained from the GOES/XRS X-ray flare catalog. With increasing interest in utilizing ultraviolet (UV) and extreme ultraviolet (EUV) image data as input to these models, we seek to understand if flaring activity can be defined and quantified using EUV data alone. This would allow us to move away from the GOES single-pixel measurement definition of flares and to create labels from the same data we use for flare prediction. In this work, we present a Solar Dynamics Observatory (SDO) Atmospheric Imaging Assembly (AIA)-based flare catalog covering flares of GOES X-ray magnitudes C, M, and X from 2010 to 2017. We use active region (AR) cutouts of full-disk AIA images to match the corresponding SDO/Helioseismic and Magnetic Imager (HMI) SHARPs (Space-weather HMI Active Region Patches) that have been extensively used in ML flare prediction studies, thus allowing for labeling of AR number as well as flare magnitude and timing. Flare start, peak, and end times are defined using a peak-finding algorithm on AIA time series data obtained by summing the intensity across the AIA cutouts. An extremely randomized trees (ERT) regression model is used to map SDO/AIA flare magnitudes to GOES X-ray magnitude, achieving a low-variance regression. We find an accurate overlap on 85% of M/X flares between our resulting AIA catalog and the GOES flare catalog. However, we also discover a number of large flares unrecorded or mislabeled in the GOES catalog.
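The abstract does not spell out the peak-finding algorithm, so the following is only a minimal sketch of the general idea: sum the cutout intensity into a light curve, then mark each contiguous run above a background level as one event with a start, peak, and end. The threshold rule (a multiple of the median), the function name `find_flares`, and the sample light curve are all illustrative assumptions, not the authors' method.

```python
from statistics import median


def find_flares(flux, rel_threshold=1.5):
    """Return (start, peak, end) index triples for bumps in a light curve.

    flux: summed cutout intensity per time step (hypothetical input).
    rel_threshold: a sample counts as flaring when it exceeds
    rel_threshold * median(flux); this criterion is an assumption.
    """
    level = rel_threshold * median(flux)
    events = []
    i, n = 0, len(flux)
    while i < n:
        if flux[i] > level:
            start = i
            # Advance to the end of the contiguous above-threshold run.
            while i < n and flux[i] > level:
                i += 1
            end = i - 1
            # The peak is the brightest sample inside the run.
            peak = max(range(start, end + 1), key=flux.__getitem__)
            events.append((start, peak, end))
        else:
            i += 1
    return events


# Made-up light curve with two bumps, for illustration only.
light_curve = [10, 10, 11, 10, 18, 30, 22, 12, 10, 10, 16, 25, 40, 28, 11, 10]
events = find_flares(light_curve)
```

A production catalog would likely need background subtraction and smoothing before any thresholding; those steps are omitted here.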
FlareDB: A Database of Significant Flares in Solar Cycles 24 and 25 with SDO/HMI and SDO/AIA Observations
Abstract We present FlareDB, a database that provides comprehensive magnetic field information, ultraviolet/extreme ultraviolet (UV/EUV) emissions, and white light continuum images for solar active regions (ARs) associated with 151 significant flares from May 2010 to May 2025. The data, sourced from the Solar Dynamics Observatory (SDO) via the Joint Science Operations Center (JSOC), were processed with SunPy and stored in standardized JSOC FITS format. FlareDB includes all M5.0 and larger flares within 50° of the solar disk center. Key features include (1) Atmospheric Imaging Assembly (AIA) AR patches in Helioprojective Cartesian (HPC) and Lambert Cylindrical Equal-Area (CEA) projections, aligned with corresponding HMI magnetogram patches; (2) quick-look movies with uniform value ranges that ensure consistent visualization, maintain data uniformity, and enhance readiness for machine learning studies; (3) a supplementary web interface that allows the entire dataset of a flare to be downloaded for large flare analysis. One of FlareDB’s primary objectives is to support scientists in predicting and understanding the onset of solar eruptions, including flares and coronal mass ejections. The data set is machine-learning ready for this purpose.
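The selection rule stated in the abstract (M5.0 and larger, within 50° of disk center) can be sketched as a simple filter. This sketch assumes flare positions are given as Stonyhurst heliographic latitude and longitude and ignores the tilt of the solar rotation axis toward the observer (the B0 angle); the abstract does not say how the angular distance is measured, so treat that part as an assumption. The function names and the sample inputs are hypothetical.

```python
import math

# Peak soft X-ray flux (W/m^2) at the bottom of each GOES class.
GOES_SCALE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_flux(goes_class):
    """Convert a GOES class string such as 'M5.0' to flux in W/m^2."""
    return GOES_SCALE[goes_class[0].upper()] * float(goes_class[1:])


def angle_from_center(lat_deg, lon_deg):
    """Angular distance from disk center in degrees.

    Assumes Stonyhurst latitude/longitude and B0 = 0 (a simplification).
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return math.degrees(math.acos(math.cos(lat) * math.cos(lon)))


def is_selected(goes_class, lat_deg, lon_deg, min_class="M5.0", max_angle=50.0):
    """Apply the magnitude and disk-position cuts described in the abstract."""
    bright_enough = goes_flux(goes_class) >= goes_flux(min_class)
    near_center = angle_from_center(lat_deg, lon_deg) <= max_angle
    return bright_enough and near_center
```

For example, a hypothetical X1.2 flare at latitude 15°, longitude 30° passes both cuts, while an X2.0 flare at longitude 60° is rejected by the position cut regardless of its magnitude.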
- PAR ID:
- 10675222
- Publisher / Repository:
- Springer Nature
- Date Published:
- Journal Name:
- Scientific Data
- Volume:
- 13
- Issue:
- 1
- ISSN:
- 2052-4463
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections. They are the most important sources of space weather effects, which can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M- and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and machine-learning approaches to the prediction of an active region’s (AR) flare index (FI) that quantifies the flare productivity of an AR by taking into account the number of flares of different classes within a certain time interval. Specifically, our sample includes 563 ARs that appeared on the solar disk from 2010 May to 2017 December. The 25 magnetic parameters, provided by the Space-weather HMI Active Region Patches (SHARP) from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory, characterize coronal magnetic energy stored in ARs by proxy and are used as the predictors. We investigate the relationship between these SHARP parameters and the FI of ARs with a machine-learning algorithm (spline regression) and the resampling method (Synthetic Minority Oversampling Technique for Regression with Gaussian Noise). Based on the established relationship, we are able to predict the value of FIs for a given AR within the next 1 day period. Compared with four other popular machine-learning algorithms, our methods improve the accuracy of FI prediction, especially for a large FI. In addition, we sort the importance of SHARP parameters by the Borda count method calculated from the ranks that are rendered by nine different machine-learning methods.
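The Borda count step mentioned at the end of this abstract can be illustrated with a small sketch. Each method contributes a ranking of the parameters, a parameter in position p of n (counting from zero) earns n - 1 - p points, and the parameters are sorted by total points. The three rankings below are invented for illustration (the abstract uses nine methods and 25 parameters), and the alphabetical tie-break is an assumption.

```python
def borda_count(rankings):
    """Aggregate several best-to-worst rankings into one consensus order.

    rankings: list of lists, each ordering the same names from most to
    least important. Ties in total score are broken alphabetically
    (an arbitrary choice made for this sketch).
    """
    n = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for position, name in enumerate(ranking):
            # First place earns n - 1 points, last place earns 0.
            scores[name] = scores.get(name, 0) + (n - 1 - position)
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical rankings of four SHARP keywords by three methods.
rankings = [
    ["TOTUSJH", "TOTPOT", "USFLUX", "R_VALUE"],
    ["TOTPOT", "TOTUSJH", "R_VALUE", "USFLUX"],
    ["TOTUSJH", "USFLUX", "TOTPOT", "R_VALUE"],
]
order = borda_count(rankings)
```

With these made-up inputs the totals are 8, 6, 3, and 1 points, so the consensus order matches the first ranking.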
-
Abstract Solar eruptions, including flares and coronal mass ejections (CMEs), have a significant impact on Earth. Some flares are associated with CMEs, and some flares are not. The association between flares and CMEs is not always obvious. In this study, we propose a new deep learning method, specifically a hybrid neural network (HNN) that combines a vision transformer with long short-term memory, to predict associations between flares and CMEs. HNN finds spatio-temporal patterns in the time series of line-of-sight magnetograms of solar active regions collected by the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory and uses the patterns to predict whether a flare projected to occur within the next 24 hr will be eruptive (i.e., CME-associated) or confined (i.e., not CME-associated). Our experimental results demonstrate the good performance of the HNN method. Furthermore, the results show that magnetic flux cancellation in polarity inversion line regions may well play a role in triggering flare-associated CMEs, a finding consistent with the literature.
-
Magnetic polarity inversion lines (PILs) in solar active regions are key to triggering flares and eruptions. Recently, engineered PIL features have been used for predicting solar eruptions. Derived from the original PIL dataset, using line-of-sight (LoS) magnetograms provided by the Solar Dynamics Observatory's (SDO) Helioseismic and Magnetic Imager (HMI) Active Region Patches (HARPs), we provide a publicly available comprehensive dataset in a supervised format, where each instance includes a raster of PILs, a raster of the polarity convex hull, and a multivariate time series of properties related to PILs. Using SDO-GOES integrated historical flare data covering May 2010 to January 2019, we have assigned each instance its corresponding flare class: FQ, C, M, or X. By integrating these diverse data modalities, our approach aims to improve the accuracy of solar flare predictions. Initial findings suggest that the multimodal approach can uncover new patterns and relationships, potentially leading to breakthroughs in predictive accuracy and more effective mitigation strategies against the impacts of solar activities.
-
Abstract Solar energetic particles (SEPs) are an essential source of space radiation, and are hazardous for humans in space, spacecraft, and technology in general. In this paper, we propose a deep-learning method, specifically a bidirectional long short-term memory (biLSTM) network, to predict if an active region (AR) would produce an SEP event given that (i) the AR will produce an M- or X-class flare and a coronal mass ejection (CME) associated with the flare, or (ii) the AR will produce an M- or X-class flare regardless of whether or not the flare is associated with a CME. The data samples used in this study are collected from the Geostationary Operational Environmental Satellite's X-ray flare catalogs provided by the National Centers for Environmental Information. We select M- and X-class flares with identified ARs in the catalogs for the period between 2010 and 2021, and find the associations of flares, CMEs, and SEPs in the Space Weather Database of Notifications, Knowledge, Information during the same period. Each data sample contains physical parameters collected from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory. Experimental results based on different performance metrics demonstrate that the proposed biLSTM network is better than related machine-learning algorithms for the two SEP prediction tasks studied here. We also discuss extensions of our approach for probabilistic forecasting and calibration with empirical evaluation.

