This content will become publicly available on August 12, 2023

Title: Towards coupling full-disk and active region-based flare prediction for operational space weather forecasting
Solar flare prediction is a central problem in space weather forecasting and has captivated the attention of a wide spectrum of researchers due to recent advances in remote sensing as well as in machine learning and deep learning approaches. The experimental findings based on both machine and deep learning models reveal significant performance improvements for task-specific datasets. Along with building models, deploying such models to production environments under operational settings is a more complex and often time-consuming process that is often not addressed directly in research settings. We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are performed on full-disk line-of-sight magnetograms using deep learning models, whereas in active region-based models, predictions are issued for each active region individually using multivariate time series data instances. The outputs from individual active region forecasts and full-disk predictors are combined into a final full-disk prediction result with a meta-model. We utilized an equal weighted average ensemble of two base learners’ flare probabilities as our baseline meta learner and improved the capabilities of our two base learners by training a logistic regression model. The major findings of this study are: 1) We successfully coupled two heterogeneous flare prediction models trained with different datasets and model architectures to predict a full-disk flare probability for the next 24 h, 2) Our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of the two base learners and the baseline meta learner, measured in terms of two widely used metrics, True Skill Statistic (TSS) and Heidke Skill Score (HSS), and 3) Our result analysis suggests that the logistic regression-based ensemble (Meta-FP) improves on the full-disk model (base learner) by ∼9% in terms of TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS, respectively. Finally, when compared to the baseline meta model, it improves on TSS by ∼10% and HSS by ∼15%.
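The two skill scores and the baseline ensemble named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels (1 = a ≥M1.0 flare occurred), a 0.5 decision threshold, and the confusion-matrix form of HSS commonly used in flare forecasting; the function names are hypothetical, and the logistic regression meta learner itself is not reproduced here.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding probabilities.

    The 0.5 threshold is an assumption made for this illustration.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def tss(tp: int, fp: int, tn: int, fn: int) -> float:
    """True Skill Statistic: recall minus false-alarm rate.

    Assumes both classes are present, so neither denominator is zero.
    """
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp: int, fp: int, tn: int, fn: int) -> float:
    """Heidke Skill Score: skill relative to a random-chance forecast."""
    numerator = 2 * (tp * tn - fp * fn)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def equal_weight_ensemble(p_full_disk: Sequence[float],
                          p_ar: Sequence[float]) -> List[float]:
    """Baseline meta learner: plain average of the two base probabilities."""
    return [(a + b) / 2 for a, b in zip(p_full_disk, p_ar)]


if __name__ == "__main__":
    # Made-up probabilities, purely for illustration.
    labels = [1, 0, 1, 0]
    combined = equal_weight_ensemble([0.9, 0.1, 0.7, 0.6], [0.7, 0.3, 0.5, 0.2])
    counts = confusion_counts(labels, combined)
    print("TSS:", tss(*counts), "HSS:", hss(*counts))
```

Both scores range up to 1 for a perfect forecast, and TSS is insensitive to the class-imbalance ratio, which is one reason it is widely reported for rare events such as ≥M1.0 flares.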
Journal Name:
Frontiers in Astronomy and Space Sciences
Sponsoring Org:
National Science Foundation
More Like this
  1. Lossio-Ventura J.A. ; Valverde-Rebaza J. ; Diaz E. ; Muñante D. ; Gavidia-Calderon C. ; Baria Valejo A.D. ; Alatrista-Salas H. (Ed.)
    The efforts in solar flare prediction have been engendered by the advancements in machine learning and deep learning methods. We present a new approach to flare prediction using full-disk compressed magnetogram images with Convolutional Neural Networks. We selected three prediction modes, among which two are binary for predicting the occurrence of ≥M1.0 and ≥C4.0 class flares and one is a multi-class mode for predicting the occurrence of […] achieves an average TSS of 0.36 and average HSS of 0.31. Similarly, for binary prediction in (i) ≥C4.0 mode: we achieve an average TSS of 0.47 and HSS of 0.46, (ii) ≥M1.0 mode: we achieve an average TSS of 0.55 and HSS of 0.43.
  2. Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting. Our proposed stacked ensembles, especially using gradient boosting machines as a meta learner trained on a set of 12 base learner predictions, led to superior reduction of bias compared to the current state-of-the-art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model, which we call GBM-Stack, to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds to pass the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20).
  3. Abstract

    Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections. They are the most important sources of space weather effects, which can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M- and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and machine-learning approaches to the prediction of an active region’s (AR) flare index (FI) that quantifies the flare productivity of an AR by taking into account the number of different class flares within a certain time interval. Specifically, our sample includes 563 ARs that appeared on the solar disk from 2010 May to 2017 December. The 25 magnetic parameters, provided by the Space-weather HMI Active Region Patches (SHARP) from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory, characterize coronal magnetic energy stored in ARs by proxy and are used as the predictors. We investigate the relationship between these SHARP parameters and the FI of ARs with a machine-learning algorithm (spline regression) and the resampling method (Synthetic Minority Oversampling Technique for Regression with Gaussian Noise). Based on the established relationship, we are able to predict the value of FIs for a given AR within the next 1-day period. Compared with four other popular machine-learning algorithms, our methods improve the accuracy of FI prediction, especially for a large FI. In addition, we sort the importance of SHARP parameters by the Borda count method calculated from the ranks that are rendered by nine different machine-learning methods.
  4. Interactive learning environments facilitate learning by providing hints to fill the gaps in the understanding of a concept. Studies suggest that hints are not used optimally by learners. Either they are used unnecessarily or not used at all. It has been shown that learning outcomes can be improved by providing hints when needed. An effective hint-taking prediction model can be used by a learning environment to make adaptive decisions on whether to withhold or provide hints. Past work on student behavior modeling has focused extensively on the task of modeling a learner’s state of knowledge over time, referred to as knowledge tracing. Other aspects of a learner’s behavior, such as the tendency to use hints, have garnered limited attention. Past knowledge tracing models either ignore the questions where a hint was taken or label hints taken as an incorrect response. We propose a multi-task memory-augmented deep learning model to jointly predict the hint-taking and the knowledge tracing task. The model incorporates the effect of past responses as well as hints taken on both the tasks. We apply the model on two datasets: the ASSISTments 2009-10 skill builder dataset and the Junyi Academy Math Practicing Log. The results show that deep learning models efficiently leverage the sequential information present in a learner’s responses. The proposed model significantly outperforms the past work on hint prediction by at least 12 percentage points. Moreover, we demonstrate that jointly modeling the two tasks improves performance consistently across the tasks and the datasets, albeit by a small amount.
  5. Abstract

    Solar energetic particles (SEPs) are an essential source of space radiation, and are hazardous for humans in space, spacecraft, and technology in general. In this paper, we propose a deep-learning method, specifically a bidirectional long short-term memory (biLSTM) network, to predict if an active region (AR) would produce an SEP event given that (i) the AR will produce an M- or X-class flare and a coronal mass ejection (CME) associated with the flare, or (ii) the AR will produce an M- or X-class flare regardless of whether or not the flare is associated with a CME. The data samples used in this study are collected from the Geostationary Operational Environmental Satellite's X-ray flare catalogs provided by the National Centers for Environmental Information. We select M- and X-class flares with identified ARs in the catalogs for the period between 2010 and 2021, and find the associations of flares, CMEs, and SEPs in the Space Weather Database of Notifications, Knowledge, Information during the same period. Each data sample contains physical parameters collected from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory. Experimental results based on different performance metrics demonstrate that the proposed biLSTM network is better than related machine-learning algorithms for the two SEP prediction tasks studied here. We also discuss extensions of our approach for probabilistic forecasting and calibration with empirical evaluation.