

Search results: all records whose Award ID contains 1931555


  1. Abstract

    Magnetic polarity inversion lines (PILs) detected in solar active regions have long been recognized as arguably the most essential feature for triggering instabilities such as flares and eruptive events (i.e., eruptive flares and coronal mass ejections). In recent years, efforts have focused on using features engineered from PILs for solar eruption prediction. However, PIL rasters and metadata are often generated as by-products and are not publicly accessible, which limits their use in data-intensive space weather analytics applications. We introduce a large-scale, publicly available PIL data set covering practically all of solar cycle 24, intended for a range of space weather forecasting and analytics tasks. The data set is built from both radial-field (B_r) and line-of-sight (B_LoS) magnetograms in the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager Active Region Patches (HARP), spanning 4090 HARP series from May 2010 to March 2019. It includes three PIL-related binary raster masks (the actual PILs obtained from spatial analysis of the magnetograms, the region of polarity inversion, and the convex hull of the PILs), along with time-series-structured metadata extracted from these masks. We also provide a preliminary exploratory analysis of selected features, aimed at correlating time series of feature metadata with eruptive activity originating from active regions. We envision that this comprehensive PIL data set will complement existing data sets used for space weather forecasting and benefit research in related areas, particularly a better understanding of PIL structure, evolution, and role in eruptions.
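The masks described above can be pictured with a toy example. The sketch below is a minimal, hypothetical Python illustration of two of them, a PIL mask and the convex hull of the PIL pixels, on a tiny synthetic magnetogram. It is not the pipeline used to build the data set: the field threshold `thr`, the 8-neighbour adjacency rule, and the helper names `pil_mask` and `convex_hull` are assumptions, and the hull is returned as a list of vertices rather than rasterized into a mask.

```python
# Hypothetical sketch, NOT the data set's actual pipeline.
# Assumptions: a fixed field-strength threshold `thr` (gauss) and an
# 8-neighbour adjacency test between strong opposite-polarity pixels.


def pil_mask(field, thr):
    """Mark pixels that are strong in one polarity and touch a strong
    pixel of the opposite polarity in their 8-neighbourhood."""
    rows, cols = len(field), len(field[0])
    mask = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            b = field[i][j]
            if abs(b) < thr:
                continue  # weak-field pixel: never part of the mask
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        nb = field[ni][nj]
                        # neighbour must be strong AND of opposite sign
                        if abs(nb) >= thr and (nb > 0) != (b > 0):
                            mask[i][j] = True
    return mask


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise,
    dropping collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Toy 4x4 bipolar patch: positive flux on the left, negative on the right.
field = [
    [300, 250, -280, -310],
    [320, 260, -300, -290],
    [310, 40, -30, -300],
    [290, 270, -260, -320],
]
mask = pil_mask(field, thr=200)
pil_pixels = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
hull = convex_hull(pil_pixels)
```

On this toy patch, only the strong pixels on either side of the boundary between columns 1 and 2 are flagged, and the hull reduces to the four corners (0, 1), (3, 1), (3, 2), (0, 2). Raising `thr` keeps only pixel pairs where both polarities exceed the higher threshold.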

     
  2. Abstract

    We introduce and make openly accessible a comprehensive multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in the Spaceweather HMI Active Region Patch (SHARP) series. The dataset also includes a cross-checked NOAA solar flare catalog that directly facilitates solar flare prediction efforts. We discuss the methods used for data collection, cleaning, and pre-processing of the solar active region and flare data, and we describe a novel data integration and sampling methodology. The dataset covers 4,098 MVTS data collections from active regions occurring between May 2010 and December 2018, includes 51 flare-predictive parameters, and integrates over 10,000 flare reports. We also explain potential directions for expanding the time series, either “horizontally”, by adding more prediction-specific parameters, or “vertically”, by generalizing flare prediction into integrated solar eruption prediction. The immediate tasks enabled by the disseminated dataset include optimization of solar flare prediction and detailed investigation of elusive flare predictors or precursors, with potential future benefits for both operations (research-to-operations) and basic research (operations-to-research).
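The abstract does not spell out the sampling methodology. As a rough, hypothetical illustration of how an MVTS sample can be paired with a flare label, the sketch below slides a fixed observation window over one active region's time series and labels each window with the strongest flare that starts within a prediction horizon after it. The window length, horizon, step, the `"N"` no-flare label, and the helper names are all assumptions, not the published procedure.

```python
# Hypothetical sketch, NOT the dataset's published sampling method.
# Assumptions: fixed observation window `obs`, prediction horizon `horizon`,
# sliding step `step`, and "N" as a made-up label for "no flare".
from datetime import datetime, timedelta

CLASS_ORDER = {"A": 0, "B": 1, "C": 2, "M": 3, "X": 4}


def label_window(window_end, flares, horizon):
    """Return the strongest flare (e.g. 'M1.5') starting in
    (window_end, window_end + horizon], or 'N' if there is none."""
    in_horizon = [cls for t, cls in flares if window_end < t <= window_end + horizon]
    if not in_horizon:
        return "N"
    # Rank by GOES class letter first, then by the numeric magnitude.
    return max(in_horizon, key=lambda c: (CLASS_ORDER[c[0]], float(c[1:])))


def make_samples(times, values, flares, obs, horizon, step):
    """Slide an observation window over one active region's MVTS and
    attach a label derived from the flare list."""
    samples = []
    start = times[0]
    while start + obs <= times[-1]:
        end = start + obs
        window = [v for t, v in zip(times, values) if start <= t < end]
        samples.append((window, label_window(end, flares, horizon)))
        start += step
    return samples


# Toy example: 49 hourly records of two made-up parameters and two flares.
t0 = datetime(2012, 3, 1)
times = [t0 + timedelta(hours=h) for h in range(49)]
values = [(float(h), 2.0 * h) for h in range(49)]
flares = [(t0 + timedelta(hours=30), "C3.1"), (t0 + timedelta(hours=33), "M1.5")]
samples = make_samples(times, values, flares, obs=timedelta(hours=12),
                       horizon=timedelta(hours=24), step=timedelta(hours=12))
```

Here the first two 12-hour windows are labelled `"M1.5"` because both flares fall inside their 24-hour horizons, and the last two are labelled `"N"`.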

     
  3. Free, publicly-accessible full text available October 1, 2024
  4. Bifet, A.; Lorena, A.C.; Ribeiro, R.P.; Gama, J.; Abreu, P.H. (Eds.)
    This paper presents a post hoc analysis of a deep learning-based full-disk solar flare prediction model. We used hourly full-disk line-of-sight magnetogram images in a binary prediction mode to predict the occurrence of ≥M1.0-class flares within 24 h. We used custom data augmentation and sample weighting to counter the inherent class-imbalance problem, and we used the true skill statistic (TSS) and the Heidke skill score (HSS) as evaluation metrics. Recent advances in gradient-based attention methods allow us to interpret models by propagating gradient signals back to the input features to attribute the model's decision to them. We interpret our model using three post hoc attention methods: (i) Guided Gradient-weighted Class Activation Mapping, (ii) Deep Shapley Additive Explanations, and (iii) Integrated Gradients. Our analysis shows that full-disk predictions of solar flares align with characteristics related to active regions. The key findings of this study are: (1) our full-disk model can tangibly locate and predict near-limb solar flares, a critical capability for operational flare forecasting; (2) our candidate model achieves an average TSS = 0.51 ± 0.05 and HSS = 0.38 ± 0.08; and (3) our evaluation suggests that such models can learn conspicuous features corresponding to active regions from full-disk magnetograms.
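Both scores quoted above are computed from the binary confusion matrix. The sketch below gives the standard textbook definitions; it assumes the "HSS2" form commonly used in flare forecasting, since the abstract does not say which HSS variant the authors used. The function names are illustrative.

```python
# Standard skill scores from a binary confusion matrix.
# Assumption: HSS here is the common "HSS2" form (skill relative to chance).


def tss(tp, fn, fp, tn):
    """True skill statistic: probability of detection minus
    probability of false detection. Range [-1, 1]."""
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke skill score (HSS2): improvement over a random forecast."""
    num = 2 * (tp * tn - fn * fp)
    den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return num / den


# Example: 50 flaring and 950 non-flaring instances.
print(tss(40, 10, 100, 850))
print(hss(40, 10, 100, 850))
```

With heavy class imbalance, TSS is insensitive to the ratio of positives to negatives, whereas HSS is not, which is one reason flare-prediction studies often report both.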
    Free, publicly-accessible full text available October 1, 2024
  5. De Francisci Morales, G.; Perlich, C.; Ruchansky, N.; Kourtellis, N.; Baralis, E.; Bonchi, F. (Eds.)
    Free, publicly-accessible full text available September 17, 2024
  6. Rutkowski, L.; Scherer, R.; Korytkowski, M.; Pedrycz, W.; Tadeusiewicz, R.; Zurada, J. (Eds.)
    In this work, we investigate the impact of class imbalance on the accuracy and diversity of synthetic samples generated by conditional generative adversarial network (CGAN) models. Although many studies using GANs have been extraordinarily successful at producing realistic image samples, they generally assume well-processed, balanced benchmark image datasets such as MNIST and CIFAR-10. However, well-balanced data is uncommon in real-world applications such as detecting fraud, diagnosing diabetes, and predicting solar flares. It is well known that when class labels are not uniformly distributed, the predictive ability of classification algorithms suffers significantly, a phenomenon known as the "class-imbalance problem." We show that imbalance in the training set can also affect sample generation by CGAN models. Using the well-known MNIST dataset, we control the imbalance ratio of selected classes through sampling. We show that both the quality and the diversity of generated samples suffer in the presence of class imbalance, and we propose a novel framework named Two-stage CGAN to produce high-quality synthetic samples in such cases. Our results indicate that the proposed framework provides a significant improvement over the typical oversampling and undersampling techniques used for class-imbalance remediation.
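The study controls the imbalance ratio "through sampling" and compares against standard over- and undersampling. The sketch below shows one plausible way to induce a controlled imbalance and the random-oversampling baseline; it does not implement the proposed Two-stage CGAN. The helper names and the definition of `ratio` (fraction of the minority class retained) are assumptions.

```python
# Hypothetical sketch: induce a controlled imbalance, then apply the
# random-oversampling baseline. NOT the paper's Two-stage CGAN.
import random
from collections import Counter


def make_imbalanced(samples, minority, ratio, seed=0):
    """Keep every non-minority sample and only a fraction `ratio`
    of the samples labelled `minority`."""
    rng = random.Random(seed)
    minority_items = [s for s in samples if s[1] == minority]
    keep = rng.sample(minority_items, int(len(minority_items) * ratio))
    return [s for s in samples if s[1] != minority] + keep


def random_oversample(samples, seed=0):
    """Duplicate randomly chosen samples of each class until every
    class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    out = list(samples)
    for label, n in counts.items():
        pool = [s for s in samples if s[1] == label]
        out.extend(rng.choice(pool) for _ in range(target - n))
    return out


# Toy data: (feature, label) pairs, 100 per class for labels 0, 1, 2.
samples = [(i, i % 3) for i in range(300)]
imbalanced = make_imbalanced(samples, minority=2, ratio=0.1)
balanced = random_oversample(imbalanced)
```

Oversampling this way only repeats the 10 retained minority samples, so it restores the class counts without adding diversity; that limitation is what generative remedies such as CGAN-based augmentation aim to address.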
    Free, publicly-accessible full text available September 14, 2024
  7. Free, publicly-accessible full text available September 1, 2024
  8. Amini, M.R.; Canu, S.; Fischer, A.; Guns, T.; Kralj Novak, P.; Tsoumakas, G. (Eds.)
    Quantifying the similarity or distance between time series, processes, signals, and trajectories is a task-specific problem and remains a challenge for many applications. The simplest measure, the Euclidean distance, is often dismissed because of its sensitivity to noise and the curse of dimensionality. Therefore, elastic measures (such as DTW, LCSS, and ED) are often used instead. However, these measures are not metric functions, and, more importantly, they must deal with challenges intrinsic to point-to-point mappings, such as pathological alignment. In this paper, we adopt an object-similarity measure, Multiscale Intersection over Union (MIoU), for measuring the distance/similarity between time series. We call the new measure TS-MIoU. Unlike the most popular time series similarity measures, TS-MIoU does not rely on a point-to-point mapping and therefore circumvents all the associated challenges. We show that TS-MIoU is indeed a metric function; in particular, it satisfies the triangle inequality, so it can take advantage of indexing algorithms without requiring a lower bound. We further show that its sensitivity to noise is adjustable, which makes it a strong alternative to the Euclidean distance without suffering from the curse of dimensionality. Our proof-of-concept experiments on over 100 UCR datasets show that TS-MIoU can fill the gap between the unforgiving strictness of ℓp-norm measures and the mapping challenges of elastic measures.
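The abstract does not give the TS-MIoU formula. As a loose, hypothetical illustration of the underlying idea (comparing sets of occupied grid cells at several resolutions instead of aligning individual points), the sketch below rasterizes each series onto square grids and averages the Jaccard distance (1 - IoU) across scales. The rasterization rule, the scale set, the plain averaging, and the assumption that values are pre-scaled to [0, 1] are all illustrative choices. This is not the paper's TS-MIoU, and it makes no claim about that measure's metric properties.

```python
# Hypothetical illustration of a multiscale IoU-style distance.
# NOT the published TS-MIoU. Assumptions: values pre-scaled to [0, 1];
# only sample points (not connecting segments) occupy grid cells.


def cells(series, scale):
    """Rasterize a series onto a scale x scale grid; return occupied cells."""
    n = len(series)
    occupied = set()
    for i, v in enumerate(series):
        t = i / (n - 1) if n > 1 else 0.0  # normalised time in [0, 1]
        col = min(int(t * scale), scale - 1)
        row = min(max(int(v * scale), 0), scale - 1)  # clamp into the grid
        occupied.add((col, row))
    return occupied


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of cells."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def multiscale_iou_distance(x, y, scales=(2, 4, 8, 16)):
    """Average the per-scale Jaccard distances; coarse scales tolerate
    small perturbations, fine scales are strict."""
    return sum(jaccard_distance(cells(x, s), cells(y, s)) for s in scales) / len(scales)


x = [0.1, 0.5, 0.9, 0.5, 0.1]       # a bump
y = [0.12, 0.52, 0.88, 0.49, 0.11]  # the same bump with small noise
z = [0.9, 0.5, 0.1, 0.5, 0.9]       # an inverted bump
print(multiscale_iou_distance(x, y), multiscale_iou_distance(x, z))
```

Dropping the finest scales from `scales` makes the comparison more forgiving of noise, which loosely mirrors the adjustable noise sensitivity described in the abstract.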
  9. Rutkowski, L.; Scherer, R.; Korytkowski, M.; Pedrycz, W.; Tadeusiewicz, R.; Zurada, J. (Eds.)
    Solar flares not only pose risks to space-based technologies and astronauts’ well-being but also disrupt, here on Earth, the high-tech, interconnected infrastructure on which our lives depend. While a number of machine-learning methods have been proposed to improve flare prediction, none of them, to the best of our knowledge, has investigated the impact of outliers on the reliability and robustness of those models’ performance. In this study, we investigate the impact of outliers in a multivariate time series benchmark dataset, SWAN-SF, on flare prediction models, and we test the hypothesis that SWAN-SF contains outliers whose removal improves the performance of prediction models on unseen datasets. We employ Isolation Forest to detect outliers among the weaker flare instances. We carry out several experiments over a wide range of contamination rates, which determine the percentage of outliers assumed to be present. We assess the quality of each dataset in terms of its actual contamination using TimeSeriesSVC. In our best results, we achieve a 279% increase in the True Skill Statistic and a 68% increase in the Heidke Skill Score. Overall, the results show that flare prediction can improve significantly if outliers are properly detected and removed.
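To make the contamination-rate idea concrete, the sketch below is a simplified, dependency-free Isolation Forest in plain Python that drops the top `contamination` fraction of points by anomaly score, using the standard score s = 2^(-E[h(x)]/c(psi)). It is not the authors' code. It assumes each weaker-flare MVTS instance has already been summarized into a fixed-length feature vector, and the defaults (100 trees, subsample size 256) and helper names are illustrative. The TimeSeriesSVC evaluation step is not shown.

```python
# Hypothetical, simplified Isolation Forest; NOT the authors' code.
# Assumption: each MVTS instance is already a fixed-length feature vector.
import math
import random

EULER_GAMMA = 0.5772156649015329


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Edges to x's leaf, plus c(leaf size) for points left unresolved."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly scores in (0, 1); values near 1 indicate likely outliers."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in points]


def remove_outliers(points, contamination, **kwargs):
    """Drop the round(contamination * n) highest-scoring points."""
    scores = isolation_scores(points, **kwargs)
    k = round(contamination * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    kept = [p for i, p in enumerate(points) if i not in flagged]
    return kept, flagged


# Toy "weaker-flare" feature vectors: a tight cluster plus one planted outlier.
gen = random.Random(1)
points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(99)] + [(50.0, 50.0)]
kept, flagged = remove_outliers(points, contamination=0.01)
```

Sweeping `contamination` over a range of values, as in the experiments described above, changes only how many of the top-ranked points are removed; the scores themselves stay the same. In practice a maintained library implementation would normally replace this sketch.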