-
Abstract Solar energetic particle (SEP) events, in particular high-energy-range SEP events, pose significant risks to space missions, astronauts, and technological infrastructure. Accurate prediction of these high-impact events is crucial for mitigating potential hazards. In this study, we present an end-to-end ensemble machine learning (ML) framework for the prediction of high-impact ∼100 MeV SEP events. Our approach leverages diverse data modalities sourced from the Solar and Heliospheric Observatory and the Geostationary Operational Environmental Satellite, integrating extracted active region polygons from solar extreme ultraviolet (EUV) imagery, time-series proton flux measurements, sunspot activity data, and detailed active region characteristics. To quantify the predictive contribution of each data modality (e.g., EUV or time series), we evaluate each modality independently using a range of ML models to assess its performance in forecasting SEP events. Finally, to enhance the SEP predictive performance, we train an ensemble learning model that combines all the models trained on individual data modalities, leveraging the strengths of each. Our proposed ensemble approach shows promising performance, achieving a recall of 0.80 and 0.75 in balanced and imbalanced settings, respectively, underscoring the effectiveness of multimodal data integration for robust SEP event prediction and enhanced forecasting capabilities. Free, publicly accessible full text available March 17, 2026.
-
Abstract Solar energetic particle (SEP) events pose significant risks to both space and ground-level infrastructure, as well as to human health in space. Understanding and predicting these events are critical for mitigating their potential impacts. In this paper, we address the challenge of predicting SEP events using proton flux data. We leverage some of the most recent advances in time series data mining, such as shapelets and the matrix profile, to propose a simple and easily understandable prediction approach. Our objective is to mitigate the interpretability challenges inherent to most machine learning models and to show that other methods exist that can not only yield accurate forecasts but also facilitate exploration and insight generation within the data domain. For this purpose, we construct a multivariate time series data set consisting of proton flux data recorded by the National Oceanic and Atmospheric Administration's geosynchronous orbit Earth-observing satellite. Then, we use our proposed approach to mine shapelets and make predictions using a random forest classifier. We demonstrate that our approach rivals state-of-the-art SEP prediction, offering superior interpretability and the ability to predict SEP events before their parent eruptive flares. Free, publicly accessible full text available February 7, 2026.
-
Abstract Solar energetic particle (SEP) events, originating from solar flares and coronal mass ejections, present significant hazards to space exploration and technology on Earth. Accurate prediction of these high‐energy events is essential for safeguarding astronauts, spacecraft, and electronic systems. In this study, we conduct an in‐depth investigation into the application of multimodal data fusion techniques for the prediction of high‐energy SEP events, particularly ∼100 MeV events. Our research utilizes six machine learning (ML) models, each finely tuned for time series analysis, including Univariate Time Series (UTS), Image‐based model (Image), Univariate Feature Concatenation (UFC), Univariate Deep Concatenation (UDC), Univariate Deep Merge (UDM), and Univariate Score Concatenation (USC). By combining time series proton flux data with solar X‐ray images, we exploit complementary insights into the underlying solar phenomena responsible for SEP events. Rigorous evaluation metrics, including accuracy, F1‐score, and other established measures, are applied, along with K‐fold cross‐validation, to ensure the robustness and generalization of our models. Additionally, we explore the influence of observation window sizes on classification accuracy.
-
Abstract With increasing demands for precise water resource management, there is a growing need for advanced techniques in mapping water bodies. The currently deployed satellites provide complementary data that are of either high spatial or high temporal resolution. As a result, there is a clear trade‐off between space and time when considering a single data source. For the efficient monitoring of multiple environmental resources, various Earth science applications need data at high spatial and temporal resolutions. To address this need, many data fusion methods that rely on combining data snapshots from multiple sources have been described in the literature. Traditional methods face limitations due to sensitivity to atmospheric disturbances and other environmental factors, resulting in noise, outliers, and missing data. This paper introduces Hydrological Generative Adversarial Network (Hydro‐GAN), a novel machine learning‐based method that utilizes modified GANs to enhance boundary accuracy when mapping low‐resolution MODIS data to high‐resolution Landsat‐8 images. We propose a new non‐saturating loss function for the Hydro‐GAN generator, which maximizes the log of discriminator probabilities to promote stable updates and aid convergence. By focusing on reducing squared differences between real and synthetic images, our approach enhances training stability and overall performance. We specifically focus on mapping water bodies using MODIS and Landsat‐8 imagery due to their relevance in water resource management tasks. Our experimental results demonstrate the effectiveness of Hydro‐GAN in generating high‐resolution water body maps, outperforming traditional methods in terms of boundary accuracy and overall quality.
-
Abstract Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observation of these parameters over time, i.e., representing solar events by multivariate time-series (MVTS) data, can determine relationships between magnetic field states in active regions and extreme solar events, e.g., solar flares. We can improve our understanding of these events by selecting the most relevant parameters that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier utilizing an LSTM network. Then, the top performing features are combined to produce input for an LSTM-based multivariate sequence classifier. Finally, we tested the discrimination ability of the selected features by training downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and support vector machine. We performed our experiments using a benchmark data set for flare prediction known as Space Weather Analytics for Solar Flares. We compared our proposed method with three other baseline feature selection methods and demonstrated that our method selects more discriminatory features compared to other methods. Due to the imbalanced nature of the data, primarily caused by the rarity of minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Finally, we reported the set of photospheric magnetic field parameters that give the highest discrimination performance in predicting flare classes.
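The true skill statistic named in this abstract is commonly defined for binary labels as recall minus the false alarm rate, TSS = TP/(TP+FN) - FP/(FP+TN), which is why it is less sensitive to class imbalance than accuracy. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `true_skill_statistic` and the 0/1 label encoding are assumptions made for illustration.

```python
def true_skill_statistic(y_true, y_pred):
    """Return TSS = TP/(TP+FN) - FP/(FP+TN) for binary labels (1 = event, 0 = no event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        # TSS is undefined unless both classes appear in the ground truth.
        raise ValueError("y_true must contain both positive and negative samples")
    recall = tp / (tp + fn)            # fraction of real events that were caught
    false_alarm_rate = fp / (fp + tn)  # fraction of non-events flagged as events
    return recall - false_alarm_rate


# Hypothetical labels: 2 events and 4 non-events.
example_true = [1, 1, 0, 0, 0, 0]
example_pred = [1, 0, 0, 0, 0, 1]
example_tss = true_skill_statistic(example_true, example_pred)
```

The score ranges from -1 to 1. A classifier that always predicts the majority class scores 0 regardless of how rare the minority class is.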
-
Abstract Solar energetic particles (SEPs) are associated with extreme solar events that can cause major damage to space- and ground-based life and infrastructure. High-intensity SEP events, particularly ∼100 MeV SEP events, can pose severe health risks for astronauts owing to radiation exposure and affect Earth-orbiting satellites (e.g., Landsat and the International Space Station). A major challenge in the SEP event prediction task is the lack of adequate SEP data because of the rarity of these events. In this work, we aim to improve the prediction of ∼30, ∼60, and ∼100 MeV SEP events by synthetically increasing the number of SEP samples. We explore the use of univariate and multivariate time series of proton flux data as input to machine-learning-based prediction methods, such as time series forest (TSF). Our study covers solar cycles 22, 23, and 24. Our findings show that using data augmentation methods, such as the synthetic minority oversampling technique, remarkably increases the accuracy and F1-score of the classifiers used in this research, especially for TSF, where the average accuracy increased by 20%, reaching around 90% accuracy in the ∼100 MeV SEP prediction task. We also achieved higher prediction accuracy when using the multivariate time series data of the proton flux. Finally, we build a pipeline framework for our best-performing model, TSF, and provide a comprehensive hierarchical classification of the ∼100, ∼60, and ∼30 MeV and non-SEP prediction scenarios.
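The synthetic minority oversampling technique mentioned in this abstract creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The sketch below shows that core idea in plain Python. It is an illustration only: the function name `smote`, its parameters, and the choice to treat each fixed-length sample as a flat vector are assumptions, and it does not reproduce the authors' augmentation pipeline.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of equal-length minority vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Place the new sample at a random point on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority class: four 2-D points on the corners of the unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(corners, n_new=10, k=2)
```

Because each synthetic sample lies on a segment between two real samples, oversampling should be applied to the training split only, so that no interpolated copy of a test event leaks into training.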
-
Solar flares are significant occurrences in solar physics, impacting space weather and terrestrial technologies. Accurate classification of solar flares is essential for predicting space weather and minimizing potential disruptions to communication, navigation, and power systems. This study addresses the challenge of selecting the most relevant features from multivariate time-series data, specifically focusing on solar flares. We employ methods such as Mutual Information (MI), Minimum Redundancy Maximum Relevance (mRMR), and Euclidean Distance to identify key features for classification. Recognizing the performance variability of different feature selection techniques, we introduce an ensemble approach to compute feature weights. By combining outputs from multiple methods, our ensemble method provides a more comprehensive understanding of the importance of features. Our results show that the ensemble approach significantly improves classification performance, achieving True Skill Statistic (TSS) values 0.15 higher than those of individual feature selection methods. Additionally, our method offers valuable insights into the underlying physical processes of solar flares, leading to more effective space weather forecasting and enhanced mitigation strategies for communication, navigation, and power system disruptions.
-
Solar wind modeling is classified into two main types: empirical models and physics-based models, each designed to forecast solar wind properties in various regions of the heliosphere. Empirical models, which are cost-effective, have demonstrated significant accuracy in predicting solar wind at the L1 Lagrange point. On the other hand, physics-based models rely on magnetohydrodynamics (MHD) principles and demand more computational resources. In this research paper, we build upon our recent novel approach that merges empirical and physics-based models. Our recent proposal involves the creation of a new physics-informed neural network that leverages time series data from solar wind predictors to enhance solar wind prediction. This innovative method aims to combine the strengths of both modeling approaches to achieve more accurate and efficient solar wind predictions. In this work, we show the variability of the proposed physics-informed loss across multiple deep learning models. We also study how training the models on different solar cycles affects their performance. This work represents the first effort to predict solar wind by integrating deep learning approaches with physics constraints and analyzing the results across three solar cycles. Our findings demonstrate the superiority of our physics-constrained model over other unconstrained deep learning predictive models.
-
Streamflow prediction is crucial for planning future developments and safety measures along river basins, especially in the face of changing climate patterns. In this study, we utilized monthly streamflow data from the United States Bureau of Reclamation and meteorological data (snow water equivalent, temperature, and precipitation) from the various weather monitoring stations of the Snow Telemetry Network within the Upper Colorado River Basin to forecast monthly streamflow at Lees Ferry, a specific location along the Colorado River in the basin. Four machine learning models (Random Forest Regression, Long Short-Term Memory, Gated Recurrent Unit, and Seasonal AutoRegressive Integrated Moving Average) were trained using 30 years of monthly data (1991–2020), split into 80% for training (1991–2014) and 20% for testing (2015–2020). Initially, only historical streamflow data were used for predictions, followed by including meteorological factors to assess their impact on streamflow. Subsequently, sequence analysis was conducted to explore various input-output sequence window combinations. We then evaluated the influence of each factor on streamflow by testing all possible combinations to identify the optimal feature combination for prediction. Our results indicate that the Random Forest Regression model consistently outperformed others, especially after integrating all meteorological factors with historical streamflow data. The best performance was achieved with a 24-month look-back period to predict 12 months of streamflow, yielding a Root Mean Square Error of 2.25 and R-squared (R²) of 0.80. Finally, to assess model generalizability, we tested the best model at other locations in the basin: Greenwood Springs (Colorado River), Maybell (Yampa River), and Archuleta (San Juan River).
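The input-output sequence windows described in this abstract can be pictured as a sliding window over the monthly series: each training example pairs a look-back block with the forecast block that immediately follows it. The sketch below illustrates this for the reported 24-month look-back and 12-month horizon. The function name `make_windows` and the placeholder series are hypothetical, and the authors' actual preprocessing may differ.

```python
def make_windows(series, look_back=24, horizon=12):
    """Split a sequence into (input, target) pairs of consecutive blocks."""
    pairs = []
    last_start = len(series) - look_back - horizon
    for start in range(last_start + 1):
        split = start + look_back
        inputs = series[start:split]             # look_back past values
        targets = series[split:split + horizon]  # the next horizon values
        pairs.append((inputs, targets))
    return pairs


# Placeholder data: 30 years of monthly values (360 points), not real streamflow.
monthly = list(range(360))
windows = make_windows(monthly, look_back=24, horizon=12)
```

With 360 monthly values, a 24-month look-back, and a 12-month horizon, this produces 360 - 36 + 1 = 325 overlapping pairs. When the split is chronological, as in the abstract, windows should be built separately within the training and test periods so that no test target appears among the training inputs.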
