Title: Data‐Driven Forecasting of Low‐Latitude Ionospheric Total Electron Content Using the Random Forest and LSTM Machine Learning Methods
Abstract

In this research, we present data‐driven forecasting of ionospheric total electron content (TEC) using the long short‐term memory (LSTM) deep recurrent neural network method. The random forest machine learning method was used to perform a regression analysis and estimate the variable importance of the input parameters. The input data are obtained from satellite and ground‐based measurements characterizing the solar‐terrestrial environment. We estimate the relative importance of 34 different parameters, including the solar flux, solar wind density and speed, the three components of the interplanetary magnetic field, Lyman‐alpha, and the Kp, Dst, and Polar Cap (PC) indices. The TEC measurements are taken with 15‐s cadence from an equatorial GPS station located at Bogotá, Colombia (4.7110° N, 74.0721° W). The 2008–2017 data set, including the top five parameters estimated using the random forest, is used for training the machine learning models, and the 2018 data set is used for independent testing of the LSTM forecasting. The LSTM method was applied to forecast the TEC up to 5 h ahead, with 30‐min cadence. The results indicate that good forecasts with low root mean square (RMS) error and high correlation can be made for the near future, and that the RMS error increases as the forecast horizon extends further into the future.
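The forecasting setup described in the abstract (a history window used to predict TEC at 30-min steps out to 5 h, scored by RMS error at each step) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the one-day history length, the synthetic sinusoidal series, and the persistence baseline standing in for the LSTM are all hypothetical choices made here.

```python
import math


def make_windows(series, n_lags, n_horizons):
    """Split a 1-D series into (history, future) pairs for supervised learning."""
    pairs = []
    for start in range(len(series) - n_lags - n_horizons + 1):
        history = series[start:start + n_lags]
        future = series[start + n_lags:start + n_lags + n_horizons]
        pairs.append((history, future))
    return pairs


def rmse_per_horizon(predictions, targets):
    """Return the RMS error separately for each forecast step."""
    n_horizons = len(targets[0])
    errors = []
    for h in range(n_horizons):
        squared = [(p[h] - t[h]) ** 2 for p, t in zip(predictions, targets)]
        errors.append(math.sqrt(sum(squared) / len(squared)))
    return errors


# Synthetic stand-in for 30-min TEC samples (made-up values, not real data):
# a 24 h sinusoid sampled 48 times per day over 4 days.
tec = [20.0 + 10.0 * math.sin(2 * math.pi * i / 48) for i in range(48 * 4)]

N_LAGS = 48      # assumption: one day of history per sample
N_HORIZONS = 10  # 5 h ahead at 30-min cadence = 10 forecast steps

pairs = make_windows(tec, N_LAGS, N_HORIZONS)

# Persistence baseline (placeholder for a trained model):
# repeat the last observed value at every forecast step.
predictions = [[history[-1]] * N_HORIZONS for history, _ in pairs]
targets = [future for _, future in pairs]

errors = rmse_per_horizon(predictions, targets)
for step, err in enumerate(errors, start=1):
    print(f"{step * 30:3d} min ahead: RMSE = {err:.2f}")
```

Reporting the error per forecast step, rather than as a single number, is what makes the growth of RMS error with lead time visible. A trained model would replace the persistence baseline while the windowing and scoring stay the same.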

 
Award ID(s):
1933056
NSF-PAR ID:
10450805
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Space Weather
Volume:
19
Issue:
6
ISSN:
1542-7390
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Photospheric magnetic field parameters are frequently used to analyze and predict solar events. Observation of these parameters over time, i.e., representing solar events by multivariate time-series (MVTS) data, can determine relationships between magnetic field states in active regions and extreme solar events, e.g., solar flares. We can improve our understanding of these events by selecting the most relevant parameters that give the highest predictive performance. In this study, we propose a two-step incremental feature selection method for MVTS data using a deep-learning model based on long short-term memory (LSTM) networks. First, each MVTS feature (magnetic field parameter) is evaluated individually by a univariate sequence classifier utilizing an LSTM network. Then, the top performing features are combined to produce input for an LSTM-based multivariate sequence classifier. Finally, we tested the discrimination ability of the selected features by training downstream classifiers, e.g., Minimally Random Convolutional Kernel Transform and support vector machine. We performed our experiments using a benchmark data set for flare prediction known as Space Weather Analytics for Solar Flares. We compared our proposed method with three other baseline feature selection methods and demonstrated that our method selects more discriminatory features compared to other methods. Due to the imbalanced nature of the data, primarily caused by the rarity of minority flare classes (e.g., the X and M classes), we used the true skill statistic as the evaluation metric. Finally, we reported the set of photospheric magnetic field parameters that give the highest discrimination performance in predicting flare classes.

     
  2. Solar energy is now the cheapest form of electricity in history. Unfortunately, significantly increasing the electric grid's fraction of solar energy remains challenging due to its variability, which makes balancing electricity's supply and demand more difficult. While thermal generators' ramp rate (the maximum rate at which they can change their energy generation) is finite, solar energy's ramp rate is essentially infinite. Thus, accurate near-term solar forecasting, or nowcasting, is important to provide advance warnings to adjust thermal generator output in response to variations in solar generation to ensure a balanced supply and demand. To address the problem, this paper develops a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning. Specifically, we develop deep auto-regressive models using convolutional neural networks (CNN) and long short-term memory networks (LSTM) that are globally trained across multiple locations to predict raw future observations of the spatio-temporal spectral data collected by the recently launched GOES-R series of satellites. Our model estimates a location's near-term future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics. We evaluate our approach for different coverage areas and forecast horizons across 25 solar sites and show that it yields errors close to that of a model using ground-truth observations.
  3. Abstract

    Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and is comprised of the resulting complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood. This creates a need for skillful models with knowledge about the confidence of their predictions. One example of such a dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth’s upper atmosphere. Our inability to forecast it has severe repercussions in the context of satellite drag and computation of probability of collision between two space objects in low Earth orbit (LEO) for decision making in space operations. Even with (assumed) perfect forecast of model drivers, our incomplete knowledge of the system results in often inaccurate thermospheric neutral mass density predictions. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in predictions. In this work, we propose two techniques to develop nonlinear ML regression models to predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities for models trained on both local and global datasets. We show that the NLPD loss provides similar results for both techniques but the direct probability distribution prediction method has a much lower computational cost. 
    For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed using both techniques provide test error on the order of 13%. The CHAMP models, on validation and test data, are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch.

     
  4. Abstract

    Accurate estimation of terrestrial gross primary productivity (GPP) remains a challenge despite its importance in the global carbon cycle. Chlorophyll fluorescence (ChlF) has been recently adopted to understand photosynthesis and its response to the environment, particularly with remote sensing data. However, it remains unclear how ChlF and photosynthesis are linked at different spatial scales across the growing season. We examined seasonal relationships between ChlF and photosynthesis at the leaf, canopy, and ecosystem scales and explored how leaf‐level ChlF was linked with canopy‐scale solar‐induced chlorophyll fluorescence (SIF) in a temperate deciduous forest at Harvard Forest, Massachusetts, USA. Our results show that ChlF captured the seasonal variations of photosynthesis with significant linear relationships between ChlF and photosynthesis across the growing season over different spatial scales (R = 0.73, 0.77, and 0.86 at leaf, canopy, and satellite scales, respectively; P < 0.0001). We developed a model to estimate GPP from the tower‐based measurement of SIF and leaf‐level ChlF parameters. The estimation of GPP from this model agreed well with flux tower observations of GPP (R = 0.68; P < 0.0001), demonstrating the potential of SIF for modeling GPP. At the leaf scale, we found that leaf Fq/Fm, the fraction of absorbed photons that are used for photochemistry for a light‐adapted measurement from a pulse amplitude modulation fluorometer, was the best leaf fluorescence parameter to correlate with canopy SIF yield (SIF/APAR, R = 0.79; P < 0.0001). We also found that canopy SIF and SIF‐derived GPP (GPP_SIF) were strongly correlated to leaf‐level biochemistry and canopy structure, including chlorophyll content (R = 0.65 for canopy GPP_SIF and chlorophyll content; P < 0.0001), leaf area index (LAI) (R = 0.35 for canopy GPP_SIF and LAI; P < 0.0001), and normalized difference vegetation index (NDVI) (R = 0.36 for canopy GPP_SIF and NDVI; P < 0.0001). Our results suggest that ChlF can be a powerful tool to track photosynthetic rates at leaf, canopy, and ecosystem scales.

     
  5. Precipitation, especially convective precipitation, is highly associated with hydrological disasters (e.g., floods and drought) that have negative impacts on agricultural productivity, society, and the environment. To mitigate these negative impacts, it is crucial to monitor the precipitation status in real time. The new Advanced Baseline Imager (ABI) onboard the GOES-16 satellite provides such a precipitation product in higher spatiotemporal and spectral resolutions, especially during the daytime. This research proposes a deep neural network (DNN) method to classify rainy and non-rainy clouds based on the brightness temperature differences (BTDs) and reflectances (Ref) derived from ABI. Convective and stratiform rain clouds are also separated using similar spectral parameters expressing the characteristics of cloud properties. The precipitation events used for training and validation are obtained from the IMERG V05B data, covering the southeastern coast of the U.S. during the 2018 rainy season. The performance of the proposed method is compared with traditional machine learning methods, including support vector machines (SVMs) and random forest (RF). For rainy area detection, the DNN method outperformed the other methods, with a critical success index (CSI) of 0.71 and a probability of detection (POD) of 0.86. For convective precipitation delineation, the DNN models also show a better performance, with a CSI of 0.58 and POD of 0.72. This automatic cloud classification system could be deployed for extreme rainfall event detection, real-time forecasting, and decision-making support in rainfall-related disasters. 
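
Several of the abstracts listed above report categorical skill scores: the true skill statistic (TSS) in item 1, and the critical success index (CSI) and probability of detection (POD) in item 5. As a brief sketch, all three can be computed from the four counts of a binary contingency table using their standard definitions. The function name and the example counts below are hypothetical and are not taken from any of the studies.

```python
def skill_scores(tp, fp, fn, tn):
    """Compute POD, CSI, and TSS from binary contingency-table counts.

    tp: hits, fp: false alarms, fn: misses, tn: correct negatives.
    """
    pod = tp / (tp + fn)        # fraction of observed events that were predicted
    csi = tp / (tp + fp + fn)   # hits relative to all predicted or observed events
    pofd = fp / (fp + tn)       # fraction of non-events predicted as events
    tss = pod - pofd            # true skill statistic
    return {"POD": pod, "CSI": csi, "TSS": tss}


# Made-up counts for illustration only.
scores = skill_scores(tp=40, fp=10, fn=10, tn=140)
for name, value in scores.items():
    print(f"{name} = {value:.3f}")
```

Because TSS subtracts the false-alarm rate computed over non-events, it does not depend on the ratio of events to non-events, which is why it suits imbalanced problems such as rare flare classes. CSI ignores correct negatives entirely, which makes it a common choice when non-events dominate the sample.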