Title: SolarCube: An Integrative Benchmark Dataset Harnessing Satellite and In-situ Observations for Large-scale Solar Energy Forecasting
Solar power is a critical source of renewable energy, offering significant potential to lower greenhouse gas emissions and mitigate climate change. However, the cloud-induced variability of solar radiation reaching the Earth’s surface presents a challenge for integrating solar power into the grid (e.g., storage and backup management). The new generation of geostationary satellites such as GOES-16 has become an important data source for large-scale and high-temporal-frequency solar radiation forecasting. However, no machine-learning-ready dataset has integrated geostationary satellite data with fine-grained solar radiation information to support forecasting model development and benchmarking with consistent metrics. We present SolarCube, a new ML-ready benchmark dataset for solar radiation forecasting. SolarCube covers 19 study areas distributed over multiple continents: North America, South America, Asia, and Oceania. The dataset supports short-term (i.e., 30 minutes to 6 hours) and long-term (i.e., day-ahead or longer) solar radiation forecasting at both point level (i.e., specific locations of monitoring stations) and area level, by processing and integrating data from multiple sources, including geostationary satellite images, physics-derived solar radiation, and ground station observations from different monitoring networks around the globe. We also evaluated a set of forecasting models for point- and image-based time-series data to develop performance benchmarks under different testing scenarios. The dataset is available at https://doi.org/10.5281/zenodo.11498739. A Python library for conveniently generating different variations of the dataset based on user needs, along with baseline models, is available at https://github.com/Ruohan-Li/SolarCube.
Award ID(s):
2021871
PAR ID:
10620871
Publisher / Repository:
Advances in Neural Information Processing Systems
Date Published:
Volume:
37
Page Range / eLocation ID:
3499-3513
Format(s):
Medium: X
Right(s):
Creative Commons Attribution 4.0 International
Sponsoring Org:
National Science Foundation
More Like this
  1. Developing accurate solar performance models, which infer solar power output in real time based on the current environmental conditions, is an important prerequisite for many advanced energy analytics. Recent work has developed sophisticated data-driven techniques that generate customized models for complex rooftop solar sites by combining well-known physical models with both system and public weather station data. However, inferring solar generation from public weather station data has two drawbacks: not all solar sites are near a public weather station, and public weather data generally quantifies cloud cover, the most significant weather metric that affects solar output, using highly coarse and imprecise measurements. In this paper, we develop and evaluate solar performance models that use satellite-based estimates of downward shortwave (solar) radiation (DSR) at the Earth's surface, which NOAA began publicly releasing after the launch of the GOES-R geostationary satellites in 2017. Unlike public weather data, DSR estimates are available for every 0.5 km² area. As we show, the accuracy of solar performance modeling using satellite data and public weather station data depends on the cloud conditions, with DSR-based modeling being more accurate under clear skies and station-based modeling being more accurate under overcast skies. Surprisingly, our results show that, overall, pure satellite-based modeling yields accuracy similar to that of pure station-based modeling, although the relationship is a function of conditions and the local climate. We also show that a hybrid approach that combines the best of both approaches can modestly improve accuracy.
  2. Solar energetic particle (SEP) events, in particular high-energy-range SEP events, pose significant risks to space missions, astronauts, and technological infrastructure. Accurate prediction of these high-impact events is crucial for mitigating potential hazards. In this study, we present an end-to-end ensemble machine learning (ML) framework for the prediction of high-impact ∼100 MeV SEP events. Our approach leverages diverse data modalities sourced from the Solar and Heliospheric Observatory and the Geostationary Operational Environmental Satellite, integrating extracted active region polygons from solar extreme ultraviolet (EUV) imagery, time-series proton flux measurements, sunspot activity data, and detailed active region characteristics. To quantify the predictive contribution of each data modality (e.g., EUV or time series), we independently evaluate them using a range of ML models to assess their performance in forecasting SEP events. Finally, to enhance SEP predictive performance, we train an ensemble learning model that combines all the models trained on individual data modalities, leveraging the strengths of each. Our proposed ensemble approach shows promising performance, achieving a recall of 0.80 and 0.75 in balanced and imbalanced settings, respectively, underscoring the effectiveness of multimodal data integration for robust SEP event prediction.
  3. Chen, Jing M (Ed.)
    Thermal radiation directionality (TRD) characterizes the anisotropic signature of most surface targets in the thermal infrared domain. It causes significant uncertainties in estimating surface upward longwave radiation (SULR) from space observations. In this regard, kernel-driven models (KDMs) are suitable for removing TRD effects from remote sensing datasets, as they are computationally efficient. However, KDMs require simultaneous multiangle observations as inputs to be well calibrated, which poses a difficulty for geostationary satellites, as they can only provide a single-angle observation. To overcome this issue, we propose a six-parameter time-evolving KDM that combines a four-parameter SULR diurnal variation model and a two-parameter TRD amplitude model to correct the TRD effect in single-angle estimated SULR datasets from geostationary satellites. The significant daytime TRD effect when the solar zenith angle is within 60° can be effectively eliminated. The modeling accuracy of the time-evolving KDM is evaluated using a simulated SULR dataset generated by the 3D Discrete Anisotropic Radiative Transfer (DART) model; the TRD correction method based on the new time-evolving KDM is validated using a two-year single-angle estimated SULR dataset derived from data of the Advanced Baseline Imager (ABI) onboard Geostationary Operational Environmental Satellite-16 (GOES-16) against in situ measurements at 20 AmeriFlux sites. Results show that the proposed time-evolving KDM has high accuracy, with R² > 0.999 and a small RMSE of 1.5 W/m²; the TRD correction method based on the time-evolving KDM can greatly reduce the GOES-16 SULR uncertainty caused by the TRD effect, with an RMSE decrease of 4.5 W/m² (22.1%) and a mean bias error decrease of 7.9 W/m² (62.7%). Hence, the proposed TRD correction method is practically efficient for the operational TRD correction of SULR products generated from geostationary satellites (e.g., GOES-16, FY-4A, Himawari-8, MSG).
  4. Solar energy capacity is continuing to increase. The key challenge with integrating solar into buildings and the electric grid is its high power generation variability, which is a function of many factors, including a site's location, time, weather, and numerous physical attributes. There has been significant prior work on solar performance modeling and forecasting that infers a site's current and future solar generation based on these factors. Accurate solar performance models and forecasts are also a prerequisite for conducting a wide range of building and grid energy-efficiency research. Unfortunately, much of the prior work is not accessible to researchers, because it has not been released as open source, is time-consuming to re-implement, or requires access to proprietary data sources. To address this problem, we present Solar-TK, a data-driven toolkit for solar performance modeling and forecasting that is simple, extensible, and publicly accessible. Solar-TK's simple approach models and forecasts a site's solar output given only its location and a small amount of historical generation data. Solar-TK's extensible design includes a small collection of independent modules that connect together to implement basic modeling and forecasting, while also enabling users to implement new energy analytics. We plan to release Solar-TK as open source to enable research that requires realistic solar models and forecasts, and to serve as a baseline for comparing new solar modeling and forecasting techniques. We compare Solar-TK's simple approach with PVlib and show that it yields comparable accuracy. We present three case studies showing how Solar-TK can advance energy-efficiency research.
  5. Developing accurate solar performance models, which infer solar output from widely available external data sources, is increasingly important as the grid's solar capacity rises. These models are important for a wide range of solar analytics, including solar forecasting, resource estimation, and fault detection. The most significant source of error in existing models is the inaccurate estimation of clouds' effect on solar output, as cloud formations and their impact on solar radiation are highly complex. In 2018 and 2019, respectively, the National Oceanic and Atmospheric Administration (NOAA) in the U.S. began releasing multispectral data comprising 16 different light wavelengths (or channels) from the GOES-16 and GOES-17 satellites every 5 minutes. Enough channel data is now available to learn solar performance models using machine learning (ML). In this paper, we show how to develop both local and global solar performance models using ML on multispectral data, and compare their accuracy to existing physical models based on ground-level weather readings and on NOAA's estimates of downward shortwave radiation (DSR), which also derive from multispectral data but use a physical model. We show that ML-based solar performance models built on multispectral data are much more accurate than weather- or DSR-based models, improving the average MAPE across 29 solar sites by over 50% for local models and 25% for global models.