This content will become publicly available on December 13, 2025

Title: AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery
Clouds in satellite imagery pose a significant challenge for downstream applications. A major obstacle in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce AllClear, the largest public dataset for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns and comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, demonstrating a scaling law (PSNR rises from 28.47 to 33.87 with 30× more data), and conducting ablation studies on temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth's surface and promote better cloud removal results.
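The benchmark above reports reconstruction quality as PSNR. As a point of reference, here is a minimal sketch of how PSNR between a predicted cloud-free image and a clear target might be computed; the function and the synthetic arrays are illustrative and are not taken from the AllClear code.

```python
import numpy as np

def psnr(prediction: np.ndarray, reference: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between a predicted cloud-free image and
    a clear reference, both scaled to [0, max_value]."""
    mse = np.mean((prediction.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Illustrative 4-band 256x256 chip with reflectance scaled to [0, 1]
rng = np.random.default_rng(0)
reference = rng.random((4, 256, 256))
prediction = np.clip(reference + rng.normal(0.0, 0.02, reference.shape), 0.0, 1.0)
print(f"PSNR: {psnr(prediction, reference):.2f} dB")
```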
Award ID(s):
2144117
PAR ID:
10566036
Author(s) / Creator(s):
Publisher / Repository:
NeurIPS 2024
Date Published:
Format(s):
Medium: X
Location:
Vancouver
Sponsoring Org:
National Science Foundation
More Like this
  1. Mapping crop types and land cover in smallholder farming systems in sub-Saharan Africa remains a challenge due to data costs, high cloud cover, and poor temporal resolution of satellite data. With improvement in satellite technology and image processing techniques, there is a potential for integrating data from sensors with different spectral characteristics and temporal resolutions to effectively map crop types and land cover. In our Malawi study area, it is common that there are no cloud-free images available for the entire crop growth season. The goal of this experiment is to produce detailed crop type and land cover maps in agricultural landscapes using the Sentinel-1 (S-1) radar data, Sentinel-2 (S-2) optical data, S-2 and PlanetScope data fusion, and S-1 C2 matrix and S-1 H/α polarimetric decomposition. We evaluated the ability to combine these data to map crop types and land cover in two smallholder farming locations. The random forest algorithm, trained with crop and land cover type data collected in the field, complemented with samples digitized from Google Earth Pro and DigitalGlobe, was used for the classification experiments. The results show that the S-2 and PlanetScope fused image + S-1 covariance (C2) matrix + H/α polarimetric decomposition (an entropy-based decomposition method) fusion outperformed all other image combinations, producing higher overall accuracies (OAs) (>85%) and Kappa coefficients (>0.80). These OAs represent a 13.53% and 11.7% improvement on the Sentinel-2-only (OAs < 80%) experiment for Thimalala and Edundu, respectively. The experiment also provided accurate insights into the distribution of crop and land cover types in the area. The findings suggest that in cloud-dense and resource-poor locations, fusing high temporal resolution radar data with available optical data presents an opportunity for operational mapping of crop types and land cover to support food security and environmental management decision-making. 
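As a rough illustration of the classification step described in item 1, the sketch below trains scikit-learn's RandomForestClassifier on a per-pixel feature stack and reports overall accuracy and the Kappa coefficient. The feature columns, class count, and random data are assumptions for the example only; the study's actual fusion and sampling workflow is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Hypothetical per-pixel feature stack: fused S-2/PlanetScope bands plus
# S-1 covariance (C2) and H/alpha decomposition features (columns are assumed).
rng = np.random.default_rng(42)
X = rng.random((5000, 12))        # 5000 labeled pixels, 12 stacked features
y = rng.integers(0, 6, 5000)      # 6 illustrative crop / land cover classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f"Overall accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Kappa coefficient: {cohen_kappa_score(y_test, y_pred):.3f}")
```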
  2. Grassland monitoring can be challenging because it is time-consuming and expensive to measure grass condition at large spatial scales. Remote sensing offers a time- and cost-effective method for mapping and monitoring grassland condition at both large spatial extents and fine temporal resolutions. Combinations of remotely sensed optical and radar imagery are particularly promising because together they can measure differences in moisture, structure, and reflectance among land cover types. We combined multi-date radar (PALSAR-2 and Sentinel-1) and optical (Sentinel-2) imagery with field data and visual interpretation of aerial imagery to classify land cover in the Masai Mara National Reserve, Kenya using machine learning (Random Forests). This study area comprises a diverse array of land cover types and changes over time due to seasonal changes in precipitation, seasonal movements of large herds of resident and migratory ungulates, fires, and livestock grazing. We classified twelve land cover types with user’s and producer’s accuracies ranging from 66%–100% and an overall accuracy of 86%. These methods were able to distinguish among short, medium, and tall grass cover at user’s accuracies of 83%, 82%, and 85%, respectively. By yielding a highly accurate, fine-resolution map that distinguishes among grasses of different heights, this work not only outlines a viable method for future grassland mapping efforts but also will help inform local management decisions and research in the Masai Mara National Reserve. 
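Item 2 reports per-class user's and producer's accuracies, which are read directly off the confusion matrix between reference and mapped labels. A small sketch of that bookkeeping follows; the labels are placeholders, not the Masai Mara classification.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder reference (ground truth) and mapped labels for three classes
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])   # e.g. short/medium/tall grass
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 1, 0])

cm = confusion_matrix(y_true, y_pred)        # rows: reference, columns: mapped
producers = np.diag(cm) / cm.sum(axis=1)     # correct / reference total (omission view)
users = np.diag(cm) / cm.sum(axis=0)         # correct / mapped total (commission view)
overall = np.diag(cm).sum() / cm.sum()

for k, (p, u) in enumerate(zip(producers, users)):
    print(f"class {k}: producer's {p:.2%}, user's {u:.2%}")
print(f"overall accuracy: {overall:.2%}")
```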
  3. Coastal mangrove forests provide important ecosystem goods and services, including carbon sequestration, biodiversity conservation, and hazard mitigation. However, they are being destroyed at an alarming rate by human activities. To characterize mangrove forest changes, evaluate their impacts, and support relevant protection and restoration decision making, accurate and up-to-date mangrove extent mapping at large spatial scales is essential. Available large-scale mangrove extent data products use a single machine learning method commonly with 30 m Landsat imagery, and significant inconsistencies remain among these data products. With huge amounts of satellite data involved and the heterogeneity of land surface characteristics across large geographic areas, finding the most suitable method for large-scale high-resolution mangrove mapping is a challenge. The objective of this study is to evaluate the performance of a machine learning ensemble for mangrove forest mapping at 20 m spatial resolution across West Africa using Sentinel-2 (optical) and Sentinel-1 (radar) imagery. The machine learning ensemble integrates three commonly used machine learning methods in land cover and land use mapping, including Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network (NN). The cloud-based big geospatial data processing platform Google Earth Engine (GEE) was used for pre-processing Sentinel-2 and Sentinel-1 data. Extensive validation has demonstrated that the machine learning ensemble can generate mangrove extent maps at high accuracies for all study regions in West Africa (92%–99% Producer’s Accuracy, 98%–100% User’s Accuracy, 95%–99% Overall Accuracy). This is the first time that mangrove extent has been mapped at a 20 m spatial resolution across West Africa. The machine learning ensemble has the potential to be applied to other regions of the world and is therefore capable of producing high-resolution mangrove extent maps at global scales periodically.
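A compact sketch of combining the three learners named in item 3 (RF, GBM, NN) with a soft-voting scikit-learn ensemble is given below. The feature stack, class labels, and hyperparameters are assumptions for illustration; the study itself pre-processed Sentinel-1/2 data on Google Earth Engine rather than with this snippet.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder per-pixel features from Sentinel-2 (optical) and Sentinel-1 (radar)
rng = np.random.default_rng(7)
X = rng.random((2000, 10))
y = rng.integers(0, 2, 2000)             # 1 = mangrove, 0 = non-mangrove (illustrative)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("nn", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)),
    ],
    voting="soft",                        # average the class probabilities of the three models
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```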
  4. This dataset describes measurements of inter-annual to sub-seasonal riverbank erosion rates on the Koyukuk River, Alaska, over the period 2016-2022. The data are used in the paper: “Geyman, E., Douglas, M., Avouac, J.-P. and Lamb, M. Permafrost slows Arctic riverbank erosion, in review (2024).” The dataset contains two sets of measurements: (1) riverbank displacement estimated from Sentinel-2 optical satellite imagery (10 meter (m) resolution) over the period 30-Aug-2016 to 13-Jul-2022, and (2) riverbank displacement estimated from Planet optical satellite imagery (3 m resolution) over the period 31-Aug-2016 to 01-Oct-2022. The first dataset is based on comparison of Sentinel-2 satellite acquisitions from the start and end of the study interval. The second dataset analyzes 65 PlanetScope image mosaics (for an average of 9 observations per year). The Matlab code used to analyze the Sentinel-2 and PlanetScope imagery, as well as to process the sub-seasonal displacement estimates, is included in the file “Code.zip”. 
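The code released with the dataset in item 4 is MATLAB; as a language-neutral illustration of the underlying arithmetic, the sketch below converts a bank displacement measured between the two Sentinel-2 acquisition dates quoted above into an average annual erosion rate. The displacement value is a made-up placeholder.

```python
from datetime import date

# Sentinel-2 comparison interval from the dataset description
start, end = date(2016, 8, 30), date(2022, 7, 13)
years = (end - start).days / 365.25

bank_displacement_m = 42.0   # placeholder riverbank retreat measured between the two images
rate_m_per_yr = bank_displacement_m / years
print(f"{rate_m_per_yr:.2f} m/yr over {years:.2f} years")
```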
  5. Land surface phenology (LSP) products currently have large uncertainties due to cloud contamination and other impacts in temporal satellite observations, and they have been poorly validated because of the lack of spatially comparable ground measurements. This study provided a reference dataset of gap-free time series and phenological dates by fusing the Harmonized Landsat 8 and Sentinel-2 (HLS) observations with near-surface PhenoCam time series for 78 regions of 10 × 10 km² across ecosystems in North America during 2019 and 2020. The HLS-PhenoCam LSP (HP-LSP) reference dataset at 30 m pixels is composed of: (1) 3-day synthetic gap-free EVI2 (two-band Enhanced Vegetation Index) time series that are physically meaningful to monitor the vegetation development across heterogeneous levels, train models (e.g., machine learning) for land surface mapping, and extract phenometrics from various methods; and (2) four key phenological dates (accuracy ≤5 days) that are spatially continuous and scalable, which are applicable to validate various satellite-based phenology products (e.g., global MODIS/VIIRS LSP), develop phenological models, and analyze climate impacts on terrestrial ecosystems.
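EVI2, the index distributed in item 5, needs only the red and near-infrared bands. The sketch below applies the standard two-band formula to placeholder reflectances; it is not the HP-LSP processing code.

```python
import numpy as np

def evi2(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Two-band Enhanced Vegetation Index: 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

# Placeholder surface reflectances (0-1) for one 30 m pixel across a growing season
red = np.array([0.06, 0.05, 0.04, 0.05, 0.07])
nir = np.array([0.20, 0.30, 0.45, 0.35, 0.22])
print(np.round(evi2(nir, red), 3))
```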