

Title: A simple cloud-filling approach for remote sensing water cover assessments
Abstract. The empirical attribution of hydrologic change presents a unique data-availability challenge in terms of establishing baseline prior conditions, as one cannot go back in time to retrospectively collect the necessary data. Although global remote sensing data can alleviate this challenge, most satellite missions are too recent to capture changes that happened long enough ago to provide sufficient observations for adequate statistical inference. In that context, the 4 decades of continuous global high-resolution monitoring enabled by the Landsat missions are an unrivaled source of information. However, constructing a time series of land cover observations across Landsat missions remains a significant challenge because cloud masking and inconsistent image quality complicate the automated interpretation of optical imagery. Focusing on the monitoring of lake water extent, we present an automated gap-filling approach to infer the class (wet or dry) of pixels masked by clouds or sensing errors. The classification outcomes of unmasked pixels are compiled across images taken on different dates to estimate the inundation frequency of each pixel, based on the assumption that different pixels are masked at different times. The inundation frequency is then used to infer the inundation status of masked pixels on individual images through supervised classification. Applied to a variety of global lakes with substantial long-term or seasonal fluctuations, the approach successfully captured water extent variations obtained from in situ gauges (where applicable) or from other Landsat missions during overlapping time periods. Although sensitive to classification errors in the input imagery, the gap-filling algorithm is straightforward to implement on Google's Earth Engine platform and stands as a scalable approach to reliably monitor, and ultimately attribute, historical changes in water bodies.
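The core idea of the abstract (compile clear-sky classifications into a per-pixel inundation frequency, then use that frequency to label masked pixels) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the paper infers masked pixels through supervised classification on the inundation frequency, whereas the simple threshold rule and array layout here are assumptions for demonstration.

```python
import numpy as np

def inundation_frequency(stack):
    """Per-pixel fraction of clear observations classified as wet.

    stack: (T, H, W) float array with 1 = wet, 0 = dry, and np.nan for
    pixels masked by clouds or sensing errors. Relies on the assumption
    that different pixels are masked on different dates, so each pixel
    has some clear observations across the time series.
    """
    return np.nanmean(stack, axis=0)  # NaN entries are ignored

def fill_masked(image, frequency, threshold=0.5):
    """Infer the class of masked pixels on a single date.

    A frequency threshold stands in for the supervised classifier used
    in the paper; 'threshold' is an illustrative assumption.
    """
    filled = image.copy()
    masked = np.isnan(filled)
    filled[masked] = (frequency[masked] >= threshold).astype(float)
    return filled

# Three dates over a 2x2 lake corner; NaN marks masked pixels.
stack = np.array([
    [[1.0, 1.0], [0.0, np.nan]],
    [[1.0, np.nan], [0.0, 0.0]],
    [[np.nan, 1.0], [0.0, 0.0]],
])
freq = inundation_frequency(stack)
filled = fill_masked(stack[0], freq)
```

The always-wet top-left pixel gets frequency 1.0, so a cloud over it on any single date would be filled as wet; the masked bottom-right pixel has frequency 0.0 and is filled as dry.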
Award ID(s):
1824951
NSF-PAR ID:
10278197
Journal Name:
Hydrology and Earth System Sciences
Volume:
25
Issue:
5
ISSN:
1607-7938
Page Range / eLocation ID:
2373 to 2386
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Global surface water classification layers, such as the European Joint Research Centre’s (JRC) Monthly Water History dataset, provide a starting point for accurate and large-scale analyses of trends in waterbody extents. On the local scale, there is an opportunity to increase the accuracy and temporal frequency of these surface water maps by using locally trained classifiers and gap-filling missing values via imputation in all available satellite images. We developed the Surface Water IMputation (SWIM) classification framework using R and the Google Earth Engine computing platform to improve water classification compared to the JRC study. The novel contributions of the SWIM classification framework include (1) a cluster-based algorithm to improve classification sensitivity to a variety of surface water conditions and produce approximately unbiased estimation of surface water area, (2) a method to gap-fill every available Landsat image for a region of interest to generate submonthly classifications at the highest possible temporal frequency, and (3) an outlier detection method for identifying images that contain classification errors due to failures in cloud masking. Validation and several case studies demonstrate the SWIM classification framework outperforms the JRC dataset in spatiotemporal analyses of small waterbody dynamics with previously unattainable sensitivity and temporal frequency. Most importantly, this study shows that reliable surface water classifications can be obtained for all pixels in every available Landsat image, even those containing cloud cover, after performing gap-fill imputation. By using this technique, the SWIM framework supports monitoring water extent on a submonthly basis, which is especially applicable to assessing the impact of short-term flood and drought events.
Additionally, our results contribute to addressing the challenges of training machine learning classifiers with biased ground truth data and identifying images that contain regions of anomalous classification errors. 
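The outlier-detection contribution (3) can be illustrated with a simple robust-statistics sketch: flag images whose classified water area deviates strongly from the rest of the series, which is a typical signature of cloud-mask failure. The median-absolute-deviation rule below is an illustrative stand-in, not SWIM's actual method.

```python
import numpy as np

def flag_outlier_images(water_areas, z_thresh=3.0):
    """Flag images whose classified water area is anomalous.

    water_areas: 1-D sequence of per-image water areas (e.g., km^2).
    Uses a robust z-score based on the median absolute deviation (MAD),
    so a few bad images do not distort the baseline they are judged
    against. The threshold of 3 is a conventional, assumed default.
    """
    areas = np.asarray(water_areas, dtype=float)
    med = np.median(areas)
    mad = np.median(np.abs(areas - med))
    scale = 1.4826 * mad if mad > 0 else 1.0  # MAD -> sigma for normal data
    z = np.abs(areas - med) / scale
    return z > z_thresh

# A stable waterbody with one image whose cloud mask failed.
areas = [10.1, 9.8, 10.3, 47.0, 10.0, 9.9]
flags = flag_outlier_images(areas)
```

Only the 47.0 km^2 image is flagged; the flagged image would then be excluded before gap-fill imputation.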
  2. In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments).
Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy. 
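The "infer hidden true vector label locations" step can be illustrated with a much-simplified version: given the model's current per-pixel probability map and a misregistered line label, search over small rigid shifts for the one that best agrees with the predictions. The rigid-shift search and all names below are illustrative assumptions; the paper's framework infers per-segment locations jointly with model training rather than a single offset.

```python
import numpy as np

def best_shift(prob_map, line_pixels, max_shift=2):
    """Infer a rigid offset for a misregistered vector label.

    prob_map: (H, W) per-pixel stream probability from the current model.
    line_pixels: (N, 2) int array of (row, col) label locations.
    Tries every integer shift within +/- max_shift and keeps the one
    maximizing mean predicted probability along the shifted line.
    """
    h, w = prob_map.shape
    best, best_score = (0, 0), -np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r = np.clip(line_pixels[:, 0] + dr, 0, h - 1)
            c = np.clip(line_pixels[:, 1] + dc, 0, w - 1)
            score = prob_map[r, c].mean()
            if score > best_score:
                best, best_score = (dr, dc), score
    return best

# The true stream runs along row 5; the label was digitized one row high.
prob = np.zeros((10, 10))
prob[5, 2:8] = 0.9
label = np.array([[4, c] for c in range(2, 8)])
shift = best_shift(prob, label)
```

In a full EM-style loop, the relocated labels would then supervise the next round of model training, and the two steps would alternate.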
  3.
    Urban flooding is a major natural disaster that poses a serious threat to the urban environment. Near-real-time mapping of the flood extent is in high demand for disaster rescue and relief missions, reconstruction efforts, and financial loss evaluation. Many efforts have been made to identify the flooding zones with remote sensing data and image processing techniques. Unfortunately, the near real-time production of accurate flood maps over impacted urban areas has not been well investigated due to three major issues. (1) Satellite imagery with high spatial resolution over urban areas usually has a nonhomogeneous background due to different types of objects such as buildings, moving vehicles, and road networks. As such, classical machine learning approaches can hardly model the spatial relationship between sample pixels in the flooding area. (2) Handcrafted features associated with the data are usually required as input for conventional flood mapping models, which may not be able to fully utilize the underlying patterns of a large number of available data. (3) High-resolution optical imagery often has varied pixel digital numbers (DNs) for the same ground objects as a result of highly inconsistent illumination conditions during a flood. Accordingly, traditional methods of flood mapping have major limitations in generalization based on testing data. To address the aforementioned issues in urban flood mapping, we developed a patch similarity convolutional neural network (PSNet) using satellite multispectral surface reflectance imagery before and after flooding with a spatial resolution of 3 meters. We used spectral reflectance instead of raw pixel DNs so that the influence of inconsistent illumination caused by varied weather conditions at the time of data collection can be greatly reduced. Such consistent spectral reflectance data also enhance the generalization capability of the proposed model.
Experiments on the high-resolution imagery before and after the urban flooding events (i.e., the 2017 Hurricane Harvey and the 2018 Hurricane Florence) showed that the developed PSNet can produce urban flood maps with consistently high precision, recall, F1 score, and overall accuracy compared with baseline classification models including support vector machine, decision tree, random forest, and AdaBoost, which were often poor in either precision or recall.
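The bi-temporal patch comparison at the heart of PSNet can be sketched without the learned network: compare the reflectance patch around each pixel before and after the event and flag pixels where the patches disagree. The handcrafted mean-absolute-difference score below is an explicit stand-in for PSNet's learned patch similarity, and all array shapes and thresholds are assumptions for illustration.

```python
import numpy as np

def patch_change_map(before, after, patch=3, thresh=0.1):
    """Flag changed (candidate flooded) pixels from bi-temporal imagery.

    before, after: (H, W, B) surface-reflectance images (reflectance
    rather than raw DNs, following the paper's illumination argument).
    Compares the patch around each interior pixel with a mean absolute
    difference; a learned similarity (as in PSNet) replaces this score
    in the actual model.
    """
    h, w, _ = before.shape
    r = patch // 2
    changed = np.zeros((h, w), dtype=bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            p = before[i - r:i + r + 1, j - r:j + r + 1].ravel()
            q = after[i - r:i + r + 1, j - r:j + r + 1].ravel()
            changed[i, j] = np.abs(p - q).mean() > thresh
    return changed

# Synthetic 4-band scene: dry reflectance, then a flooded (dark) block.
rng = np.random.default_rng(0)
before = rng.uniform(0.2, 0.4, (8, 8, 4))
after = before.copy()
after[2:6, 2:6] = rng.uniform(0.0, 0.05, (4, 4, 4))  # water: low reflectance
changed = patch_change_map(before, after)
```

Patch context is what distinguishes this from per-pixel differencing: a pixel is judged by its neighborhood, which suppresses isolated noise on the nonhomogeneous urban background.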
  4.
    Marine remote sensing provides comprehensive characterizations of the ocean surface across space and time. However, cloud cover is a significant challenge in marine satellite monitoring. Researchers have proposed various algorithms to fill data gaps “below the clouds”, but a comparison of algorithm performance across several geographic regions has not yet been conducted. We compared ten basic algorithms, including data-interpolating empirical orthogonal functions (DINEOF), geostatistical interpolation, and supervised learning methods, in two gap-filling tasks: the reconstruction of chlorophyll a in pixels covered by clouds, and the correction of regional mean chlorophyll a concentrations. For this purpose, we combined tens of cloud-free images with hundreds of cloud masks in four study areas, creating thousands of situations in which to test the algorithms. The best algorithm depended on the study area and task, and differences between the best algorithms were small. Ordinary Kriging, spatiotemporal Kriging, and DINEOF worked well across study areas and tasks. Random forests reconstructed individual pixels most accurately. We also found that high levels of cloud cover led to considerable errors in estimated regional mean chlorophyll a concentration. These errors could, however, be reduced by about 50% to 80% (depending on the study area) with prior cloud-filling. 
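The DINEOF family of methods mentioned above reconstructs cloud-covered pixels by iterating a truncated SVD on the space-time data matrix. The sketch below shows that core iteration only; the assumed fixed rank and iteration count replace DINEOF's cross-validated rank selection and convergence test.

```python
import numpy as np

def dineof_like_fill(field, rank=1, n_iter=50):
    """Fill missing values in a space-time matrix via truncated SVD.

    field: (time, space) array with np.nan for cloud-masked entries.
    Iterates: initialize gaps with the global mean, take a rank-k SVD,
    replace the gap estimates with the low-rank reconstruction, repeat.
    """
    mask = np.isnan(field)
    filled = np.where(mask, np.nanmean(field), field)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        recon = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled[mask] = recon[mask]
    return filled

# Rank-1 synthetic "chlorophyll" field: seasonal amplitude x spatial pattern.
t = np.linspace(0, 2 * np.pi, 20)
amp = 1.5 + np.sin(t)                       # 20 time steps
space = np.array([0.5, 1.0, 2.0, 1.2])      # 4 pixels
truth = np.outer(amp, space)
obs = truth.copy()
obs[3, 1] = np.nan                          # cloud-masked entries
obs[10, 2] = np.nan
filled = dineof_like_fill(obs, rank=1)
```

Because the synthetic field is exactly rank 1, the iteration recovers the masked values nearly exactly; on real chlorophyll fields the achievable accuracy depends on how much variance the leading EOFs capture.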
  5. Landsat 5 has produced imagery for decades that can now be viewed and manipulated in Google Earth Engine, but a general, automated way of producing a coherent time series from these images—particularly over cloudy areas in the distant past—is elusive. Here, we create a land use and land cover (LULC) time series for part of tropical Mato Grosso, Brazil, using the Bayesian Updating of Land Cover: Unsupervised (BULC-U) technique. The algorithm built backward in time from the GlobCover 2009 data set, a multi-category global LULC data set at 300 m resolution for the year 2009, combining it with Landsat time series imagery to create a land cover time series for the period 1986–2000. Despite the substantial LULC differences between the 1990s and 2009 in this area, much of the landscape remained the same: we asked whether we could harness those similarities and differences to recreate an accurate version of the earlier LULC. The GlobCover basis and the Landsat-5 images shared neither a common spatial resolution nor time frame, but BULC-U successfully combined the labels from the coarser classification with the spatial detail of Landsat. The result was an accurate fine-scale time series that quantified the expansion of deforestation in the study area, which more than doubled in size during this time. Earth Engine directly enabled the fusion of these different data sets held in its catalog: its flexible treatment of spatial resolution, rapid prototyping, and overall processing speed permitted the development and testing of this study. Many would-be users of remote sensing data are currently limited by the need to have highly specialized knowledge to create classifications of older data. The approach shown here presents fewer obstacles to participation and allows a wide audience to create their own time series of past decades.
By leveraging both the varied data catalog and the processing speed of Earth Engine, this research can contribute to the rapid advances underway in multi-temporal image classification techniques. Given Earth Engine’s power and deep catalog, this research further opens up remote sensing to a rapidly growing community of researchers and managers who need to understand the long-term dynamics of terrestrial systems. 
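The Bayesian updating step underlying BULC can be illustrated per pixel: class probabilities accumulated from earlier classifications act as the prior, and each new (noisy) classification event updates them through an estimated accuracy table. The variable names and two-class setup below are illustrative assumptions; BULC-U additionally estimates the accuracy table unsupervised by cross-tabulating successive classifications.

```python
import numpy as np

def bulc_update(prior, event_class, truth_table):
    """One Bayesian update of per-pixel class probabilities.

    prior: (n_pix, n_class) probabilities from previous time steps.
    event_class: (n_pix,) class labels from a new classification event.
    truth_table: (n_class, n_class) estimated P(event says j | true i).
    Returns the normalized posterior, which becomes the next prior.
    """
    likelihood = truth_table[:, event_class].T   # (n_pix, n_class)
    post = prior * likelihood
    return post / post.sum(axis=1, keepdims=True)

# Two classes (forest, cleared); the new classifier is 80% reliable.
prior = np.array([[0.5, 0.5],    # pixel with no prior information
                  [0.9, 0.1]])   # pixel confidently forest so far
tt = np.array([[0.8, 0.2],
               [0.2, 0.8]])
event = np.array([1, 1])         # both pixels classified as cleared
post = bulc_update(prior, event, tt)
```

The uninformed pixel moves sharply toward "cleared," while the confident pixel resists a single contradictory observation: repeated consistent evidence, not any one noisy image, drives the time series.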