Title: A Submonthly Surface Water Classification Framework via Gap-Fill Imputation and Random Forest Classifiers of Landsat Imagery
Global surface water classification layers, such as the European Joint Research Centre’s (JRC) Monthly Water History dataset, provide a starting point for accurate and large-scale analyses of trends in waterbody extents. On the local scale, there is an opportunity to increase the accuracy and temporal frequency of these surface water maps by using locally trained classifiers and gap-filling missing values via imputation in all available satellite images. We developed the Surface Water IMputation (SWIM) classification framework using R and the Google Earth Engine computing platform to improve water classification relative to the JRC dataset. The novel contributions of the SWIM classification framework include (1) a cluster-based algorithm to improve classification sensitivity to a variety of surface water conditions and produce approximately unbiased estimation of surface water area, (2) a method to gap-fill every available Landsat image for a region of interest to generate submonthly classifications at the highest possible temporal frequency, and (3) an outlier detection method for identifying images that contain classification errors due to failures in cloud masking. Validation and several case studies demonstrate that the SWIM classification framework outperforms the JRC dataset in spatiotemporal analyses of small waterbody dynamics, with previously unattainable sensitivity and temporal frequency. Most importantly, this study shows that reliable surface water classifications can be obtained for all pixels in every available Landsat image, even those containing cloud cover, after performing gap-fill imputation. By using this technique, the SWIM framework supports monitoring water extent on a submonthly basis, which is especially applicable to assessing the impact of short-term flood and drought events. Additionally, our results contribute to addressing the challenges of training machine learning classifiers with biased ground truth data and identifying images that contain regions of anomalous classification errors.
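The abstract does not say how the outlier detection in contribution (3) works, so the following is only an illustrative sketch of one common way to flag suspect images, not the SWIM method. It assumes each image has already been reduced to a single classified water area and flags values whose modified z-score (based on the median absolute deviation) exceeds a cutoff. The function name `flag_outlier_images`, the 3.5 cutoff, and the sample areas are all hypothetical.

```python
from statistics import median


def flag_outlier_images(water_areas, threshold=3.5):
    """Return indices of images whose water area is a robust outlier.

    Hypothetical helper: uses the modified z-score built on the median
    absolute deviation (MAD), which is less sensitive to the outliers
    themselves than a mean/standard-deviation rule.
    """
    med = median(water_areas)
    abs_dev = [abs(a - med) for a in water_areas]
    mad = median(abs_dev)
    if mad == 0:
        # No spread in the bulk of the data: flag anything off the median.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # under a normal distribution.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up water areas (km^2) for seven images of the same waterbody;
# the fifth image mimics a cloud-masking failure that inflates the area.
areas_km2 = [12.1, 11.8, 12.4, 12.0, 30.5, 11.9, 12.2]
flagged = flag_outlier_images(areas_km2)
print(flagged)
```

A per-image area summary is only one possible statistic; a real workflow would also need to account for genuine seasonal changes in extent, which this sketch ignores.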
Award ID(s):
1828942
PAR ID:
10273060
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Remote Sensing
Volume:
13
Issue:
9
ISSN:
2072-4292
Page Range / eLocation ID:
1742
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. The empirical attribution of hydrologic change presents a unique data availability challenge in terms of establishing baseline prior conditions, as one cannot go back in time to retrospectively collect the necessary data. Although global remote sensing data can alleviate this challenge, most satellite missions are too recent to capture changes that happened long enough ago to provide sufficient observations for adequate statistical inference. In that context, the 4 decades of continuous global high-resolution monitoring enabled by the Landsat missions are an unrivaled source of information. However, constructing a time series of land cover observation across Landsat missions remains a significant challenge because cloud masking and inconsistent image quality complicate the automated interpretation of optical imagery. Focusing on the monitoring of lake water extent, we present an automated gap-filling approach to infer the class (wet or dry) of pixels masked by clouds or sensing errors. The classification outcome of unmasked pixels is compiled across images taken on different dates to estimate the inundation frequency of each pixel, based on the assumption that different pixels are masked at different times. The inundation frequency is then used to infer the inundation status of masked pixels on individual images through supervised classification. Applied to a variety of global lakes with substantial long-term or seasonal fluctuations, the approach successfully captured water extent variations obtained from in situ gauges (where applicable), or from other Landsat missions during overlapping time periods. Although sensitive to classification errors in the input imagery, the gap-filling algorithm is straightforward to implement on the Google Earth Engine platform and stands as a scalable approach to reliably monitor, and ultimately attribute, historical changes in water bodies.
  2. Abstract. Surface meltwater is becoming increasingly widespread on Antarctic ice shelves. It is stored within surface ponds and streams, or within firn pore spaces, which may saturate to form slush. Slush can reduce firn air content, increasing an ice-shelf's vulnerability to break-up. To date, no study has mapped the changing extent of slush across ice shelves. Here, we use Google Earth Engine and Landsat 8 images from six ice shelves to generate training classes using a k-means clustering algorithm, which are used to train a random forest classifier to identify both slush and ponded water. Validation using expert elicitation gives accuracies of 84% and 82% for the ponded water and slush classes, respectively. Errors result from subjectivity in identifying the ponded water/slush boundary, and from inclusion of cloud and shadows. We apply our classifier to the Roi Baudouin Ice Shelf for the entire 2013–20 Landsat 8 record. On average, 64% of all surface meltwater is classified as slush and 36% as ponded water. Total meltwater areal extent is greatest between late January and mid-February. This highlights the importance of mapping slush when studying surface meltwater on ice shelves. Future research will apply the classifier across all Antarctic ice shelves.
  3. Satellites provide a temporally discontinuous record of hydrological conditions along Earth’s rivers (e.g., river width, height, water quality). The degree to which archived satellite data effectively capture the overall population of river flow frequency is unknown. Here, we use the entire archives of Landsat 5, 7, and 8 to determine when a cloud-free image is available over the United States Geological Survey (USGS) river gauges located on Landsat-observable rivers. We compare the flow frequency distribution derived from the daily gauge record to the flow frequency distribution derived from ideally sampling gauged discharge based on the timing of cloud-free Landsat overpasses. Examining the patterns of flow frequency across multiple gauges, we find that there is not a statistically significant difference between the flow frequency distribution associated with observations contained within the Landsat archive and the flow frequency distribution derived from the daily gauge data (α = 0.05), except for hydrological extremes like maximum and minimum flow. At individual gauges, we find that Landsat observations span a wide range of hydrological conditions (97% of total flow variability observed in 90% of the study gauges), but the degree to which the Landsat sample can represent the flow frequency distribution varies from location to location and depends on sample size. The results of this study indicate that the Landsat archive is, on average, representative of the temporal frequencies of hydrological conditions present along Earth’s large rivers, with broad utility for hydrological, ecological and biogeochemical evaluations of river systems.
  4. Surface meltwater generated on ice shelves fringing the Antarctic Ice Sheet can drive ice-shelf collapse, leading to ice sheet mass loss and contributing to global sea level rise. A quantitative assessment of supraglacial lake evolution is required to understand the influence of Antarctic surface meltwater on ice-sheet and ice-shelf stability. Cloud computing platforms have made the required remote sensing analysis computationally trivial, yet a careful evaluation of image processing techniques for pan-Antarctic lake mapping has yet to be performed. This work paves the way for automating lake identification at a continental scale throughout the satellite observational record via a thorough methodological analysis. We deploy a suite of different trained supervised classifiers to map and quantify supraglacial lake areas from multispectral Landsat-8 scenes, using training data generated via manual interpretation of the results from k-means clustering. Best results are obtained using training datasets that comprise spectrally diverse unsupervised clusters from multiple regions and that include rock and cloud shadow classes. We successfully apply our trained supervised classifiers across two ice shelves with different supraglacial lake characteristics above a threshold sun elevation of 20°, achieving classification accuracies of over 90% when compared to manually generated validation datasets. The application of our trained classifiers produces a seasonal pattern of lake evolution. Cloud-shadowed areas hinder large-scale application of our classifiers, as in previous work. Our results show that caution is required before deploying ‘off the shelf’ algorithms for lake mapping in Antarctica, and suggest that careful scrutiny of training data and desired output classes is essential for accurate results. Our supervised classification technique provides an alternative and independent method of lake identification to inform the development of a continent-wide supraglacial lake mapping product.
  5. Marine remote sensing provides comprehensive characterizations of the ocean surface across space and time. However, cloud cover is a significant challenge in marine satellite monitoring. Researchers have proposed various algorithms to fill data gaps “below the clouds”, but a comparison of algorithm performance across several geographic regions has not yet been conducted. We compared ten basic algorithms, including data-interpolating empirical orthogonal functions (DINEOF), geostatistical interpolation, and supervised learning methods, in two gap-filling tasks: the reconstruction of chlorophyll a in pixels covered by clouds, and the correction of regional mean chlorophyll a concentrations. For this purpose, we combined tens of cloud-free images with hundreds of cloud masks in four study areas, creating thousands of situations in which to test the algorithms. The best algorithm depended on the study area and task, and differences between the best algorithms were small. Ordinary Kriging, spatiotemporal Kriging, and DINEOF worked well across study areas and tasks. Random forests reconstructed individual pixels most accurately. We also found that high levels of cloud cover led to considerable errors in estimated regional mean chlorophyll a concentration. These errors could, however, be reduced by about 50% to 80% (depending on the study area) with prior cloud-filling.
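The evaluation design in the last record above (overlaying real cloud masks on cloud-free images so that the hidden values are known) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: images and masks are plain nested lists, the gap-filling algorithm is a deliberately naive mean fill standing in for any of the compared methods, and the names `mean_fill` and `evaluate_gap_fill` are hypothetical.

```python
from math import sqrt


def mean_fill(image, mask):
    """Baseline gap-fill: replace masked pixels with the mean of visible ones."""
    visible = [v for row, mrow in zip(image, mask)
               for v, m in zip(row, mrow) if not m]
    if not visible:
        raise ValueError("mask hides every pixel; nothing to fill from")
    fill = sum(visible) / len(visible)
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def evaluate_gap_fill(image, mask, fill_fn):
    """Hide pixels of a cloud-free image, fill them, and score the result.

    Returns (rmse over the hidden pixels, absolute error of the regional
    mean), matching the two tasks named in the abstract.
    """
    filled = fill_fn(image, mask)
    errors = [f - v
              for row, frow, mrow in zip(image, filled, mask)
              for v, f, m in zip(row, frow, mrow) if m]
    if not errors:
        raise ValueError("mask hides no pixels; nothing to evaluate")
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    n = sum(len(row) for row in image)
    true_mean = sum(sum(row) for row in image) / n
    est_mean = sum(sum(row) for row in filled) / n
    return rmse, abs(est_mean - true_mean)


# Made-up 2x2 "cloud-free" image and a mask hiding the bottom-right pixel.
image = [[1.0, 2.0], [3.0, 4.0]]
mask = [[False, False], [False, True]]
rmse, mean_error = evaluate_gap_fill(image, mask, mean_fill)
print(rmse, mean_error)
```

Repeating `evaluate_gap_fill` over every pairing of cloud-free image and cloud mask yields the kind of many-scenario comparison the abstract describes; any other filling method with the same `(image, mask)` signature can be swapped in for `mean_fill`.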