
Title: A Submonthly Surface Water Classification Framework via Gap-Fill Imputation and Random Forest Classifiers of Landsat Imagery
Global surface water classification layers, such as the European Joint Research Centre’s (JRC) Monthly Water History dataset, provide a starting point for accurate and large-scale analyses of trends in waterbody extents. On the local scale, there is an opportunity to increase the accuracy and temporal frequency of these surface water maps by using locally trained classifiers and gap-filling missing values via imputation in all available satellite images. We developed the Surface Water IMputation (SWIM) classification framework using R and the Google Earth Engine computing platform to improve water classification compared to the JRC study. The novel contributions of the SWIM classification framework include (1) a cluster-based algorithm that improves classification sensitivity to a variety of surface water conditions and produces approximately unbiased estimates of surface water area, (2) a method to gap-fill every available Landsat image for a region of interest to generate submonthly classifications at the highest possible temporal frequency, and (3) an outlier detection method for identifying images that contain classification errors due to failures in cloud masking. Validation and several case studies demonstrate that the SWIM classification framework outperforms the JRC dataset in spatiotemporal analyses of small waterbody dynamics, with previously unattainable sensitivity and temporal frequency. Most importantly, this study shows that reliable surface water classifications can be obtained for all pixels in every available Landsat image, even those containing cloud cover, after performing gap-fill imputation. By using this technique, the SWIM framework supports monitoring water extent on a submonthly basis, which is especially applicable to assessing the impact of short-term flood and drought events.
Additionally, our results contribute to addressing the challenges of training machine learning classifiers with biased ground truth data and identifying images that contain regions of anomalous classification errors.
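The abstract's third contribution, screening out images whose classifications are corrupted by cloud-mask failures, can be illustrated with a generic outlier rule on the classified water-area time series. This is a hedged sketch only: the function name and the median/MAD threshold are illustrative stand-ins, not the paper's actual detection criterion.

```python
# Illustrative outlier screen (not the SWIM paper's method): flag images whose
# classified water area deviates strongly from the series' robust center.
def flag_outliers(areas, k=5.0):
    """areas: classified water area (e.g., km^2) per image. Returns a flag per image."""
    s = sorted(areas)
    med = s[len(s) // 2]                                      # median area
    mad = sorted(abs(a - med) for a in areas)[len(areas) // 2]  # median absolute deviation
    return [abs(a - med) > k * max(mad, 1e-9) for a in areas]

areas = [10.2, 10.5, 9.9, 10.1, 42.0, 10.3]  # one image with a cloud-mask failure
print(flag_outliers(areas))  # only the 42.0 km^2 image is flagged
```

A flagged image would then be excluded (or re-imputed) before trend analysis, which is the role the abstract assigns to its outlier detection step.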
Journal Name: Remote Sensing
Sponsoring Org: National Science Foundation
More Like this
  1.
    Abstract. The empirical attribution of hydrologic change presents a unique data availability challenge in terms of establishing baseline prior conditions, as one cannot go back in time to retrospectively collect the necessary data. Although global remote sensing data can alleviate this challenge, most satellite missions are too recent to capture changes that happened long enough ago to provide sufficient observations for adequate statistical inference. In that context, the 4 decades of continuous global high-resolution monitoring enabled by the Landsat missions are an unrivaled source of information. However, constructing a time series of land cover observation across Landsat missions remains a significant challenge because cloud masking and inconsistent image quality complicate the automated interpretation of optical imagery. Focusing on the monitoring of lake water extent, we present an automated gap-filling approach to infer the class (wet or dry) of pixels masked by clouds or sensing errors. The classification outcome of unmasked pixels is compiled across images taken on different dates to estimate the inundation frequency of each pixel, based on the assumption that different pixels are masked at different times. The inundation frequency is then used to infer the inundation status of masked pixels on individual images through supervised classification. Applied to a variety of global lakes with substantial long-term or seasonal fluctuations, the approach successfully captured water extent variations obtained from in situ gauges (where applicable), or from other Landsat missions during overlapping time periods. Although sensitive to classification errors in the input imagery, the gap-filling algorithm is straightforward to implement on Google's Earth Engine platform and stands as a scalable approach to reliably monitor, and ultimately attribute, historical changes in water bodies.
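    The two-step idea in this abstract (compile each pixel's inundation frequency from its unmasked observations, then use that frequency to infer the class of masked observations) can be sketched in a few lines. This is a toy illustration with made-up names; a simple frequency threshold stands in for the supervised classifier the authors actually use.

    ```python
    # Toy sketch of frequency-based gap-filling (illustrative, not the authors' code).
    def inundation_frequency(series):
        """series: per-date labels 'wet', 'dry', or None (masked). Returns wet fraction."""
        obs = [s for s in series if s is not None]
        return sum(s == "wet" for s in obs) / len(obs)

    def gap_fill(series, threshold=0.5):
        """Impute masked dates from the pixel's own inundation frequency."""
        freq = inundation_frequency(series)
        guess = "wet" if freq >= threshold else "dry"  # stand-in for supervised classification
        return [s if s is not None else guess for s in series]

    pixel = ["wet", None, "wet", "dry", "wet", None]
    print(gap_fill(pixel))  # masked dates imputed as 'wet' (wet fraction 0.75)
    ```

    Because different pixels are masked on different dates, each pixel usually has enough unmasked observations to estimate its frequency, which is the assumption the approach rests on.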
  2. Abstract. Mixed-phase Southern Ocean clouds are challenging to simulate, and their representation in climate models is an important control on climate sensitivity. In particular, the amount of supercooled water and frozen mass that they contain in the present climate is a predictor of their planetary feedback in a warming climate. The recent Southern Ocean Clouds, Radiation, Aerosol Transport Experimental Study (SOCRATES) vastly increased the amount of in situ data available from mixed-phase Southern Ocean clouds useful for model evaluation. Bulk measurements distinguishing liquid and ice water content are not available from SOCRATES, so single-particle phase classifications from the Two-Dimensional Stereo (2D-S) probe are invaluable for quantifying mixed-phase cloud properties. Motivated by the presence of large biases in existing phase discrimination algorithms, we develop a novel technique for single-particle phase classification of binary 2D-S images using a random forest algorithm, which we refer to as the University of Washington Ice–Liquid Discriminator (UWILD). UWILD uses 14 parameters computed from binary image data, as well as particle inter-arrival time, to predict phase. We use liquid-only and ice-dominated time periods within the SOCRATES dataset as training and testing data. This novel approach to model training avoids major pitfalls associated with using manually labeled data, including reduced model generalizability and high labor costs. We find that UWILD is well calibrated and has an overall accuracy of 95 %, compared to 72 % and 79 % for two existing phase classification algorithms that we compare it with. UWILD improves classifications of small ice crystals and large liquid drops in particular and has more flexibility than the other algorithms to identify both liquid-dominated and ice-dominated regions within the SOCRATES dataset. UWILD misclassifies a small percentage of large liquid drops as ice. Such misclassified particles are typically associated with model confidence below 75 % and can easily be filtered out of the dataset. UWILD phase classifications show that particles with area-equivalent diameter (Deq) < 0.17 mm are mostly liquid at all temperatures sampled, down to −40 °C. Larger particles (Deq > 0.17 mm) are predominantly frozen at all temperatures below 0 °C. Between 0 and −5 °C, there are roughly equal numbers of frozen and liquid mid-sized particles (0.17 < Deq < 0.33 mm), while large particles (Deq > 0.33 mm) are mostly frozen. We also use UWILD's phase classifications to estimate sub-1 Hz phase heterogeneity, and we show examples of meter-scale cloud phase heterogeneity in the SOCRATES dataset.
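    The UWILD setup described here (a random forest trained on geometric features of binary particle images, with a per-particle confidence used to filter uncertain classifications) follows a standard scikit-learn pattern. The sketch below is illustrative only: the two features and the synthetic liquid/ice clusters are placeholders for UWILD's 14 image parameters plus inter-arrival time.

    ```python
    # Illustrative random-forest phase classifier (not UWILD itself).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    # Fake features: [area-equivalent diameter (mm), perimeter-to-area ratio]
    liquid = rng.normal([0.10, 0.8], 0.02, size=(50, 2))  # small, round drops
    ice = rng.normal([0.40, 2.0], 0.05, size=(50, 2))     # large, irregular crystals
    X = np.vstack([liquid, ice])
    y = np.array([0] * 50 + [1] * 50)                     # 0 = liquid, 1 = ice

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # predict_proba supplies the per-particle confidence that UWILD-style
    # pipelines threshold (e.g., discard classifications below 75 %).
    proba = clf.predict_proba([[0.12, 0.9]])
    print(proba)  # high probability in the liquid column for a small round particle
    ```

    Training on liquid-only and ice-dominated time periods, as the paper does, amounts to labeling by period rather than by hand-annotating individual particles.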
  3. Contemporary climate change in Alaska has resulted in amplified rates of press and pulse disturbances that drive ecosystem change with significant consequences for socio‐environmental systems. Despite the vulnerability of Arctic and boreal landscapes to change, little has been done to characterize landscape change and associated drivers across northern high‐latitude ecosystems. Here we characterize the historical sensitivity of Alaska's ecosystems to environmental change and anthropogenic disturbances using expert knowledge, remote sensing data, and spatiotemporal analyses and modeling. Time‐series analysis of moderate- and high-resolution imagery was used to characterize land‐ and water‐surface dynamics across Alaska. Some 430,000 interpretations of ecological and geomorphological change were made using historical air photos and satellite imagery, and corroborate land‐surface greening, browning, and wetness/moisture trend parameters derived from peak‐growing season Landsat imagery acquired from 1984 to 2015. The time series of change metrics, together with climatic data and maps of landscape characteristics, were incorporated into a modeling framework for mapping and understanding of drivers of change throughout Alaska. According to our analysis, approximately 13% (~174,000 ± 8700 km2) of Alaska has experienced directional change in the last 32 years (±95% confidence intervals). At the ecoregion level, substantial increases in remotely sensed vegetation productivity were most pronounced in the western and northern foothills of Alaska, which is explained by vegetation growth associated with increasing air temperatures. Significant browning trends were largely the result of recent wildfires in interior Alaska, but browning trends are also driven by increases in evaporative demand and surface‐water gains that have predominately occurred over warming permafrost landscapes.
Increased rates of photosynthetic activity are associated with stabilization and recovery processes following wildfire, timber harvesting, insect damage, thermokarst, glacial retreat, and lake infilling and drainage events. Our results fill a critical gap in the understanding of historical and potential future trajectories of change in northern high‐latitude regions. 
  4. In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments).
Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy. 
  5. This data set contains all classifications made by the Gravity Spy machine learning model for LIGO glitches from the first three observing runs (O1, O2, and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline in which Omicron identified that the signal-to-noise ratio was above 7.5 and the peak frequency of the noise event was between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch consisting of 4 different durations, which helps capture the morphology of noise events that are both short and long in duration.

    There are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data.

    For O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle

    For O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency, Chirp, Extremely_Loud, Fast_Scattering, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle

    If you would like to download the Omega scans associated with each glitch, you can use the gravitational-wave data-analysis tool GWpy. To use this tool, please install Anaconda (if you have not already) and create a virtual environment using the following command:

    ```conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy```

    After downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.


    ```python
    from gwpy.table import GravitySpyTable

    # Read the metadata from the downloaded CSV file.
    H1_O2 = GravitySpyTable.read('H1_O2.csv')

    # Keep only glitches labelled as Blips with confidence above 90%.
    blips = H1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] > 0.9)]

    # Download the Omega scans for the first 4 rows of the filtered table.
    blips[0:4].download(nproc=1)
    ```



    Each of the columns in the CSV files is taken from a different input:

    [‘event_time’, ‘ifo’, ‘peak_time’, ‘peak_time_ns’, ‘start_time’, ‘start_time_ns’, ‘duration’, ‘peak_frequency’, ‘central_freq’, ‘bandwidth’, ‘channel’, ‘amplitude’, ‘snr’, ‘q_value’] contain metadata about the signal from the Omicron pipeline. 

    [‘gravityspy_id’] is the unique identifier for each glitch in the dataset. 

    [‘1400Ripples’, ‘1080Lines’, ‘Air_Compressor’, ‘Blip’, ‘Chirp’, ‘Extremely_Loud’, ‘Helix’, ‘Koi_Fish’, ‘Light_Modulation’, ‘Low_Frequency_Burst’, ‘Low_Frequency_Lines’, ‘No_Glitch’, ‘None_of_the_Above’, ‘Paired_Doves’, ‘Power_Line’, ‘Repeating_Blips’, ‘Scattered_Light’, ‘Scratchy’, ‘Tomte’, ‘Violin_Mode’, ‘Wandering_Line’, ‘Whistle’] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). 

    [‘ml_label’, ‘ml_confidence’] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification. 

    [‘url1’, ‘url2’, ‘url3’, ‘url4’] are the links to the publicly available Omega scans for each glitch. ‘url1’ shows the glitch for a duration of 0.5 seconds, ‘url2’ for 1 second, ‘url3’ for 2 seconds, and ‘url4’ for 4 seconds.


    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. 
