Title: A Robust Hybrid Deep Learning Model for Spatiotemporal Image Fusion
Dense time-series remote sensing data with detailed spatial information are highly desired for monitoring dynamic earth systems. Due to sensor tradeoffs, most remote sensing systems cannot provide images with both high spatial and high temporal resolutions. Spatiotemporal image fusion models offer a feasible way to generate such satellite imagery, yet existing fusion methods are limited in predicting rapid and/or transient phenological changes. Additionally, spatiotemporal fusion research lacks a systematic approach to assessing and understanding how varying levels of temporal phenological change affect fusion results. The objective of this study is to develop an innovative hybrid deep learning model that can effectively and robustly fuse satellite imagery of various spatial and temporal resolutions. The proposed model integrates two types of networks: a super-resolution convolutional neural network (SRCNN) and a long short-term memory (LSTM) network. The SRCNN enhances the coarse images by restoring degraded spatial details, while the LSTM learns and extracts temporal change patterns from the time-series images. To systematically assess the effects of varying levels of phenological change, we identify image phenological transition dates and design three scenarios representing rapid, moderate, and minimal phenological changes. The hybrid deep learning model, alongside three benchmark fusion models, is assessed under these scenarios. Results indicate that the hybrid deep learning model yields significantly better results when rapid or moderate phenological changes are present. It holds great potential for generating high-quality time-series datasets of both high spatial and high temporal resolution, which can further benefit studies of terrestrial system dynamics.
The innovative approach to understanding the effects of phenological change will help us better comprehend the strengths and weaknesses of current and future fusion models.
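The scenario design described above can be sketched by binning the NDVI change between a pair of image dates into rapid, moderate, and minimal classes. This is a minimal illustration only: the thresholds and the toy NDVI patches below are assumptions, not values from the study.

```python
import numpy as np

def classify_phenological_change(ndvi_t1, ndvi_t2, rapid=0.15, moderate=0.05):
    """Bin the mean absolute NDVI change between two image dates into
    rapid / moderate / minimal phenological-change scenarios.
    The thresholds are illustrative, not the study's values."""
    delta = np.abs(np.asarray(ndvi_t2) - np.asarray(ndvi_t1)).mean()
    if delta >= rapid:
        return "rapid"
    if delta >= moderate:
        return "moderate"
    return "minimal"

# Toy 2x2 NDVI patches for two dates (e.g., spanning a green-up transition)
before = np.array([[0.30, 0.32], [0.28, 0.31]])
after  = np.array([[0.55, 0.58], [0.52, 0.57]])
print(classify_phenological_change(before, after))  # rapid
```

In practice the transition dates would be identified from a full NDVI time series rather than a single image pair; the sketch only shows the binning step.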
Award ID(s):
1951657 1849821
NSF-PAR ID:
10319488
Author(s) / Creator(s):
Date Published:
Journal Name:
Remote Sensing
Volume:
13
Issue:
24
ISSN:
2072-4292
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. State-of-the-art deep learning technology has been successfully applied to relatively small selected areas of very high spatial resolution (0.15 and 0.25 m) optical aerial imagery acquired by a fixed-wing aircraft to automatically characterize ice-wedge polygons (IWPs) in the Arctic tundra. However, mapping IWPs at regional to continental scales requires images acquired from different sensor platforms (particularly satellites) and a refined understanding of the method's performance stability across platforms through reliable evaluation assessments. In this study, we examined the transferability of a deep learning Mask Region-Based Convolutional Neural Network (Mask R-CNN) model for mapping IWPs in satellite remote sensing imagery (~0.5 m) covering 272 km² and unmanned aerial vehicle (UAV) imagery (0.02 m) covering 0.32 km². Multispectral images were obtained from the WorldView-2 satellite sensor, pan-sharpened to ~0.5 m, and from a 20 MP CMOS camera onboard a UAV, respectively. The training dataset included 25,489 and 6022 manually delineated IWPs from satellite and fixed-wing aircraft aerial imagery near the Arctic Coastal Plain, northern Alaska. Quantitative assessments showed that individual IWPs were correctly detected at up to 72% and 70%, and delineated at up to 73% and 68%, F1 score accuracy levels for satellite and UAV images, respectively. Expert-based qualitative assessments showed that IWPs were correctly detected at good (40–60%) and excellent (80–100%) accuracy levels for satellite and UAV images, respectively, and delineated at the excellent (80–100%) level for both.
We found that (1) regardless of spatial resolution and spectral bands, the deep learning Mask R-CNN model effectively mapped IWPs in both satellite and UAV images; (2) the model achieved better detection accuracy with finer image resolution, such as UAV imagery, yet better delineation accuracy with coarser image resolution, such as satellite imagery; (3) increasing the amount of training data does not necessarily improve Mask R-CNN performance in IWP mapping when the training and application imagery differ in resolution; and (4) overall, the model underestimates the total number of IWPs, particularly disjoint/incomplete IWPs.
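The detection F1 scores quoted above come from matching predicted polygons to manual delineations. A minimal sketch of such an evaluation, assuming axis-aligned bounding boxes and a greedy IoU match (the study's actual matching protocol may differ):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detection_f1(preds, truths, thresh=0.5):
    """Greedy one-to-one matching at an IoU threshold, then F1 = 2PR/(P+R)."""
    matched, tp = set(), 0
    for p in preds:
        best, best_j = 0.0, None
        for j, t in enumerate(truths):
            if j in matched:
                continue
            v = iou(p, t)
            if v > best:
                best, best_j = v, j
        if best_j is not None and best >= thresh:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    prec = tp / (tp + fp) if preds else 0.0
    rec = tp / (tp + fn) if truths else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# One correct detection, one false positive, one miss -> precision = recall = 0.5
preds  = [(0, 0, 2, 2), (5, 5, 7, 7)]
truths = [(0, 0, 2, 2), (10, 10, 12, 12)]
print(detection_f1(preds, truths))  # 0.5
```

Delineation F1 would use the polygon outlines themselves rather than boxes, but the matching logic is analogous.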
  2. Landsat 5 has produced imagery for decades that can now be viewed and manipulated in Google Earth Engine, but a general, automated way of producing a coherent time series from these images, particularly over cloudy areas in the distant past, remains elusive. Here, we create a land use and land cover (LULC) time series for part of tropical Mato Grosso, Brazil, using the Bayesian Updating of Land Cover: Unsupervised (BULC-U) technique. The algorithm built backward in time from the GlobCover 2009 data set, a multi-category global LULC data set at 300 m resolution for the year 2009, combining it with Landsat time-series imagery to create a land cover time series for the period 1986–2000. Despite the substantial LULC differences between the 1990s and 2009 in this area, much of the landscape remained the same: we asked whether we could harness those similarities and differences to recreate an accurate version of the earlier LULC. The GlobCover basis and the Landsat 5 images shared neither a common spatial resolution nor a common time frame, but BULC-U successfully combined the labels from the coarser classification with the spatial detail of Landsat. The result was an accurate fine-scale time series that quantified the expansion of deforestation in the study area, which more than doubled in extent during this time. Earth Engine directly enabled the fusion of these different data sets held in its catalog: its flexible treatment of spatial resolution, rapid prototyping, and overall processing speed permitted the development and testing of this study. Many would-be users of remote sensing data are currently limited by the highly specialized knowledge needed to create classifications of older data. The approach shown here presents fewer obstacles to participation and allows a wide audience to create their own time series of past decades.
By leveraging both the varied data catalog and the processing speed of Earth Engine, this research can contribute to the rapid advances underway in multi-temporal image classification techniques. Given Earth Engine’s power and deep catalog, this research further opens up remote sensing to a rapidly growing community of researchers and managers who need to understand the long-term dynamics of terrestrial systems. 
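The Bayesian updating at the heart of BULC-U can be illustrated with a generic per-pixel Bayes rule update, where the likelihood of an observed label given each true class comes from a column of a confusion matrix. This is a schematic sketch with made-up class names and probabilities, not the published algorithm:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Update per-pixel class probabilities given a new (possibly noisy)
    classification: posterior ∝ prior × P(observed label | true class)."""
    post = prior * likelihood
    return post / post.sum()

# Prior over 3 hypothetical classes (forest, pasture, crop) for one pixel
prior = np.array([0.5, 0.3, 0.2])
# P(new classifier says 'pasture' | true class) -- illustrative values,
# in practice estimated from agreement between classifications
likelihood = np.array([0.1, 0.7, 0.2])
post = bayes_update(prior, likelihood)
```

Run over every pixel for each new Landsat-derived classification (backward through time, in the BULC-U case), this kind of update lets a coherent fine-scale label sequence emerge from individually noisy inputs.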
  3. Abstract

    Forecasting rates of forest succession at landscape scales will aid global efforts to restore tree cover to millions of hectares of degraded land. While optical satellite remote sensing can detect regional land cover change, quantifying forest structural change is challenging. We developed a state‐space modeling framework that applies Landsat satellite data to estimate variability in rates of natural regeneration between sites in a tropical landscape. Our models work by disentangling measurement error in Landsat‐derived spectral reflectance from process error related to successional variability. We applied our modeling framework to rank rates of forest succession between 10 naturally regenerating sites in Southwestern Panama from about 2001 to 2015 and tested how different models for measurement error impacted forecast accuracy, ecological inference, and rankings of successional rates between sites. We achieved the greatest increase in forecasting accuracy by adding intra‐annual phenological variation to a model based on Landsat‐derived normalized difference vegetation index (NDVI). The best‐performing model accounted for inter‐ and intra‐annual noise in spectral reflectance and translated NDVI to canopy height via Landsat–lidar fusion. Modeling forest succession as a function of canopy height rather than NDVI also resulted in more realistic estimates of forest state during early succession, including greater confidence in rank order of successional rates between sites. These results establish the viability of state‐space models to quantify ecological dynamics from time series of space‐borne imagery. State‐space models also provide a statistical approach well‐suited to fusing high‐resolution data, such as airborne lidar, with lower‐resolution data that provides better temporal and spatial coverage, such as the Landsat satellite record. 
Monitoring forest succession using satellite imagery could play a key role in achieving global restoration targets, including identifying sites that will regain tree cover with minimal intervention.
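The separation of measurement error from process error described above can be illustrated with the simplest possible state-space model: a scalar random-walk Kalman filter over an NDVI-like series. This is a didactic stand-in, not the authors' model; the variance values `q` and `r` are assumptions.

```python
def kalman_1d(obs, q=1e-4, r=1e-2, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter.
    q = process variance (true successional variability between dates),
    r = measurement variance (sensor / atmospheric noise in reflectance)."""
    x, p, out = x0, p0, []
    for z in obs:
        p += q                 # predict: state uncertainty grows between dates
        k = p / (p + r)        # Kalman gain: trust in the new observation
        x += k * (z - x)       # update the latent state estimate
        p *= (1 - k)
        out.append(x)
    return out

# A constant "true" NDVI observed repeatedly: the estimate converges to it
series = kalman_1d([0.2] * 10)
```

In the paper's framework the latent state is eventually expressed as canopy height via Landsat–lidar fusion, and the model also carries intra-annual phenological terms; the filter above only shows the core error-partitioning idea.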
  4. Earth observation data with high spatiotemporal resolution are critical for dynamic monitoring and prediction in geoscience applications; however, due to technical and budget limitations, it is not easy to acquire satellite images with both high spatial and high temporal resolutions. Spatiotemporal image fusion techniques provide a feasible and economical solution for generating dense-time data with high spatial resolution, pushing the limits of current satellite observation systems. Among the various existing fusion algorithms, deep-learning-based models show a promising prospect of higher accuracy and robustness. This paper refines and improves the existing deep convolutional spatiotemporal fusion network (DCSTFN) to further boost prediction accuracy and enhance image quality. The contributions of this paper are twofold. First, the fusion result is improved considerably with a brand-new network architecture and a novel compound loss function. Experiments conducted in two different areas demonstrate these improvements through comparison with existing algorithms: the enhanced DCSTFN model shows superior accuracy, visual quality, and robustness. Second, the advantages and disadvantages of existing deep-learning-based spatiotemporal fusion models are comparatively discussed, and a network design guide for spatiotemporal fusion is provided as a reference for future research. These comparisons and guidelines are based on a number of actual experiments and hold promise for application to other image sources with customized spatiotemporal fusion networks.
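Compound losses of the kind mentioned above typically mix a pixel-wise term with a structural term. The following is an illustrative example only; the enhanced DCSTFN's actual loss function and weights are not reproduced here.

```python
import numpy as np

def compound_loss(pred, target, alpha=0.8):
    """Illustrative compound loss: pixel-wise MSE plus a horizontal/vertical
    gradient term that penalizes blurred edges. The form and the weight
    alpha are assumptions, not the values used by the enhanced DCSTFN."""
    mse = np.mean((pred - target) ** 2)
    # Differences of adjacent pixels approximate image gradients
    gx = np.mean((np.diff(pred, axis=1) - np.diff(target, axis=1)) ** 2)
    gy = np.mean((np.diff(pred, axis=0) - np.diff(target, axis=0)) ** 2)
    return alpha * mse + (1 - alpha) * (gx + gy)
```

The gradient term rewards predictions that reproduce the target's spatial detail, not just its mean intensity, which is why such terms are common in fusion and super-resolution work.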
  5. Abstract

    Due to climate change and rapid urbanization, the Urban Heat Island (UHI) effect, in which metropolitan areas are significantly warmer than their surroundings, has caused negative impacts on urban communities. Temporal granularity is often limited in UHI studies based on satellite remote sensing data, which typically cover a particular urban area only once every several days. This low temporal frequency has restricted the development of models for predicting UHI. To resolve this limitation, this study developed a cyber-based geographic information science and systems (cyberGIS) framework encompassing multiple machine learning models for predicting UHI with high-frequency urban sensor network data combined with remote sensing data, focused on Chicago, Illinois, from 2018 to 2020. Enabled by rapid advances in urban sensor network technologies and high-performance computing, this framework is designed to predict UHI in Chicago with fine spatiotemporal granularity based on environmental data collected with the Array of Things (AoT) urban sensor network and Landsat-8 remote sensing imagery. Our computational experiments revealed that a random forest regression (RFR) model outperforms the other models, with a mean absolute error of 0.45 degrees Celsius in 2020 and 0.8 degrees Celsius in 2018 and 2019. Humidity, distance to the geographic center, and PM2.5 concentration are identified as important factors contributing to model performance. Furthermore, we estimate UHI in Chicago at 10-min temporal frequency and 1-km spatial resolution on the hottest day in 2018. The results demonstrate that the RFR model can accurately predict UHI at fine spatiotemporal scales from high-frequency urban sensor network data integrated with satellite remote sensing data.
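The random forest regression and MAE evaluation described above can be sketched with a tiny bootstrap-aggregated ensemble of depth-1 regression trees in plain Python. This is an illustrative stand-in for a full RFR implementation, with made-up data rather than AoT/Landsat features:

```python
import random

def fit_stump(X, y):
    """Depth-1 regression tree: pick the (feature, threshold) split that
    minimizes squared error; each leaf predicts the mean of its targets."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left  = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:                 # degenerate bootstrap sample: predict mean
        m = sum(y) / len(y)
        return lambda row: m
    _, f, t, ml, mr = best
    return lambda row: ml if row[f] <= t else mr

def bagged_regression(X, y, X_new, n_trees=25, seed=0):
    """Bootstrap-aggregate many stumps and average their predictions,
    the core idea behind random forest regression."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return [sum(s(row) for s in stumps) / n_trees for row in X_new]

def mae(pred, true):
    """Mean absolute error, the evaluation metric used in the study."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

# Toy 1-feature regression: two clearly separated clusters of targets
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
pred = bagged_regression(X, y, [[1.0], [12.0]])
```

A production pipeline would instead use a library implementation (e.g., scikit-learn's RandomForestRegressor) with full-depth trees and per-split feature subsampling; the sketch only shows the bagging-plus-trees structure and the MAE metric.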