

Title: Near-Real-Time Forecast of Satellite-Based Soil Moisture Using Long Short-Term Memory with an Adaptive Data Integration Kernel

Nowcasts, or near-real-time (NRT) forecasts, of soil moisture based on the Soil Moisture Active and Passive (SMAP) mission could provide substantial value for a range of applications including hazard monitoring and agricultural planning. To provide such an NRT forecast with high fidelity, we enhanced a time series deep learning architecture, long short-term memory (LSTM), with a novel data integration (DI) kernel to assimilate the most recent SMAP observations as soon as they become available. The kernel is adaptive in that it can accommodate irregular observational schedules. Tested over the conterminous United States (CONUS), this NRT forecast product showcases predictions with unprecedented accuracy when evaluated against subsequent SMAP retrievals. It showed smaller error than NRT forecasts reported in the literature, especially at longer forecast latency. The comparative advantage was due to LSTM's structural improvements, as well as its ability to utilize more input variables and more training data. The DI-LSTM was compared to the original LSTM model that runs without data integration, referred to here as the projection model. We found that the DI procedure removed the autocorrelated effects of forcing errors and of errors due to processes not represented in the inputs, for example, irrigation and floodplain/lake inundation, as well as mismatches due to unseen forcing conditions. The effects of this purely data-driven DI kernel are discussed for the first time in the geosciences. Furthermore, this work presents an upper-bound estimate for the random component of the SMAP retrieval error.
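The abstract describes a DI kernel that is "adaptive" to irregular SMAP overpass schedules. The paper's exact architecture is not given here; the sketch below illustrates one plausible input-side mechanism under that description: at each timestep, append the most recent available observation and its age (days since that observation) to the forcing inputs, so a downstream LSTM can learn to discount stale data. The function name and the age/flag encoding are hypothetical.

```python
import numpy as np

def augment_with_latest_obs(forcings, obs, obs_times):
    """Append the most recent observation and its age to each timestep's inputs.

    forcings  : (T, F) array of daily forcing inputs
    obs       : (M,) irregular observations (e.g., SMAP retrievals)
    obs_times : (M,) sorted integer days on which each observation was made
    Returns a (T, F + 2) array: [forcings, latest_obs, days_since_obs].
    """
    T = forcings.shape[0]
    extra = np.zeros((T, 2))
    j = -1  # index of the most recent observation seen so far
    for t in range(T):
        while j + 1 < len(obs_times) and obs_times[j + 1] <= t:
            j += 1
        if j >= 0:
            extra[t, 0] = obs[j]
            extra[t, 1] = t - obs_times[j]  # "age" feature for irregular schedules
        else:
            extra[t, 1] = -1.0  # flag: no observation available yet
    return np.concatenate([forcings, extra], axis=1)
```

Because the age column grows between overpasses and resets when a new retrieval arrives, the same network handles any observation cadence without retraining on a fixed lag.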

 
Award ID(s):
1832294
NSF-PAR ID:
10137627
Author(s) / Creator(s):
 ;  
Publisher / Repository:
American Meteorological Society
Date Published:
Journal Name:
Journal of Hydrometeorology
Volume:
21
Issue:
3
ISSN:
1525-755X
Page Range / eLocation ID:
p. 399-413
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Soil Moisture Active Passive (SMAP) mission measures important soil moisture data globally. SMAP's products might not always perform better than land surface models (LSMs) when evaluated against in situ measurements. However, we hypothesize that SMAP presents added value for long-term soil moisture estimation in a data fusion setting as evaluated by in situ data. Here, with the help of a time series deep learning (DL) method, we created a seamlessly extended SMAP data set to test this hypothesis and, importantly, gauge whether such benefits extend to years beyond SMAP's limited lifespan. We first show that the DL model, called long short-term memory (LSTM), can extrapolate SMAP for several years with results similar to the training period. We obtained prolongation results with low performance degradation where SMAP itself matches well with in situ data. Interannual trends of root-zone soil moisture are surprisingly well captured by LSTM. In some cases, LSTM's performance is limited by SMAP, whose main issue appears to be its shallow sensing depth. Despite this limitation, a simple average between LSTM and the Noah LSM frequently outperforms Noah alone. Moreover, Noah combined with LSTM is more skillful than when it is combined with another LSM. Over sparsely instrumented sites, the Noah-LSTM combination shows a stronger edge. Our results verified the value of LSTM-extended SMAP data. Moreover, DL is completely data driven and does not require structural assumptions. As such, it has its unique potential for long-term projections and may be applied synergistically with other model-data integration techniques. 
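The "simple average" ensemble described above is straightforward to reproduce. This illustrative sketch (not taken from the paper; function names are ours) shows why an unweighted mean of two members with opposing biases can beat either member against reference data:

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two series."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def ensemble_mean(*members):
    """Unweighted average of model soil-moisture series (one array per member)."""
    return np.mean(np.stack([np.asarray(m, float) for m in members]), axis=0)
```

For example, if one member runs systematically wet and the other systematically dry by the same amount, their mean cancels the bias and its RMSE against in situ data drops below that of either member.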
  2. Abstract

    Recent observations with varied schedules and types (moving average, snapshot, or regularly spaced) can help to improve streamflow forecasts, but it is challenging to integrate them effectively. Based on a long short‐term memory (LSTM) streamflow model, we tested multiple versions of a flexible procedure we call data integration (DI) to leverage recent discharge measurements to improve forecasts. DI accepts lagged inputs either directly or through a convolutional neural network unit. DI ubiquitously elevated streamflow forecast performance to unseen levels, reaching a record continental‐scale median Nash‐Sutcliffe Efficiency coefficient value of 0.86. Integrating moving‐average discharge, discharge from the last few days, or even average discharge from the previous calendar month could all improve daily forecasts. Directly using lagged observations as inputs was comparable in performance to using the convolutional neural network unit. Importantly, we obtained valuable insights regarding hydrologic processes impacting LSTM and DI performance. Before applying DI, the base LSTM model worked well in mountainous or snow‐dominated regions, but less well in regions with low discharge volumes (due to either low precipitation or high precipitation‐energy synchronicity) and large interannual storage variability. DI was most beneficial in regions with high flow autocorrelation: it greatly reduced baseflow bias in groundwater‐dominated western basins and also improved peak prediction for basins with dynamical surface water storage, such as the Prairie Potholes or Great Lakes regions. However, even DI cannot elevate performance in high‐aridity basins with 1‐day flash peaks. Despite this limitation, there is much promise for a deep‐learning‐based forecast paradigm due to its performance, automation, efficiency, and flexibility.
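The record median Nash-Sutcliffe Efficiency (NSE) of 0.86 quoted above is a standard hydrologic skill score. For readers unfamiliar with it, here is the conventional definition in a minimal sketch (the metric itself is standard; only the function name is ours):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 minus (error variance / observed variance).

    NSE = 1 is a perfect match; NSE = 0 means the simulation is no better than
    always predicting the observed mean; negative values are worse than the mean.
    """
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

A continental-scale *median* of 0.86 means more than half of the tested basins beat that threshold, which is why the abstract calls the performance unprecedented.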

     
  3. Abstract

    Accurate prediction of snow water equivalent (SWE) can be valuable for water resource managers. Recently, deep learning methods such as long short-term memory (LSTM) have exhibited high accuracy in simulating hydrologic variables and can integrate lagged observations to improve prediction, but their benefits were not clear for SWE simulations. Here we tested an LSTM network with data integration (DI) for SWE in the western United States to integrate 30-day-lagged or 7-day-lagged observations of either SWE or satellite-observed snow cover fraction (SCF) to improve future predictions. SCF proved beneficial only for shallow-snow sites during snowmelt, while lagged SWE integration significantly improved prediction accuracy for both shallow- and deep-snow sites. The median Nash-Sutcliffe model efficiency coefficient (NSE) in temporal testing improved from 0.92 to 0.97 with 30-day-lagged SWE integration, and root-mean-square error (RMSE) and the difference between estimated and observed peak SWE values (dmax) were reduced by 41% and 57%, respectively. DI effectively mitigated accumulated model and forcing errors that would otherwise be persistent. Moreover, by applying DI to different observations (30-day-lagged, 7-day-lagged), we revealed the spatial distribution of errors with different persistence lengths. For example, integrating 30-day-lagged SWE was ineffective for ephemeral snow sites in the southwestern United States, but significantly reduced monthly-scale biases for regions with stable seasonal snowpack such as high-elevation sites in California. These biases are likely attributable to large interannual variability in snowfall or site-specific snow redistribution patterns that can accumulate to impactful levels over time for nonephemeral sites. These results set up benchmark levels and provide guidance for future model improvement strategies.
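Integrating a "30-day-lagged" or "7-day-lagged" observation, as described above, amounts to feeding the model the observed value from a fixed number of days earlier as an extra input column. A minimal sketch of that feature construction (the fill rule for the initial warm-up days is our assumption, not from the paper):

```python
import numpy as np

def add_lagged_obs(features, obs, lag):
    """Append a lag-shifted observation series as an extra input column.

    At day t the model sees obs[t - lag]. The first `lag` days, which have no
    lagged value, are back-filled with the first observation (an assumption).
    """
    assert lag >= 1, "lag must be at least one day"
    obs = np.asarray(obs, float)
    lagged = np.empty_like(obs)
    lagged[lag:] = obs[:-lag]
    lagged[:lag] = obs[0]
    return np.column_stack([features, lagged])
```

Comparing models built with `lag=30` versus `lag=7` then reveals, as the abstract notes, how long errors persist at each site: where the 30-day lag still helps, biases persist at monthly scales.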

     
  4. Many coastal cities are facing frequent flooding from storm events that are made worse by sea level rise and climate change. The groundwater table level in these low-relief coastal cities is an important, but often overlooked, factor in the recurrent flooding these locations face. Infiltration of stormwater and water intrusion due to tidal forcing can cause already shallow groundwater tables to quickly rise toward the land surface. This decreases available storage, which increases runoff, stormwater system loads, and flooding. Groundwater table forecasts, which could help inform the modeling and management of coastal flooding, are generally unavailable. This study explores two machine learning models, Long Short-term Memory (LSTM) networks and Recurrent Neural Networks (RNN), to model and forecast groundwater table response to storm events in the flood-prone coastal city of Norfolk, Virginia. To determine the effect of training data type on model accuracy, two types of datasets, (i) the continuous time series and (ii) a dataset of only storm events, both created from observed groundwater table, rainfall, and sea level data from 2010–2018, were used to train and test the models. Additionally, a real-time groundwater table forecasting scenario was carried out to compare the models' abilities to predict groundwater table levels given forecast rainfall and sea level as input data. When modeling the groundwater table with observed data, LSTM networks were found to have more predictive skill than RNNs (root mean squared error (RMSE) of 0.09 m versus 0.14 m, respectively). The real-time forecast scenario showed that models trained only on storm event data outperformed models trained on the continuous time series data (RMSE of 0.07 m versus 0.66 m, respectively) and that LSTM outperformed RNN models. 
Because models trained with the continuous time series data had much higher RMSE values, they were not suitable for predicting the groundwater table in the real-time scenario when using forecast input data. These results demonstrate the first use of LSTM networks to create hourly forecasts of groundwater table in a coastal city and show they are well suited for creating operational forecasts in real-time. As groundwater table levels increase due to sea level rise, forecasts of groundwater table will become an increasingly valuable part of coastal flood modeling and management. 
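The "storm events only" training set described above requires slicing event windows out of a continuous record. The study does not give its extraction rule, so this is a hedged sketch of one common approach: flag timesteps where rainfall exceeds a threshold, pad each hit with context before and after, and merge overlapping windows so a prolonged storm yields a single event. All names and the threshold logic are our assumptions.

```python
import numpy as np

def storm_windows(rain, threshold, before=2, after=6):
    """Return (start, end) index windows around rainfall exceedances.

    `before`/`after` add context timesteps on each side of an exceedance;
    overlapping or adjacent windows are merged into one event.
    """
    hits = np.where(np.asarray(rain) > threshold)[0]
    windows = []
    for h in hits:
        start, end = max(0, h - before), min(len(rain), h + after + 1)
        if windows and start <= windows[-1][1]:  # overlaps/abuts previous window
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Training only on such windows concentrates the model on the hydrologic regime that matters for flood forecasting, which is consistent with the large RMSE gap (0.07 m versus 0.66 m) the study reports in the real-time scenario.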
  5. Abstract

    In-depth knowledge about the global patterns and dynamics of land surface net water flux (NWF) is essential for quantification of depletion and recharge of groundwater resources. Net water flux cannot be directly measured, and its estimates as a residual of individual surface flux components often suffer from mass conservation errors due to accumulated systematic biases of individual fluxes. Here, for the first time, we provide direct estimates of global NWF based on near-surface satellite soil moisture retrievals from the Soil Moisture Ocean Salinity (SMOS) and Soil Moisture Active Passive (SMAP) satellites. We apply a recently developed analytical model derived via inversion of the linearized Richards’ equation. The model is parsimonious, yet yields unbiased estimates of long-term cumulative NWF that is generally well correlated with the terrestrial water storage anomaly from the Gravity Recovery and Climate Experiment (GRACE) satellite. In addition, in conjunction with precipitation and evapotranspiration retrievals, the resultant NWF estimates provide a new means for retrieving global infiltration and runoff from satellite observations. However, the efficacy of the proposed approach over densely vegetated regions is questionable, due to the uncertainty of the satellite soil moisture retrievals and the lack of explicit parameterization of transpiration by deeply rooted plants in the proposed model. Future research is needed to advance this modeling paradigm to explicitly account for plant transpiration.
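The abstract states that, combined with precipitation and evapotranspiration retrievals, the NWF estimates allow infiltration and runoff to be retrieved. The paper's analytical model is not reproduced here; the sketch below shows only a generic water-balance closure under an assumption we supply ourselves: that NWF equals infiltration minus ET, so infiltration and runoff follow by simple arithmetic. Treat this as an illustration of the bookkeeping, not the paper's method.

```python
def partition_fluxes(precip, et, nwf):
    """Partition precipitation via a hedged water-balance closure.

    Assumption (ours, not from the source): NWF = infiltration - ET, hence
    infiltration = NWF + ET and runoff = precip - infiltration.
    All quantities share the same units (e.g., mm/day).
    """
    infiltration = nwf + et
    runoff = precip - infiltration
    return infiltration, runoff
```

Note that this residual-style closure inherits the biases of each input flux, which is exactly the mass-conservation pitfall the abstract says the direct NWF estimates are designed to avoid.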

     