Title: The Value of SMAP for Long-Term Soil Moisture Estimation With the Help of Deep Learning
The Soil Moisture Active Passive (SMAP) mission measures important soil moisture data globally. SMAP's products might not always perform better than land surface models (LSMs) when evaluated against in situ measurements. However, we hypothesize that SMAP presents added value for long-term soil moisture estimation in a data fusion setting as evaluated by in situ data. Here, with the help of a time series deep learning (DL) method, we created a seamlessly extended SMAP data set to test this hypothesis and, importantly, gauge whether such benefits extend to years beyond SMAP's limited lifespan. We first show that the DL model, called long short-term memory (LSTM), can extrapolate SMAP for several years and the results are similar to the training period. We obtained prolongation results with low performance degradation where SMAP itself matches well with in situ data. Interannual trends of root-zone soil moisture are surprisingly well captured by LSTM. In some cases, LSTM's performance is limited by SMAP, whose main issue appears to be its shallow sensing depth. Despite this limitation, a simple average of LSTM and an LSM, Noah, frequently outperforms Noah alone. Moreover, Noah combined with LSTM is more skillful than when it is combined with another LSM. Over sparsely instrumented sites, the Noah-LSTM combination shows a stronger edge. Our results verified the value of LSTM-extended SMAP data. Moreover, DL is completely data driven and does not require structural assumptions. As such, it has unique potential for long-term projections and may be applied synergistically with other model-data integration techniques.
Award ID(s):
1832294
PAR ID:
10077612
Journal Name:
IEEE Transactions on Geoscience and Remote Sensing
ISSN:
0196-2892
Page Range / eLocation ID:
1 to 13
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nowcasts, or near-real-time (NRT) forecasts, of soil moisture based on the Soil Moisture Active and Passive (SMAP) mission could provide substantial value for a range of applications including hazards monitoring and agricultural planning. To provide such a NRT forecast with high fidelity, we enhanced a time series deep learning architecture, long short-term memory (LSTM), with a novel data integration (DI) kernel to assimilate the most recent SMAP observations as soon as they become available. The kernel is adaptive in that it can accommodate irregular observational schedules. Testing over the CONUS, this NRT forecast product showcases predictions with unprecedented accuracy when evaluated against subsequent SMAP retrievals. It showed smaller error than NRT forecasts reported in the literature, especially at longer forecast latency. The comparative advantage was due to LSTM’s structural improvements, as well as its ability to utilize more input variables and more training data. The DI-LSTM was compared to the original LSTM model that runs without data integration, referred to as the projection model here. We found that the DI procedure removed the autocorrelated effects of forcing errors and errors due to processes not represented in the inputs, for example, irrigation and floodplain/lake inundation, as well as mismatches due to unseen forcing conditions. The effects of this purely data-driven DI kernel are discussed for the first time in the geosciences. Furthermore, this work presents an upper-bound estimate for the random component of the SMAP retrieval error. 
  2. Abstract. Climate change threatens our ability to grow food for an ever-increasing population. There is a need for high-quality soil moisture predictions in under-monitored regions like Africa. However, it is unclear if soil moisture processes are globally similar enough to allow our models trained on available in situ data to maintain accuracy in unmonitored regions. We present a multitask long short-term memory (LSTM) model that learns simultaneously from global satellite-based data and in situ soil moisture data. This model is evaluated in both random spatial holdout mode and continental holdout mode (trained on some continents, tested on a different one). The model compared favorably to current land surface models, satellite products, and a candidate machine learning model, reaching a global median correlation of 0.792 for the random spatial holdout test. It behaved surprisingly well in Africa and Australia, showing high correlation even when we excluded their sites from the training set, but it performed relatively poorly in Alaska where rapid changes are occurring. In all but one continent (Asia), the multitask model in the worst-case scenario test performed better than the Soil Moisture Active Passive (SMAP) 9 km product. Factorial analysis has shown that the LSTM model's accuracy varies with terrain aspect, resulting in lower performance for dry and south-facing slopes or wet and north-facing slopes. This knowledge helps us apply the model while understanding its limitations. This model is being integrated into an operational agricultural assistance application which currently provides information to 13 million African farmers.
  3. Abstract. The Consistent Artificial Intelligence (AI)-based Soil Moisture (CASM) dataset is a global, consistent, and long-term remote sensing soil moisture (SM) dataset created using machine learning. It is based on the NASA Soil Moisture Active Passive (SMAP) satellite mission SM data and is aimed at extrapolating SMAP-like quality SM back in time using previous satellite microwave platforms. CASM represents SM in the top soil layer, and it is defined on a global 25 km EASE-2 grid and for 2002–2020 with a 3-day temporal resolution. The seasonal cycle is removed for the neural network training to ensure its skill is targeted at predicting SM extremes. CASM comparison to 367 global in-situ SM monitoring sites shows a SMAP-like median correlation of 0.66. Additionally, the SM product uncertainty was assessed, and both aleatoric and epistemic uncertainties were estimated and included in the dataset. The CASM dataset can be used to study a wide range of hydrological, carbon cycle, and energy processes since only a consistent long-term dataset allows assessing changes in water availability and water stress.
  4. Abstract. Plant activity in semi-arid ecosystems is largely controlled by pulses of precipitation, making them particularly vulnerable to increased aridity expected with climate change. Simple bucket-model hydrology schemes in land surface models (LSMs) have had limited ability in accurately capturing semi-arid water stores and fluxes. Recent, more complex, LSM hydrology models have not been widely evaluated against semi-arid ecosystem in situ data. We hypothesize that the failure of older LSM versions to represent evapotranspiration, ET, in arid lands is because simple bucket models do not capture realistic fluctuations in upper layer soil moisture. We therefore predict that including a discretized soil hydrology scheme based on a mechanistic description of moisture diffusion will result in an improvement in model ET when compared to data because the temporal variability of upper layer soil moisture content better corresponds to that of precipitation inputs. To test this prediction, we compared ORCHIDEE LSM simulations from (1) a simple conceptual 2-layer bucket scheme with fixed hydrological parameters; and (2) an 11-layer discretized mechanistic scheme of moisture diffusion in unsaturated soil based on Richards equations against daily and monthly soil moisture and ET observations, together with data-derived transpiration / evaporation, T / ET, ratios, from six semi-arid grass, shrub and forest sites in the southwestern USA. The 11-layer scheme also has modified calculations of surface runoff, bare soil evaporation, and water limitation to be compatible with the more complex hydrology configuration. To diagnose remaining discrepancies in the 11-layer model, we tested two further configurations: (i) the addition of a term that captures bare soil evaporation resistance to dry soil; and (ii) reduced bare soil fraction. We found that the more mechanistic 11-layer model results in a better representation of the daily and monthly ET observations. We show that this is likely because of improved simulation of soil moisture in the upper layers of soil (top 5 cm). Some discrepancies between observed and modelled soil moisture and ET may allow us to prioritize future model development. Adding a soil resistance term generally decreased simulated E and increased soil moisture content, thus increasing T and T / ET ratios and reducing the negative T / ET model-data bias. By reducing the bare soil fraction in the model, we illustrated that modelled leaf T is too low at sparsely vegetated sites. We conclude that a discretized soil hydrology scheme and associated developments improve estimates of ET by allowing the model to more closely match the pulse precipitation dynamics of these semi-arid ecosystems; however, the partitioning of T from bare soil evaporation is not solved by this modification alone.
  5. Estimates of evapotranspiration and recharge fluxes are fundamental to sustainable water resource management. These fluxes provide valuable insights for decision-makers, enabling them to implement effective strategies that balance water demand with available resources, promote resilience in the face of climate change, and ensure the long-term sustainability of water ecosystems. In-situ observations of evapotranspiration and recharge are scarce and not representative of large areas. An observation-driven variational data assimilation (VDA) system, named LIDA-2 (Land Integrated Data Assimilation framework), is developed to estimate the key parameters (evaporative fraction, bulk heat transfer coefficient, Brooks-Corey parameter) of evapotranspiration and recharge fluxes by assimilating GOES land surface temperature (LST) and SMAP surface soil moisture observations into a coupled water and dual-source energy balance model. Second-order information is used to estimate the uncertainty and guide the model toward a well-posed estimation problem. The algorithm is implemented in part of the U.S. Southern Great Plains, and its performance is evaluated through comparison tests, uncertainty analysis, and a consistency test. Soil moisture and evapotranspiration estimates are validated against in-situ observations. The spatial pattern of the estimated annual recharge map is in good agreement with maps from the literature. Overall, the VDA-based framework demonstrated its efficacy for large-scale mapping of recharge and evapotranspiration.