Title: A HydroLSTM‐Based Machine‐Learning Approach to Discovering Regionalized Representations of Catchment Dynamics
Abstract: Finding similarities between model parameters across different catchments has proved to be challenging. Existing approaches struggle due to catchment heterogeneity and non‐linear dynamics. In particular, attempts to correlate catchment attributes with hydrological responses have failed due to interdependencies among variables and consequent equifinality. Machine Learning (ML), particularly the Long Short‐Term Memory (LSTM) approach, has demonstrated strong predictive and spatial regionalization performance. However, understanding the nature of the regionalization relationships remains difficult. This study proposes a novel approach to partially decouple learning the representation of (a) catchment dynamics by using the HydroLSTM architecture and (b) spatial regionalization relationships by using a Random Forest (RF) clustering approach to learn the relationships between the catchment attributes and dynamics. This coupled approach, called Regional HydroLSTM, learns a representation of “potential streamflow” using a single cell‐state, while the output gate corrects it to correspond to the temporal context of the current hydrologic regime. RF clusters mediate the relationship between catchment attributes and dynamics, allowing identification of spatially consistent hydrological regions, thereby providing insight into the factors driving spatial and temporal hydrological variability. Results suggest that by combining complementary architectures, we can enhance the interpretability of regional machine learning models in hydrology, offering a new perspective on the “catchment classification” problem. We conclude that an improved understanding of the underlying nature of hydrologic systems can be achieved by careful design of ML architectures to target the specific things we are seeking to learn from the data.
Award ID(s):
2134892 1945195
PAR ID:
10640847
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
American Geophysical Union
Date Published:
Journal Name:
Water Resources Research
Volume:
61
Issue:
8
ISSN:
0043-1397
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The predictive accuracy of regional hydrologic models often varies across both time and space. Interpreting relationships between watershed characteristics, hydrologic regimes, and model performance can reveal potential areas for model improvement. In this study, we use machine learning to assess model performance of a regional hydrologic model to forecast the occurrence of streamflow drought. We demonstrate our methodology using a regional long short‐term memory (LSTM) deep learning model developed by the U.S. Geological Survey (USGS) and data from 384 streamgages across the Colorado River Basin region. Performance was assessed by clustering catchments using: (a) physical and climatological catchment attributes, and (b) streamflow drought signatures time series. We examined the association of USGS LSTM model error measures with clusters generated by both approaches to interpret meaningful spatial and temporal information about LSTM model performance. Clustering static catchment attributes identified elevation, degree of streamflow regulation, baseflow contribution, catchment aridity, and drainage area as the most influential attributes to model performance. Clustering gages by their drought signatures revealed that catchments with significant seasonal peak runoff between January and June generally exhibited better model performance. Additionally, a Random Forest classifier was trained to successfully predict LSTM model performance (F1 score of 0.72) based on physical and climatological catchment attributes. Low degree of flow regulation was identified as a key indicator of better LSTM model performance. These findings point to the opportunities for improving the USGS LSTM model performance in future hydrologic drought prediction efforts across regional and CONUS scales.
  2. Accurate streamflow prediction is critical for ensuring water supply and detecting floods, while also providing essential hydrological inputs for other scientific models in fields such as climate and agriculture. Recently, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical characteristics and weather forcing data. However, these models are only focused on gauged basins and cannot adapt to ungauged basins, i.e., basins without training data. Prediction in Ungauged Basins (PUB) is considered one of the most important challenges in hydrology, as most basins in the United States and around the world have no observations. In this work, we propose a meta-transfer learning approach by enhancing imperfect physics equations that facilitate model adaptation. Intuitively, physical equations can often be used to regularize deep learning models to achieve robust regionalization performance under gauged scenarios, but they can be inaccurate due to the simplified representation of physics. We correct such uncertainty in the physical equations by residual approximation and let these corrected equations guide the model training process. We evaluated the proposed method for predicting daily streamflow on the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset. The experimental results on hydrological data over 19 years demonstrate the effectiveness of the proposed method in ungauged scenarios.
  3. Climate warming in alpine regions is changing patterns of water storage, a primary control on alpine plant ecology, biogeochemistry, and water supplies to lower elevations. There is an outstanding need to determine how the interacting drivers of precipitation and the critical zone (CZ) dictate the spatial pattern and time evolution of soil water storage. In this study, we developed an analytical framework that combines intensive hydrologic measurements and extensive remotely-sensed observations with statistical modeling to identify areas with similar temporal trends in soil water storage within, and predict their relationships across, a 0.26 km² alpine catchment in the Colorado Rocky Mountains, U.S.A. Repeat measurements of soil moisture were used to drive an unsupervised clustering algorithm, which identified six unique groups of locations ranging from predominantly dry to persistently very wet within the catchment. We then explored relationships between these hydrologic groups and multiple CZ-related indices, including snow depth, plant productivity, macro- (10² to >10³ m) and microtopography (<10⁰ to 10² m), and hydrological flow paths. Finally, we used a supervised machine learning random forest algorithm to map each of the six hydrologic groups across the catchment based on distributed CZ properties and evaluated their aggregate relationships at the catchment scale. Our analysis indicated that ~40–50% of the catchment is hydrologically connected to the stream channel, lending insight into the portions of the catchment that likely dominate stream water and solute fluxes. This research expands our understanding of patch-to-catchment-scale physical controls on hydrologic and biogeochemical processes, as well as their relationships across space and time, which will inform predictive models aimed at determining future changes to alpine ecosystems.
  4. In dry summer months, stream baseflow sourced from groundwater is essential to support aquatic ecosystems and anthropogenic water use. Hydrologic signatures, or metrics describing unique features of streamflow timeseries, are useful for quantifying and predicting these valuable baseflow and groundwater storage resources across continental scales. Hydrologic signatures can be predicted based on catchment attributes summarising climate and landscape and can be used to characterise baseflow and groundwater processes that cannot be directly measured. While past watershed‐scale studies suggest that landscape attributes are important controls on baseflow and storage processes, recent regional‐to‐global scale modelling studies have instead found that landscape attributes have weaker relationships with hydrologic signatures of these processes than expected compared to climate attributes. In this study, we quantify two landscape attributes, average geologic age and the proportion of catchment area covered by wetlands. We investigate if incorporating these additional predictors into existing large‐sample attribute datasets strengthens continental‐scale, empirical relationships between landscape attributes and hydrologic signatures. We quantify 14 hydrologic signatures related to baseflow and groundwater processes in catchments across the contiguous United States, evaluate the relationships between the new catchment attributes and hydrologic signatures with correlation analysis and use the new attributes to predict hydrologic signatures with random forest models. We found that the average geologic age of catchments was a highly influential predictor of hydrologic signatures, especially for signatures describing baseflow magnitude in catchments, and had greater importance than existing attributes of the subsurface. In contrast, we found that the proportion of wetlands in catchments had limited influence on our hydrologic signature predictions. We recommend incorporating catchment geologic age into large‐sample catchment datasets to improve predictions of baseflow and storage hydrologic signatures and processes across continental scales.
  5. While machine learning approaches are rapidly being applied to hydrologic problems, physics-informed approaches are still relatively rare. Many successful deep-learning applications have focused on point estimates of streamflow trained on stream gauge observations over time. While these approaches show promise for some applications, there is a need for distributed approaches that can produce accurate two-dimensional results of model states, such as ponded water depth. Here, we demonstrate a 2D emulator of the Tilted V catchment benchmark problem with solutions provided by the integrated hydrology model ParFlow. This emulator model can use 2D Convolutional Neural Network (CNN), 3D CNN, and U-Net machine learning architectures and produces time-dependent spatial maps of ponded water depth from which hydrographs and other hydrologic quantities of interest may be derived. A comparison of different deep learning architectures and hyperparameters is presented, with particular focus on approaches such as 3D CNN (which have a time-dependent learning component) and 2D CNN and U-Net approaches (which use only the current model state to predict the next state in time). In addition to testing model performance, we also use a simplified simulation-based inference approach to evaluate the ability to calibrate the emulator to randomly selected simulations and the match between ML-calibrated input parameters and the underlying physics-based simulation.