Abstract Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state‐of‐the‐art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and deep learning model selections. Furthermore, based on the use of random vectors, high‐dimensional characterization improves regionalization performance in gauged basin scenarios when physical descriptors are uncertain or insufficient.
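The random vector idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`make_random_descriptors`, `build_inputs`), the vector dimension, the uniform sampling range, and the toy forcing values are all assumptions made for illustration. Each catchment receives a fixed vector of random values, and that vector is appended to every time step of the catchment's weather forcing series in place of physical descriptors.

```python
import random

def make_random_descriptors(catchment_ids, dim=8, seed=0):
    """Assign each catchment one fixed vector of random values (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the surrogate vectors are reproducible
    return {cid: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for cid in catchment_ids}

def build_inputs(forcings, descriptor):
    """Append the same static vector to every time step of a forcing series."""
    return [list(step) + list(descriptor) for step in forcings]

# Toy data (assumed): two catchments, two forcing variables per day.
catchments = ["basin_a", "basin_b"]
descriptors = make_random_descriptors(catchments, dim=4, seed=42)
forcing_a = [[1.2, 15.0], [0.0, 17.5], [3.4, 12.1]]
inputs_a = build_inputs(forcing_a, descriptors["basin_a"])
```

In this sketch the random vector only acts as a stable identifier that a model could learn to associate with each gauged catchment; it carries no physical meaning.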
This content will become publicly available on August 1, 2026
A HydroLSTM‐Based Machine‐Learning Approach to Discovering Regionalized Representations of Catchment Dynamics
Abstract Finding similarities between model parameters across different catchments has proved to be challenging. Existing approaches struggle due to catchment heterogeneity and non‐linear dynamics. In particular, attempts to correlate catchment attributes with hydrological responses have failed due to interdependencies among variables and consequent equifinality. Machine Learning (ML), particularly the Long Short‐Term Memory (LSTM) approach, has demonstrated strong predictive and spatial regionalization performance. However, understanding the nature of the regionalization relationships remains difficult. This study proposes a novel approach to partially decouple learning the representation of (a) catchment dynamics by using the HydroLSTM architecture and (b) spatial regionalization relationships by using a Random Forest (RF) clustering approach to learn the relationships between the catchment attributes and dynamics. This coupled approach, called Regional HydroLSTM, learns a representation of “potential streamflow” using a single cell‐state, while the output gate corrects it to correspond to the temporal context of the current hydrologic regime. RF clusters mediate the relationship between catchment attributes and dynamics, allowing identification of spatially consistent hydrological regions, thereby providing insight into the factors driving spatial and temporal hydrological variability. Results suggest that by combining complementary architectures, we can enhance the interpretability of regional machine learning models in hydrology, offering a new perspective on the “catchment classification” problem. We conclude that an improved understanding of the underlying nature of hydrologic systems can be achieved by careful design of ML architectures to target the specific things we are seeking to learn from the data.
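To make the roles of a single cell state and an output gate concrete, the following is a minimal sketch of one step of a generic, textbook LSTM cell with one hidden unit. It is not the HydroLSTM or Regional HydroLSTM architecture, whose details are not given in this abstract; the function names, parameter layout, and toy values are assumptions for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a generic single-unit LSTM cell (scalar cell state).

    params maps each gate name to (input_weights, recurrent_weight, bias).
    """
    def pre_activation(name):
        w, u, b = params[name]
        return sum(wi * xi for wi, xi in zip(w, x)) + u * h_prev + b

    i = sigmoid(pre_activation("input"))        # how much new information enters
    f = sigmoid(pre_activation("forget"))       # how much old state is retained
    o = sigmoid(pre_activation("output"))       # how much of the state is exposed
    g = math.tanh(pre_activation("candidate"))  # proposed update to the state
    c = f * c_prev + i * g                      # single scalar cell state
    h = o * math.tanh(c)                        # output gate scales the state
    return h, c

# Toy parameters (assumed) for two forcing inputs, e.g. precipitation and temperature.
toy_params = {name: ([0.5, -0.1], 0.2, 0.0)
              for name in ("input", "forget", "output", "candidate")}
h, c = 0.0, 0.0
outputs = []
for x in [[1.0, 0.3], [0.0, 0.4], [2.0, 0.1]]:
    h, c = lstm_step(x, h, c, toy_params)
    outputs.append(h)
```

In this generic cell the scalar `c` accumulates memory over time, and the output gate `o` determines how much of it appears in the output at each step, which is the general mechanism the abstract refers to.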
- PAR ID: 10640847
- Publisher / Repository: American Geophysical Union
- Date Published:
- Journal Name: Water Resources Research
- Volume: 61
- Issue: 8
- ISSN: 0043-1397
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Accurate streamflow prediction is critical for ensuring water supply and detecting floods, while also providing essential hydrological inputs for other scientific models in fields such as climate and agriculture. Recently, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical characteristics and weather forcing data. However, these models focus only on gauged basins and cannot adapt to ungauged basins, i.e., basins without training data. Prediction in Ungauged Basins (PUB) is considered one of the most important challenges in hydrology, as most basins in the United States and around the world have no observations. In this work, we propose a meta-transfer learning approach by enhancing imperfect physics equations that facilitate model adaptation. Intuitively, physical equations can often be used to regularize deep learning models to achieve robust regionalization performance under gauged scenarios, but they can be inaccurate due to the simplified representation of physics. We correct such uncertainty in the physical equations by residual approximation and let these corrected equations guide the model training process. We evaluated the proposed method for predicting daily streamflow on the catchment attributes and meteorology for large-sample studies (CAMELS) dataset. The experimental results on hydrological data over 19 years demonstrate the effectiveness of the proposed method in ungauged scenarios.
-
Climate warming in alpine regions is changing patterns of water storage, a primary control on alpine plant ecology, biogeochemistry, and water supplies to lower elevations. There is an outstanding need to determine how the interacting drivers of precipitation and the critical zone (CZ) dictate the spatial pattern and time evolution of soil water storage. In this study, we developed an analytical framework that combines intensive hydrologic measurements and extensive remotely-sensed observations with statistical modeling to identify areas with similar temporal trends in soil water storage within, and predict their relationships across, a 0.26 km² alpine catchment in the Colorado Rocky Mountains, U.S.A. Repeat measurements of soil moisture were used to drive an unsupervised clustering algorithm, which identified six unique groups of locations ranging from predominantly dry to persistently very wet within the catchment. We then explored relationships between these hydrologic groups and multiple CZ-related indices, including snow depth, plant productivity, macro- (10² to >10³ m) and microtopography (<10⁰ to 10² m), and hydrological flow paths. Finally, we used a supervised machine learning random forest algorithm to map each of the six hydrologic groups across the catchment based on distributed CZ properties and evaluated their aggregate relationships at the catchment scale. Our analysis indicated that ~40–50% of the catchment is hydrologically connected to the stream channel, lending insight into the portions of the catchment that likely dominate stream water and solute fluxes. This research expands our understanding of patch-to-catchment-scale physical controls on hydrologic and biogeochemical processes, as well as their relationships across space and time, which will inform predictive models aimed at determining future changes to alpine ecosystems.
-
Abstract Research at long‐term catchment monitoring sites has generated a great volume, variety, and velocity of data for analysis of stream water chemistry dynamics. To harness the potential of these big data and extract patterns that are indicative of underlying functional relationships, machine learning tools have advantages over traditional statistical methods, and are increasingly being applied for dimension reduction, feature extraction, and trend identification. Still, as examples of complex systems, catchments are characterized by multivariate factor interactions and equifinality that are not easily identified by most machine‐learning methods. Using dissolved organic carbon (DOC) dynamics as an illustration, we applied a new evolutionary algorithm (EA) to extract geologic, topographic, meteorologic, hydrologic, and land use attributes that were correlated to mean stream DOC concentration in forested catchments distributed across the continental United States. The EA reduced dimensionality of our attribute dataset to identify the combination of factors, and their specific value ranges, that interacted to drive membership in High or Low mean DOC clusters. High mean DOC concentrations were associated with two distinct geographic locations of variable climatic and vegetative conditions, indicating equifinality. Our findings underscore the importance of critical zone structure in mediating hydrological and biogeochemical processes to govern DOC dynamics at the catchment scale. This multi‐scale, pattern‐to‐process approach is being applied to refine hypotheses for process‐based modeling of DOC dynamics in forested headwater streams at catchment to site scales.
-
In dry summer months, stream baseflow sourced from groundwater is essential to support aquatic ecosystems and anthropogenic water use. Hydrologic signatures, or metrics describing unique features of streamflow timeseries, are useful for quantifying and predicting these valuable baseflow and groundwater storage resources across continental scales. Hydrologic signatures can be predicted based on catchment attributes summarising climate and landscape and can be used to characterise baseflow and groundwater processes that cannot be directly measured. While past watershed‐scale studies suggest that landscape attributes are important controls on baseflow and storage processes, recent regional‐to‐global scale modelling studies have instead found that landscape attributes have weaker relationships with hydrologic signatures of these processes than expected compared to climate attributes. In this study, we quantify two landscape attributes, average geologic age and the proportion of catchment area covered by wetlands. We investigate if incorporating these additional predictors into existing large‐sample attribute datasets strengthens continental‐scale, empirical relationships between landscape attributes and hydrologic signatures. We quantify 14 hydrologic signatures related to baseflow and groundwater processes in catchments across the contiguous United States, evaluate the relationships between the new catchment attributes and hydrologic signatures with correlation analysis and use the new attributes to predict hydrologic signatures with random forest models. We found that the average geologic age of catchments was a highly influential predictor of hydrologic signatures, especially for signatures describing baseflow magnitude in catchments, and had greater importance than existing attributes of the subsurface. In contrast, we found that the proportion of wetlands in catchments had limited influence on our hydrologic signature predictions. 
We recommend incorporating catchment geologic age into large‐sample catchment datasets to improve predictions of baseflow and storage hydrologic signatures and processes across continental scales.
