Abstract As Deep Neural Networks (DNNs) are increasingly employed for important simulations in rainfall‐runoff contexts, the demand for interpretability is growing in the hydrology community. Interpretability is not just a scientific question; it also means knowing where models fall short, how to fix them, and how to explain their outcomes to scientific communities so that everyone understands how a model arrives at specific simulations. This paper addresses these challenges by developing interpretable probabilistic DNNs, utilizing the Deep Autoregressive Recurrent (DeepAR) model and the Temporal Fusion Transformer (TFT) for daily streamflow simulation across the continental United States (CONUS). We benchmarked TFT and DeepAR against conceptual and physics‐based hydrologic models. In this setting, catchment physical attributes were incorporated into the training process to create physics‐guided TFT and DeepAR configurations. These physics‐guided configurations are also designed to aggregate patterns across the entire data set, analyze the sensitivity of key catchment physical attributes, and facilitate the interpretability of temporal dynamics in rainfall‐runoff generation mechanisms. To assess uncertainty, the modeling configurations were coupled with quantile regression, and Gaussian noise with increasing standard deviation was added to the individual catchment attributes. The analysis suggested that the physics‐guided TFT was superior in predicting daily streamflow compared to the original TFT and DeepAR as well as the benchmark hydrologic models. Predictive uncertainty intervals effectively bracketed most of the observational data through simultaneous simulation of various percentiles (e.g., 10th, 50th, and 90th). The interpretable physics‐guided TFT thus proved to be a strong candidate for CONUS daily streamflow simulations.
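Simultaneous simulation of several percentiles, as described above, is typically trained with a quantile (pinball) loss. A minimal sketch of the standard pinball-loss formulation follows; the function and variable names are illustrative and not taken from the paper's code.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for a single quantile q in (0, 1).

    Under-prediction is weighted by q, over-prediction by (1 - q), so the
    minimizer of this loss is the q-th conditional quantile.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        err = yt - yp
        total += max(q * err, (q - 1.0) * err)
    return total / len(y_true)

def multi_quantile_loss(y_true, preds_by_q):
    """Average pinball loss over several quantiles (e.g., 0.1, 0.5, 0.9)."""
    return sum(pinball_loss(y_true, p, q)
               for q, p in preds_by_q.items()) / len(preds_by_q)
```

For q = 0.5 the pinball loss reduces to half the mean absolute error, which is why the 50th-percentile output behaves like a median estimate; higher quantiles (e.g., 0.9) penalize under-prediction more heavily, pushing that output toward the upper bound of the predictive interval.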
Generating interpretable rainfall-runoff models automatically from data
A sudden surge of data has created new challenges in water management, spanning quality control, assimilation, and analysis. Few approaches are available to integrate growing volumes of data into interpretable results. Process-based hydrologic models were not designed to consume large amounts of data. Alternatively, new machine learning tools can automate data analysis and forecasting, but their lack of interpretability and reliance on very large data sets limit the discovery of insights and may impact trust. To address this gap, we present a new approach that strikes a middle ground between process-based and data-based modeling. The contribution of this work is an automated and scalable methodology that discovers differential equations and latent state estimates within hydrologic systems using only rainfall and runoff measurements. We show how this enables automated tools to learn interpretable models of 6 to 18 parameters solely from measurements. We apply this approach to nearly 400 stream gaging sites across the US, showing how complex catchment dynamics can be reconstructed solely from rainfall and runoff measurements. We also show how the approach discovers surrogate models that replicate the dynamics of a much more complex process-based model at a fraction of the computational complexity. We discuss how the resulting representation of watershed dynamics provides the insight and computational efficiency needed to enable automated predictions across large sensor networks.
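The low-order differential equations mentioned above can be illustrated with the simplest such model: a linear reservoir, dS/dt = P(t) − kS with runoff Q = kS. This explicit-Euler sketch is only an example of the model class; it is not one of the equations the paper discovers, and the names are illustrative.

```python
def simulate_linear_reservoir(precip, k, s0=0.0, dt=1.0):
    """Explicit-Euler simulation of dS/dt = P(t) - k*S, with runoff Q = k*S.

    precip: daily precipitation series; k: outflow rate constant;
    s0: initial storage. Returns the simulated runoff series.
    """
    s = s0
    runoff = []
    for p in precip:
        q = k * s          # runoff drains proportionally to storage
        s += dt * (p - q)  # storage gains rainfall, loses runoff
        runoff.append(q)
    return runoff
```

With zero rainfall the model produces the classic exponential recession (each day's runoff is a fixed fraction of the last), which is exactly the kind of interpretable behavior a 6-to-18-parameter discovered model can encode.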
- Award ID(s): 1750744
- PAR ID: 10540229
- Publisher / Repository: Elsevier
- Date Published:
- Journal Name: Advances in Water Resources
- Volume: 193
- Issue: C
- ISSN: 0309-1708
- Page Range / eLocation ID: 104796
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Large-scale hydrologic models are increasingly being developed for operational use in the forecasting and planning of water resources. However, the predictive strength of such models depends on how well they resolve various functions of catchment hydrology, which are influenced by gradients in climate, topography, soils, and land use. Most assessments of hydrologic model uncertainty have been limited to traditional statistical methods. Here, we present a proof-of-concept approach that uses interpretable machine learning techniques to provide post hoc assessment of model sensitivity and process deficiency in hydrologic models. We train a random forest model to predict the Kling–Gupta efficiency (KGE) of National Water Model (NWM) and National Hydrologic Model (NHM) streamflow predictions for 4383 stream gauges in the conterminous United States. Thereafter, we explain the local and global controls that 48 catchment attributes exert on KGE prediction using interpretable Shapley values. Overall, we find that soil water content is the most impactful feature controlling successful model performance, suggesting that soil water storage is difficult for hydrologic models to resolve, particularly for arid locations. We identify nonlinear thresholds beyond which predictive performance decreases for NWM and NHM. For example, soil water content less than 210 mm, precipitation less than 900 mm yr−1, road density greater than 5 km km−2, and lake area percent greater than 10 % contributed to lower KGE values. These results suggest that improvements in how these influential processes are represented could result in the largest increases in NWM and NHM predictive performance. This study demonstrates the utility of interrogating process-based models using data-driven techniques, which has broad applicability and potential for improving the next generation of large-scale hydrologic models.
Abstract. Over the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Among other qualities, the popularity of ML models for such applications is due to their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds compared to rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.
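For 1 d lead-time forecasts, the most common naïve benchmark is persistence: tomorrow's flow equals today's flow. The abstract does not specify the study's exact naïve model, so this sketch assumes persistence, with mean absolute error as one of the four evaluation criteria listed; all names are illustrative.

```python
def persistence_forecast(flow):
    """Naive 1-day-ahead forecast: predict flow[t] for day t+1.

    Returns predictions aligned with the observed values flow[1:].
    """
    return flow[:-1]

def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between aligned observation/prediction series."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

Persistence is a deliberately hard baseline at short lead times because daily streamflow is strongly autocorrelated; an RF forecast only adds value where it beats this benchmark, which the abstract reports happens most clearly in rainfall-driven watersheds.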
Accurate hydrologic modeling is vital to characterizing how the terrestrial water cycle responds to climate change. Pure deep learning (DL) models have been shown to outperform process-based ones while remaining difficult to interpret. More recently, differentiable physics-informed machine learning models with a physical backbone can systematically integrate physical equations and DL, predicting untrained variables and processes with high performance. However, it is unclear if such models are competitive for global-scale applications with a simple backbone. Therefore, we use – for the first time at this scale – differentiable hydrologic models (full name δHBV-globe1.0-hydroDL, shortened to δHBV here) to simulate the rainfall–runoff processes for 3753 basins around the world. Moreover, we compare the δHBV models to a purely data-driven long short-term memory (LSTM) model to examine their strengths and limitations. Both LSTM and the δHBV models provide competitive daily hydrologic simulation capabilities in global basins, with median Kling–Gupta efficiency values close to or higher than 0.7 (and 0.78 with LSTM for a subset of 1675 basins with long-term discharge records), significantly outperforming traditional models. Moreover, regionalized differentiable models demonstrated stronger spatial generalization ability (median KGE 0.64) than a traditional parameter regionalization approach (median KGE 0.46) and even LSTM for ungauged region tests across continents. Nevertheless, relative to LSTM, the differentiable model was hampered by structural deficiencies for cold or polar regions, highly arid regions, and basins with significant human impacts. This study also sets the benchmark for hydrologic estimates around the world and builds a foundation for improving global hydrologic simulations.
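The core idea of a differentiable hydrologic model is that a process-based backbone is calibrated by gradient descent on its parameters. As a stand-in, this pure-Python sketch fits the rate constant of a one-parameter linear bucket to an observed runoff series, using finite differences in place of the automatic differentiation a real δHBV-style framework would use; the bucket model and all names are illustrative, not the δHBV backbone itself.

```python
def bucket_runoff(precip, k, s0=0.0):
    """Single linear bucket: daily runoff q = k * storage."""
    s, out = s0, []
    for p in precip:
        q = k * s
        s += p - q
        out.append(q)
    return out

def mse(a, b):
    """Mean squared error between two aligned series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def calibrate_k(precip, obs, k=0.5, lr=0.05, steps=2000, eps=1e-5):
    """Gradient descent on the loss surface over k.

    Central finite differences stand in for the autodiff gradients that
    differentiable-model frameworks compute exactly.
    """
    for _ in range(steps):
        g = (mse(bucket_runoff(precip, k + eps), obs)
             - mse(bucket_runoff(precip, k - eps), obs)) / (2 * eps)
        k -= lr * g
    return k
```

Because the calibrated object is a physical parameter (a recession rate) rather than an opaque weight matrix, the fitted model stays interpretable, which is the trade-off the abstract contrasts against the purely data-driven LSTM.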
Abstract Hydrologic signatures are quantitative metrics that describe a streamflow time series. Examples include annual maximum flow, baseflow index and recession shape descriptors. In this paper, we use machine learning (ML) to learn encodings that are optimal ML equivalents of hydrologic signatures, and that are derived directly from the data. We compare the learned signatures to classical signatures, interpret their meaning, and use them to build rainfall‐runoff models in otherwise ungauged watersheds. Our model has an encoder–decoder structure. The encoder is a convolutional neural net mapping historical flow and climate data to a low‐dimensional vector encoding, analogous to hydrological signatures. The decoder structure includes stores and fluxes similar to a classical hydrologic model. For each timestep, the decoder uses current climate data, watershed attributes and the encoding to predict coefficients that distribute precipitation between stores and store outflow coefficients. The model is trained end‐to‐end on the U.S. CAMELS watershed data set to minimize streamflow error. We show that learned signatures can extract new information from streamflow series, because using learned signatures as input to the process‐informed model improves prediction accuracy over benchmark configurations that use classical signatures or no signatures. We interpret learned signatures by correlation with classical signatures, and by using sensitivity analysis to assess their impact on modeled store dynamics. Learned signatures are spatially correlated and relate to streamflow dynamics including seasonality, high and low extremes, baseflow and recessions. We conclude that process‐informed ML models and other applications using hydrologic signatures may benefit from replacing expert‐selected signatures with learned signatures.
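The classical signatures that the learned encodings are compared against are simple statistics of the flow series. A minimal sketch of a few of them follows: mean flow, maximum flow, a low-flow percentile, and a flashiness index computed as the sum of day-to-day flow changes over total flow (in the style of the Richards–Baker index). The selection and names are illustrative, not the paper's exact signature set.

```python
def signatures(flow):
    """Compute a small set of classical hydrologic signatures."""
    n = len(flow)
    srt = sorted(flow)

    def pct(p):
        # Simple nearest-rank percentile (sufficient for a sketch).
        return srt[min(n - 1, int(p / 100 * n))]

    return {
        "mean_flow": sum(flow) / n,
        "max_flow": max(flow),
        "q95_low_flow": pct(5),  # flow exceeded ~95% of the time
        # Sum of absolute day-to-day changes, normalized by total flow.
        "flashiness": sum(abs(a - b)
                          for a, b in zip(flow[1:], flow)) / sum(flow),
    }
```

Each signature compresses the full time series into one number, which is exactly the role the paper's learned low-dimensional encoding plays, except that the encoder chooses the compression to minimize downstream streamflow error rather than relying on expert selection.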