Large-scale hydrologic models are increasingly being developed for operational use in the forecasting and planning of water resources. However, the predictive strength of such models depends on how well they resolve various functions of catchment hydrology, which are influenced by gradients in climate, topography, soils, and land use. Most assessments of hydrologic model uncertainty have been limited to traditional statistical methods. Here, we present a proof-of-concept approach that uses interpretable machine learning techniques to provide post hoc assessment of model sensitivity and process deficiency in hydrologic models. We train a random forest model to predict the Kling–Gupta efficiency (KGE) of National Water Model (NWM) and National Hydrologic Model (NHM) streamflow predictions for 4383 stream gauges in the conterminous United States. Thereafter, we explain the local and global controls that 48 catchment attributes exert on KGE prediction using interpretable Shapley values. Overall, we find that soil water content is the most impactful feature controlling successful model performance, suggesting that soil water storage is difficult for hydrologic models to resolve, particularly for arid locations. We identify nonlinear thresholds beyond which predictive performance decreases for NWM and NHM. For example, soil water content less than 210 mm, precipitation less than 900 mm yr⁻¹, road density greater than 5 km km⁻², and lake area percent greater than 10 % contributed to lower KGE values. These results suggest that improvements in how these influential processes are represented could result in the largest increases in NWM and NHM predictive performance. This study demonstrates the utility of interrogating process-based models using data-driven techniques, which has broad applicability and potential for improving the next generation of large-scale hydrologic models.
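
For readers unfamiliar with the metric, the Kling–Gupta efficiency combines the correlation r, the variability ratio α, and the bias ratio β between simulated and observed streamflow as KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²). The sketch below is a minimal pure-Python illustration of that formula, not code from the NWM/NHM study. The function name is hypothetical, and it assumes the original 2009 formulation with population standard deviations.

```python
import math
from statistics import fmean, pstdev


def kling_gupta_efficiency(simulated, observed):
    """Return KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    if len(simulated) != len(observed) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    mean_sim, mean_obs = fmean(simulated), fmean(observed)
    std_sim, std_obs = pstdev(simulated), pstdev(observed)
    if std_sim == 0 or std_obs == 0 or mean_obs == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")

    # Pearson correlation between simulated and observed flows.
    covariance = fmean(
        [(s - mean_sim) * (o - mean_obs) for s, o in zip(simulated, observed)]
    )
    r = covariance / (std_sim * std_obs)

    alpha = std_sim / std_obs   # variability ratio
    beta = mean_sim / mean_obs  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A perfect simulation scores 1. A simulation that doubles every observed flow keeps r = 1 but has α = β = 2, so it scores 1 − sqrt(2).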
This content will become publicly available on November 1, 2025

Generating interpretable rainfall-runoff models automatically from data

A sudden surge of data has created new challenges in water management, spanning quality control, assimilation, and analysis. Few approaches are available to integrate growing volumes of data into interpretable results. Process-based hydrologic models have not been designed to consume large amounts of data. Alternatively, new machine learning tools can automate data analysis and forecasting, but their lack of interpretability and reliance on very large data sets limit the discovery of insights and may impact trust. To address this gap, we present a new approach, which seeks to strike a middle ground between process- and data-based modeling. The contribution of this work is an automated and scalable methodology that discovers differential equations and latent state estimations within hydrologic systems using only rainfall and runoff measurements. We show how this enables automated tools to learn interpretable models of 6 to 18 parameters solely from measurements. We apply this approach to nearly 400 stream gaging sites across the US, showing how complex catchment dynamics can be reconstructed solely from rainfall and runoff measurements. We also show how the approach discovers surrogate models that can replicate the dynamics of a much more complex process-based model, but at a fraction of the computational complexity. We discuss how the resulting representation of watershed dynamics provides insight and computational efficiency to enable automated predictions across large sensor networks.
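
To make the idea of learning a differential equation from rainfall and runoff alone more concrete, the sketch below fits a deliberately simple stand-in model. It is not the methodology of the paper, which learns models of 6 to 18 parameters. It assumes a single linear reservoir with storage S and outflow Q = kS, so that dS/dt = P − Q implies dQ/dt = k(P − Q). The function names, the forward-Euler discretization, and the synthetic rainfall series are all illustrative assumptions.

```python
def simulate_linear_reservoir(rain, k, q0=0.0, dt=1.0):
    """Forward-Euler simulation of dQ/dt = k * (P - Q)."""
    flows = [q0]
    for p in rain[:-1]:
        q = flows[-1]
        flows.append(q + dt * k * (p - q))
    return flows


def fit_recession_constant(rain, runoff, dt=1.0):
    """Least-squares estimate of k from paired rainfall and runoff series."""
    numerator = 0.0
    denominator = 0.0
    for t in range(len(runoff) - 1):
        forcing = rain[t] - runoff[t]              # P - Q at step t
        change = (runoff[t + 1] - runoff[t]) / dt  # finite-difference dQ/dt
        numerator += forcing * change
        denominator += forcing * forcing
    if denominator == 0.0:
        raise ValueError("rainfall and runoff are identical; k is not identifiable")
    return numerator / denominator


# Synthetic check: generate runoff with a known k, then recover it.
rain = [0.0, 5.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0, 0.0, 0.0]
runoff = simulate_linear_reservoir(rain, k=0.3)
k_hat = fit_recession_constant(rain, runoff)
```

Because the synthetic runoff is noise-free and generated by the same discretization, the estimate matches the true k to floating-point precision. Real gauge data would need noise handling and a richer model structure.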
        
    
- Award ID(s): 1750744
- PAR ID: 10540229
- Publisher / Repository: Elsevier
- Date Published:
- Journal Name: Advances in Water Resources
- Volume: 193
- Issue: C
- ISSN: 0309-1708
- Page Range / eLocation ID: 104796
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract. In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Among other qualities, the popularity of ML models for such applications is due to their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds compared to rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.
- 
            Accurate hydrologic modeling is vital to characterizing how the terrestrial water cycle responds to climate change. Pure deep learning (DL) models have been shown to outperform process-based ones while remaining difficult to interpret. More recently, differentiable physics-informed machine learning models with a physical backbone can systematically integrate physical equations and DL, predicting untrained variables and processes with high performance. However, it is unclear if such models are competitive for global-scale applications with a simple backbone. Therefore, we use – for the first time at this scale – differentiable hydrologic models (full name δHBV-globe1.0-hydroDL, shortened to δHBV here) to simulate the rainfall–runoff processes for 3753 basins around the world. Moreover, we compare the δHBV models to a purely data-driven long short-term memory (LSTM) model to examine their strengths and limitations. Both LSTM and the δHBV models provide competitive daily hydrologic simulation capabilities in global basins, with median Kling–Gupta efficiency values close to or higher than 0.7 (and 0.78 with LSTM for a subset of 1675 basins with long-term discharge records), significantly outperforming traditional models. Moreover, regionalized differentiable models demonstrated stronger spatial generalization ability (median KGE 0.64) than a traditional parameter regionalization approach (median KGE 0.46) and even LSTM for ungauged region tests across continents. Nevertheless, relative to LSTM, the differentiable model was hampered by structural deficiencies for cold or polar regions, highly arid regions, and basins with significant human impacts. This study also sets the benchmark for hydrologic estimates around the world and builds a foundation for improving global hydrologic simulations.
- 
            Abstract. Hydrologic signatures are quantitative metrics that describe a streamflow time series. Examples include annual maximum flow, baseflow index and recession shape descriptors. In this paper, we use machine learning (ML) to learn encodings that are optimal ML equivalents of hydrologic signatures, and that are derived directly from the data. We compare the learned signatures to classical signatures, interpret their meaning, and use them to build rainfall‐runoff models in otherwise ungauged watersheds. Our model has an encoder–decoder structure. The encoder is a convolutional neural net mapping historical flow and climate data to a low‐dimensional vector encoding, analogous to hydrological signatures. The decoder structure includes stores and fluxes similar to a classical hydrologic model. For each timestep, the decoder uses current climate data, watershed attributes and the encoding to predict coefficients that distribute precipitation between stores and store outflow coefficients. The model is trained end‐to‐end on the U.S. CAMELS watershed data set to minimize streamflow error. We show that learned signatures can extract new information from streamflow series, because using learned signatures as input to the process‐informed model improves prediction accuracy over benchmark configurations that use classical signatures or no signatures. We interpret learned signatures by correlation with classical signatures, and by using sensitivity analysis to assess their impact on modeled store dynamics. Learned signatures are spatially correlated and relate to streamflow dynamics including seasonality, high and low extremes, baseflow and recessions. We conclude that process‐informed ML models and other applications using hydrologic signatures may benefit from replacing expert‐selected signatures with learned signatures.
- 
            Abstract. Hydrologic connectivity refers to the processes and thresholds leading to water transport across a landscape. In dryland ecosystems, runoff production is mediated by the arrangement of vegetation and bare soil patches on hillslopes and the properties of ephemeral channels. In this study, we used runoff measurements at multiple scales in a small (4.67 ha) mixed shrubland catchment of the Chihuahuan Desert to identify controls on and thresholds of hillslope‐channel connectivity. By relating short‐ and long‐term hydrologic records, we also addressed whether observed changes in outlet discharge since 1977 were linked to modifications in hydrologic connectivity. Hillslope runoff production was controlled by the maximum rainfall intensity occurring in a 30‐min interval (I30), with small‐to‐negligible effects of antecedent surface soil moisture, vegetation cover, or slope aspect. An I30 threshold of nearly 10 mm/h activated runoff propagation from the shrubland hillslopes and through the main ephemeral channel, whereas an I30 threshold of about 16 mm/h was required for discharge from the catchment outlet. Since storms rarely exceed I30, full hillslope‐channel connectivity occurs infrequently in the mixed shrubland, leading to <2% of the annual precipitation being converted into outlet discharge. Progressive decreases in outlet discharge since 1977 could not be explained by variations in precipitation metrics, including I30, or the process of woody plant encroachment. Instead, channel modifications from the buildup of sediment behind measurement flumes may have increased transmission losses and reduced outlet discharge. Thus, alterations in channel properties can play an important role in the long‐term (45‐year) variations of rainfall–runoff dynamics of small desert catchments.
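
The I30 metric in the last abstract above, the maximum rainfall intensity occurring in a 30-min interval, can be computed from a rain-gauge record with a sliding window. The sketch below is a minimal illustration and not code from that study. It assumes rainfall depths in mm recorded at a fixed time step that divides 30 minutes evenly; the function name and the sample record are hypothetical.

```python
def max_30min_intensity(depths_mm, step_minutes):
    """Return I30 in mm/h from rainfall depths recorded at a fixed time step."""
    if step_minutes <= 0 or 30 % step_minutes != 0:
        raise ValueError("step_minutes must be a positive divisor of 30")
    window = 30 // step_minutes
    if len(depths_mm) < window:
        raise ValueError("record is shorter than 30 minutes")

    # Slide a 30-minute window across the record, tracking the largest depth.
    current = sum(depths_mm[:window])
    largest = current
    for i in range(window, len(depths_mm)):
        current += depths_mm[i] - depths_mm[i - window]
        largest = max(largest, current)
    return largest / 0.5  # mm per 30 min -> mm/h


# Hypothetical 10-minute record: the wettest 30 minutes hold 1 + 4 + 3 = 8 mm.
storm = [0.0, 1.0, 4.0, 3.0, 0.5, 0.0]
i30 = max_30min_intensity(storm, step_minutes=10)  # 16.0 mm/h
```

The resulting value can then be compared against site-specific thresholds such as the roughly 10 mm/h and 16 mm/h reported in the abstract.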
 An official website of the United States government