Abstract. As a genre of physics-informed machine learning, differentiable process-based hydrologic models (abbreviated as δ or delta models) with regionalized deep-network-based parameterization pipelines were recently shown to provide daily streamflow prediction performance closely approaching that of state-of-the-art long short-term memory (LSTM) deep networks. Meanwhile, δ models provide a full suite of diagnostic physical variables and guaranteed mass conservation. Here, we ran experiments to test (1) their ability to extrapolate to regions far from streamflow gauges and (2) their ability to make credible predictions of long-term (decadal-scale) change trends. We evaluated the models based on daily hydrograph metrics (Nash–Sutcliffe model efficiency coefficient, etc.) and predicted decadal streamflow trends. For prediction in ungauged basins (PUB; randomly sampled ungauged basins representing spatial interpolation), δ models either approached or surpassed the performance of LSTM in daily hydrograph metrics, depending on the meteorological forcing data used. They presented a comparable trend performance to LSTM for annual mean flow and high flow but worse trends for low flow. For prediction in ungauged regions (PUR; regional holdout test representing spatial extrapolation in a highly data-sparse scenario), δ models surpassed LSTM in daily hydrograph metrics, and their advantages in mean and high flow trends became prominent. In addition, an untrained variable, evapotranspiration, retained good seasonality even for extrapolated cases. The δ models' deep-network-based parameterization pipeline produced parameter fields that maintain remarkably stable spatial patterns even in highly data-scarce scenarios, which explains their robustness. Combined with their interpretability and ability to assimilate multi-source observations, the δ models are strong candidates for regional and global-scale hydrologic simulations and climate change impact assessment.
more »
« less
Mitigating Prediction Error of Deep Learning Streamflow Models in Large Data‐Sparse Regions With Ensemble Modeling and Soft Data
Abstract Predicting discharge in contiguously data‐scarce or ungauged regions is needed for quantifying the global hydrologic cycle. We show that prediction in ungauged regions (PUR) has major, underrecognized uncertainty and is drastically more difficult than previous problems where basins can be represented by neighboring or similar basins (known as prediction in ungauged basins). While deep neural networks demonstrated stellar performance for streamflow predictions, performance nonetheless declined for PUR, benchmarked here with a new stringent region‐based holdout test on a US data set with 671 basins. We tested approaches to reduce such errors, leveraging deep network's flexibility to integrate “soft” data, such as satellite‐based soil moisture product, or daily flow distributions which improved low flow simulations. A novel input‐selection ensemble improved average performance and greatly reduced catastrophic failures. Despite challenges, deep networks showed stronger performance metrics for PUR than traditional hydrologic models. They appear competitive for geoscientific modeling even in data‐scarce settings.
more »
« less
- Award ID(s):
- 1832294
- PAR ID:
- 10367736
- Publisher / Repository:
- DOI PREFIX: 10.1029
- Date Published:
- Journal Name:
- Geophysical Research Letters
- Volume:
- 48
- Issue:
- 14
- ISSN:
- 0094-8276
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Recent observations with varied schedules and types (moving average, snapshot, or regularly spaced) can help to improve streamflow forecasts, but it is challenging to integrate them effectively. Based on a long short‐term memory (LSTM) streamflow model, we tested multiple versions of a flexible procedure we call data integration (DI) to leverage recent discharge measurements to improve forecasts. DI accepts lagged inputs either directly or through a convolutional neural network unit. DI ubiquitously elevated streamflow forecast performance to unseen levels, reaching a record continental‐scale median Nash‐Sutcliffe Efficiency coefficient value of 0.86. Integrating moving‐average discharge, discharge from the last few days, or even average discharge from the previous calendar month could all improve daily forecasts. Directly using lagged observations as inputs was comparable in performance to using the convolutional neural network unit. Importantly, we obtained valuable insights regarding hydrologic processes impacting LSTM and DI performance. Before applying DI, the base LSTM model worked well in mountainous or snow‐dominated regions, but less well in regions with low discharge volumes (due to either low precipitation or high precipitation‐energy synchronicity) and large interannual storage variability. DI was most beneficial in regions with high flow autocorrelation: it greatly reduced baseflow bias in groundwater‐dominated western basins and also improved peak prediction for basins with dynamical surface water storage, such as the Prairie Potholes or Great Lakes regions. However, even DI cannot elevate performance in high‐aridity basins with 1‐day flash peaks. Despite this limitation, there is much promise for a deep‐learning‐based forecast paradigm due to its performance, automation, efficiency, and flexibility.more » « less
-
Accurate streamflow prediction is critical for ensuring water supply and detecting floods, while also providing essential hydrological inputs for other scientific models in fields such as climate and agriculture.Recently, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical characteristics and weather forcing data.However, these models are only focused on gauged basins and cannot adapt to ungaugaed basins, i.e., basins without training data. Prediction in Ungauged Basins (PUB) is considered one of the most important challenges in hydrology, as most basins in the United States and around the world have no observations. In this work, we propose a meta-transfer learning approach by enhancing imperfect physics equations that facilitate model adaptation. Intuitively, physical equations can often be used to regularize deep learning models to achieve robust regionalization performance under gauged scenarios, but they can be inaccurate due to the simplified representation of physics. We correct such uncertainty in physical equation by residual approximation and let these corrected equations guide the model training process. We evaluated the proposed method for predicting daily streamflow on the catchment attributes and meteorology for large-sample studies (CAMELS) dataset. The experiment results on hydrological data over 19 years demonstrate the effectiveness of the proposed method in ungauged scenarios.more » « less
-
Accurate hydrologic modeling is vital to characterizing how the terrestrial water cycle responds to climate change. Pure deep learning (DL) models have been shown to outperform process-based ones while remaining difficult to interpret. More recently, differentiable physics-informed machine learning models with a physical backbone can systematically integrate physical equations and DL, predicting untrained variables and processes with high performance. However, it is unclear if such models are competitive for global-scale applications with a simple backbone. Therefore, we use – for the first time at this scale – differentiable hydrologic models (full name δHBV-globe1.0-hydroDL, shortened to δHBV here) to simulate the rainfall–runoff processes for 3753 basins around the world. Moreover, we compare the δHBV models to a purely data-driven long short-term memory (LSTM) model to examine their strengths and limitations. Both LSTM and the δHBV models provide competitive daily hydrologic simulation capabilities in global basins, with median Kling–Gupta efficiency values close to or higher than 0.7 (and 0.78 with LSTM for a subset of 1675 basins with long-term discharge records), significantly outperforming traditional models. Moreover, regionalized differentiable models demonstrated stronger spatial generalization ability (median KGE 0.64) than a traditional parameter regionalization approach (median KGE 0.46) and even LSTM for ungauged region tests across continents. Nevertheless, relative to LSTM, the differentiable model was hampered by structural deficiencies for cold or polar regions, highly arid regions, and basins with significant human impacts. This study also sets the benchmark for hydrologic estimates around the world and builds a foundation for improving global hydrologic simulations.more » « less
-
Abstract Floods cause hundreds of fatalities and billions of dollars of economic loss each year in the United States. To mitigate these damages, accurate flood prediction is needed for issuing early warnings to the public. This situation is exacerbated in larger model domains flood prediction, particularly in ungauged basins. To improve flood prediction for both gauged and ungauged basins, we propose a spatio‐temporal hierarchical model (STHM) using above‐normal flow estimation with a 10‐day window of modeled National Water Model (NWM) streamflow and a variety of catchment characteristics as input. The STHM is calibrated (1993–2008) and validated (2009–2018) in controlled, natural, and coastal basins over three broad groups, and shows significant improvement for the first two basin types. A seasonal analysis shows the most influential predictors beyond NWM streamflow reanalysis are the previous 3‐day average streamflow and the aridity index for controlled and natural basins, respectively. To evaluate the STHM in improving above‐normal streamflow in ungauged basins, 20‐fold cross‐validation is performed by leaving 5% of sites. Results show that the STHM increases predictive skill in over 50% of sites' by 0.1 Nash‐Sutcliffe efficiency (NSE) and improves over 65% of sites' streamflow prediction to an NSE > 0.67, which demonstrates that the STHM is one of the first of its kind and could be employed for flood prediction in both gauged and ungauged basins.more » « less
An official website of the United States government
