Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state of the art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable in certain cases, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves a predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and deep learning model selections. Furthermore, based on the use of random vectors, high‐dimensional characterization improves regionalization performance in gauged basin scenario when physical descriptors are uncertain, or insufficient.
more » « less- NSF-PAR ID:
- 10373035
- Publisher / Repository:
- DOI PREFIX: 10.1029
- Date Published:
- Journal Name:
- Water Resources Research
- Volume:
- 58
- Issue:
- 8
- ISSN:
- 0043-1397
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Machine Learning is beginning to provide state-of-the-art performance in a range of environmental applications such as streamflow prediction in a hydrologic basin. However, building accurate broad-scale models for streamflow remains challenging in practice due to the variability in the dominant hydrologic processes, which are best captured by sets of process-related basin characteristics. Existing basin characteristics suffer from noise and uncertainty, among many other things, which adversely impact model performance. To tackle the above challenges, in this paper, we propose a novel Knowledge-guided Self-Supervised Learning (KGSSL) inverse framework to extract system characteristics from driver(input) and response(output) data. This first-of-its-kind framework achieves robust performance even when characteristics are corrupted or missing. We evaluate the KGSSL framework in the context of stream flow modeling using CAMELS (Catchment Attributes and MEteorology for Large-sample Studies) which is a widely used hydrology benchmark dataset. Specifically, KGSSL outperforms baseline by 16% in predicting missing characteristics. Furthermore, in the context of forward modelling, KGSSL inferred characteristics provide a 35% improvement in performance over a standard baseline when the static characteristic are unknown.more » « less
-
Abstract. As a genre of physics-informed machine learning, differentiable process-based hydrologic models (abbreviated as δ or delta models) with regionalized deep-network-based parameterization pipelines were recently shown to provide daily streamflow prediction performance closely approaching that of state-of-the-art long short-term memory (LSTM) deep networks. Meanwhile, δ models provide a full suite of diagnostic physical variables and guaranteed mass conservation. Here, we ran experiments to test (1) their ability to extrapolate to regions far from streamflow gauges and (2) their ability to make credible predictions of long-term (decadal-scale) change trends. We evaluated the models based on daily hydrograph metrics (Nash–Sutcliffe model efficiency coefficient, etc.) and predicted decadal streamflow trends. For prediction in ungauged basins (PUB; randomly sampled ungauged basins representing spatial interpolation), δ models either approached or surpassed the performance of LSTM in daily hydrograph metrics, depending on the meteorological forcing data used. They presented a comparable trend performance to LSTM for annual mean flow and high flow but worse trends for low flow. For prediction in ungauged regions (PUR; regional holdout test representing spatial extrapolation in a highly data-sparse scenario), δ models surpassed LSTM in daily hydrograph metrics, and their advantages in mean and high flow trends became prominent. In addition, an untrained variable, evapotranspiration, retained good seasonality even for extrapolated cases. The δ models' deep-network-based parameterization pipeline produced parameter fields that maintain remarkably stable spatial patterns even in highly data-scarce scenarios, which explains their robustness. Combined with their interpretability and ability to assimilate multi-source observations, the δ models are strong candidates for regional and global-scale hydrologic simulations and climate change impact assessment.more » « less
-
Computational prediction of crystal materials properties can help to do large-scale insiliconscreening. Recent studies of material informatics have focused on expert design of multidimensionalinterpretable material descriptors/features. However, successes of deep learning suchas Convolutional Neural Networks (CNN) in image recognition and speech recognition havedemonstrated their automated feature extraction capability to effectively capture the characteristicsof the data and achieve superior prediction performance. Here, we propose CNN-OFM-Magpie, aCNN model with OFM (Orbital-field Matrix) and Magpie descriptors to predict the formationenergy of 4030 crystal material by exploiting the complementarity of two-dimensional OFM featuresand Magpie features. Experiments showed that our method achieves better performance thanconventional regression algorithms such as support vector machines and Random Forest. It is alsobetter than CNN models using only the OFM features, the Magpie features, or the basic one-hotencodings. This demonstrates the advantages of CNN and feature fusion for materials propertyprediction. Finally, we visualized the two-dimensional OFM descriptors and analyzed the featuresextracted by the CNN to obtain greater understanding of the CNN-OFM model.more » « less
-
Thenkabail, Prasad S. (Ed.)
Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of the random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning analysis (RFML) and RStudio for data processing and synthesizing. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods, across 1991–2010, 1996–2010, and 2001–2010; the results indicated that the model’s accuracy improved with longer training periods, implying that the model trained on a more extended period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance (i.e., IncNodePurity) measure of the RFML model revealed that the snow depth and the minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model performs in reproducing streamflow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate, delivering acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R2), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including snowmelt-driven semi-arid regions.
-
Abstract The behaviors and skills of models in many geosciences (e.g., hydrology and ecosystem sciences) strongly depend on spatially-varying parameters that need calibration. A well-calibrated model can reasonably propagate information from observations to unobserved variables via model physics, but traditional calibration is highly inefficient and results in non-unique solutions. Here we propose a novel differentiable parameter learning (dPL) framework that efficiently learns a global mapping between inputs (and optionally responses) and parameters. Crucially, dPL exhibits beneficial scaling curves not previously demonstrated to geoscientists: as training data increases, dPL achieves better performance, more physical coherence, and better generalizability (across space and uncalibrated variables), all with orders-of-magnitude lower computational cost. We demonstrate examples that learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods, or required only ~12.5% of the training data to achieve similar performance. The generic scheme promotes the integration of deep learning and process-based models, without mandating reimplementation.