
Title: Coupled machine learning and the limits of acceptability approach applied in parameter identification for a distributed hydrological model
Abstract. Monte Carlo (MC) methods have been widely used in uncertainty analysis and parameter identification for hydrological models. The main challenge with these approaches, however, is the prohibitive number of model runs required to acquire an adequate sample size, which may take from days to months – especially when the simulations are run in distributed mode. In the past, emulators have been used to minimize the computational burden of MC simulation through direct estimation of the residual-based response surfaces. Here, we apply emulators of an MC simulation in parameter identification for a distributed conceptual hydrological model using two likelihood measures: the absolute bias of model predictions (Score) and a measure based on the time-relaxed limits of acceptability concept (pLoA). Three machine-learning models (MLMs) were built from model parameter sets and response surfaces using a limited number of model realizations (4000). The developed MLMs were then applied to predict pLoA and Score for a large number of model parameter sets (95 000). The behavioural parameter sets were identified with a time-relaxed limits of acceptability approach based on the predicted pLoA values, and were used to estimate quantile streamflow predictions weighted by their respective Score. The three MLMs adequately mimicked the response surfaces directly estimated from MC simulations, with R² values of 0.7 to 0.92. Similarly, the models identified using the coupled machine-learning (ML) emulators and limits of acceptability approach performed very well in reproducing the median streamflow prediction during the calibration and validation periods, with average Nash–Sutcliffe efficiency values of 0.89 and 0.83, respectively.
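
The screening and weighting steps summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, pLoA is taken to be the fraction of time steps at which a simulation falls inside given lower and upper limits, the acceptance threshold is a free parameter, and each behavioural set is weighted by 1/Score (the abstract says only that predictions are weighted by Score, so the inverse-bias form is an assumption). The Nash–Sutcliffe efficiency uses its standard form, one minus the ratio of squared errors to the squared deviations of the observations from their mean.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def p_loa(sim, lower, upper):
    """Fraction of time steps at which sim lies inside [lower, upper] (assumed definition)."""
    sim = np.asarray(sim, dtype=float)
    return float(np.mean((sim >= lower) & (sim <= upper)))


def weighted_quantile(values, weights, q):
    """Quantile q of `values` under `weights`, read off the step-function weighted CDF."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(int(np.searchsorted(cdf, q)), len(v) - 1)  # guard against round-off at q = 1
    return v[idx]


def behavioural_quantiles(sims, ploa, score, threshold, qs=(0.05, 0.5, 0.95)):
    """Weighted quantile flows from parameter sets whose (emulated) pLoA meets `threshold`.

    sims:  (n_sets, n_steps) simulated flows, one row per parameter set
    ploa:  (n_sets,) pLoA per set, e.g. predicted by an emulator
    score: (n_sets,) absolute bias per set, assumed > 0; weight = 1 / score (assumption)
    Returns an array of shape (len(qs), n_steps). At least one set must pass the threshold.
    """
    sims = np.asarray(sims, dtype=float)
    keep = np.asarray(ploa) >= threshold
    kept = sims[keep]
    weights = 1.0 / np.asarray(score, dtype=float)[keep]
    return np.array([[weighted_quantile(kept[:, t], weights, q) for t in range(kept.shape[1])]
                     for q in qs])
```

The median row (q = 0.5) is the series that would be compared against observed flow with `nse`, which is the kind of comparison behind the calibration and validation efficiencies quoted in the abstract.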
Award ID(s):
2013047 1713901
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Hydrology and Earth System Sciences
Page Range / eLocation ID:
4641 to 4658
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Arctic hydrological system is an interconnected system that is experiencing rapid change. It comprises permafrost, snow, glaciers, frozen soils, and inland river systems. In this study, we aim to lower the barrier to using complex land models in regional applications by developing a generalizable optimization methodology and workflow for the Community Terrestrial Systems Model (CTSM), moving such models toward a more Actionable Science paradigm. Further end‐user engagement is required to make science such as this “fully actionable.” We applied CTSM across Alaska and the Yukon River Basin at 4‐km spatial resolution and highlighted several potentially useful high‐resolution CTSM configuration changes. Additionally, we performed a multi‐objective optimization using snow and river flow metrics within an adaptive surrogate‐based model optimization scheme. Four representative river basins across our study domain were selected for optimization based on observed streamflow and on snow water equivalent observations at 10 SNOTEL sites. Fourteen sensitive parameters were identified for optimization, half of which are not directly related to hydrology or snow processes. Across fifteen out‐of‐sample river basins, 13 had improved flow simulations after optimization, and the mean Kling‐Gupta Efficiency of daily flow increased from 0.43 to 0.63 in a 30‐year evaluation. In addition, we adapted the Shapley Decomposition to disentangle each parameter's contribution to changes in streamflow performance; the seven non‐hydrological parameters made a non‐negligible contribution to the performance gains. The snow simulation showed limited improvement, likely because it is influenced more by meteorological forcing than by model parameter choices.
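
The Shapley decomposition mentioned above attributes a change in a performance metric to individual parameters by averaging each parameter's marginal contribution over all subsets of the other parameters. The sketch below uses the textbook exact formula, which may differ from the authors' adaptation; `value` is a hypothetical user-supplied function returning, for example, the KGE obtained when only the parameters in a given subset take their optimized values. Each distinct subset is evaluated once and cached, but the number of evaluations still grows as 2^n, so 14 parameters would mean up to 16 384 model runs.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values.

    players: list of hashable parameter names
    value:   callable mapping a frozenset of players to a number, e.g. the KGE when
             only those parameters take their optimized values (hypothetical)
    """
    n = len(players)
    cache = {}

    def v(subset):
        # Cache results so each subset (i.e. each model run) is evaluated only once.
        if subset not in cache:
            cache[subset] = value(subset)
        return cache[subset]

    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S of size k that exclude i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi
```

By construction the values sum to `value(all parameters) - value(empty set)`, so the total performance gain is fully partitioned among the parameters, and interaction effects between two parameters are split evenly between them.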

  2. Abstract

    Accurate soil moisture and streamflow data are an aspirational need of many hydrologically relevant fields. Model‐simulated soil moisture and streamflow hold promise, but models require validation prior to application. Calibration methods are commonly used to improve model fidelity, but misrepresentation of the true dynamics remains a challenge. In this study, we leverage soil parameter estimates from the Soil Survey Geographic (SSURGO) database and the probability mapping of SSURGO (POLARIS) to improve the representation of hydrologic processes in the Weather Research and Forecasting Hydrological modeling system (WRF‐Hydro) over a central California domain. Our results show that, after the model's soil parameters are updated according to POLARIS, WRF‐Hydro soil moisture exhibits increased correlation coefficients (r), reduced biases, and increased Kling‐Gupta Efficiencies (KGEs) across seven in situ soil moisture observing stations. Compared with four well‐established soil moisture data sets, including Soil Moisture Active Passive data and three Phase 2 North American Land Data Assimilation System land surface models, our POLARIS‐adjusted WRF‐Hydro simulations produce the highest mean KGE (0.69) across the seven stations. More importantly, WRF‐Hydro streamflow fidelity also increases, especially when the model domain is set up with SSURGO‐informed total soil thickness. The magnitude and timing of peak flow events are better captured, r increases across nine United States Geological Survey stream gages, and the mean KGE across seven of the nine gages increases from 0.12 to 0.66. Our pre‐calibration parameter estimation approach, which is transferable to other spatially distributed hydrological models, can substantially improve a model's performance, helping to reduce calibration effort and computational cost.
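
Both the soil moisture and streamflow results above are reported as Kling‐Gupta Efficiency, which combines correlation, variability, and bias errors into one score. A minimal sketch follows, assuming the original 2009 formulation (ratio of standard deviations for variability; the 2012 modified KGE uses the ratio of coefficients of variation instead) with equal weighting of the three components. The abstract does not state which variant was used, and the function name is hypothetical.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency (Gupta et al., 2009 form).

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where
    r     = Pearson correlation between sim and obs,
    alpha = std(sim) / std(obs)   (variability ratio),
    beta  = mean(sim) / mean(obs) (bias ratio).
    Returns (kge, r, alpha, beta) so each component can be inspected.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    value = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    return float(value), float(r), float(alpha), float(beta)
```

A KGE of 1 indicates a perfect match. Because r is returned alongside the score, one call yields both the correlation coefficient and the KGE that the abstract reports side by side.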

  3. Abstract

    This paper presents a top–down approach to soil moisture and sap flux sampling design, with the goal of understanding ecohydrologic response to interannual climate variation in rain–snow transition watersheds. The design is based on a priori estimates of soil moisture and transpiration patterns from a physically based distributed model, the Regional Hydro‐Ecologic Simulation System (RHESSys). RHESSys was initially calibrated with existing snow depth and streamflow data. Calibrated model estimates of seasonal trajectories of snowmelt, root‐zone soil moisture storage, and transpiration were used to develop five hydrologic similarity indicators, which were mapped at the (30 m) patch scale across the study watershed. The partitioning‐around‐medoids clustering algorithm was then used to define six distinctive, spatially explicit clusters based on the five hydrologic similarity indicators. A representative site within each cluster was identified for sampling. At each site, soil moisture sensors were installed at 30‐ and 90‐cm depths in five soil pits, and a sap flux sensor was installed on an average‐sized white fir tree. The model‐based cluster analysis suggests that the elevation gradient and topographically driven flow drainage patterns are the dominant drivers of spatial patterns of soil moisture and transpiration. Comparing the model‐based hydrologic similarity indicators with values calculated from measured data shows that spatial patterns of field‐sampled soil moisture typically fell within the uncertainty bounds of the model‐based estimates for each cluster. There were, however, several notable exceptions: the model failed to capture soil moisture and sap flux dynamics at a riparian zone site and at a site where lateral subsurface flow may not follow surface topography.
    Results highlight the utility of a hypothesis‐driven sampling strategy, based on a physically based model, for efficiently providing new information that can drive both future measurements and strategic refinements to model inputs, parameters, or structure that might reduce these errors. Future research will focus on strategies for using finer‐scale representations of microclimate, topography, vegetation, and soil properties to improve models.
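
The partitioning‐around‐medoids (PAM) step can be illustrated with a compact implementation. This is a generic sketch, not the study's code: it assumes Euclidean distance on indicator values that the user has already standardized, a greedy BUILD initialization, and a simple first‐improvement variant of the SWAP phase. Because each medoid is an actual observation (here, a map patch), the medoid of each cluster is one natural candidate for a representative site, although the abstract does not say how the sites were chosen.

```python
import numpy as np


def pam(X, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD, then SWAP until no swap lowers the cost.

    X: (n_points, n_features) array, e.g. standardized similarity indicators per patch
    Returns (medoid indices, cluster label per point, total distance to medoids).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Full pairwise Euclidean distance matrix (fine for small n only).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    def cost(medoids):
        # Sum over points of the distance to the nearest medoid.
        return D[:, medoids].min(axis=1).sum()

    # BUILD: start from the most central point, then add the point that lowers cost most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [j for j in range(n) if j not in medoids]
        medoids.append(min(candidates, key=lambda j: cost(medoids + [j])))

    # SWAP: replace a medoid with a non-medoid whenever that strictly lowers the cost.
    current = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):
            for j in range(n):
                if j in medoids:
                    continue
                trial = medoids.copy()
                trial[mi] = j
                c = cost(trial)
                if c < current - 1e-12:
                    medoids, current, improved = trial, c, True
        if not improved:
            break

    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels, float(current)
```

The full distance matrix makes this sketch quadratic in memory, so a watershed mapped at 30 m would need a sampling‐based variant such as CLARA or an optimized library implementation.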

  4. Abstract

    By utilizing functional relationships based on observations at plot or field scales, water quality models first compute surface runoff and then use it as the primary governing variable to estimate sediment and nutrient transport. When these models are applied at watershed scales, this serial model structure, coupling a surface runoff sub‐model with a water quality sub‐model, may be inappropriate because dominant hydrological processes differ among scales. A parallel modeling approach is proposed to evaluate how best to combine dominant hydrological processes for predicting water quality at watershed scales. In the parallel scheme, dominant variables of water quality models are identified based entirely on their statistical significance using time series analysis. Four surface runoff models of different complexity were assessed using both the serial and parallel approaches to quantify the uncertainty in forcing variables used to predict water quality. The eight alternative model structures were tested against a 25‐year high‐resolution data set of streamflow, suspended sediment discharge, and phosphorus discharge at weekly time steps. Models using the parallel approach consistently performed better than serial‐based models, with less error in predictions of watershed‐scale streamflow, sediment, and phosphorus, which suggests that the structures of water quantity and quality models at watershed scales should be reformulated to incorporate the dominant variables. The implication is that hydrological models should be constructed in a way that avoids stacking one sub‐model with one set of scale assumptions onto the front end of another sub‐model with a different set of scale assumptions.

  5. Abstract

    Research at long‐term catchment monitoring sites has generated a great volume, variety, and velocity of data for analysis of stream water chemistry dynamics. To harness the potential of these big data and extract patterns that are indicative of underlying functional relationships, machine learning tools have advantages over traditional statistical methods, and are increasingly being applied for dimension reduction, feature extraction, and trend identification. Still, as examples of complex systems, catchments are characterized by multivariate factor interactions and equifinality that are not easily identified by most machine‐learning methods. Using dissolved organic carbon (DOC) dynamics as an illustration, we applied a new evolutionary algorithm (EA) to extract geologic, topographic, meteorologic, hydrologic, and land use attributes that were correlated to mean stream DOC concentration in forested catchments distributed across the continental United States. The EA reduced dimensionality of our attribute dataset to identify the combination of factors, and their specific value ranges, that interacted to drive membership in High or Low mean DOC clusters. High mean DOC concentrations were associated with two distinct geographic locations of variable climatic and vegetative conditions, indicating equifinality. Our findings underscore the importance of critical zone structure in mediating hydrological and biogeochemical processes to govern DOC dynamics at the catchment scale. This multi‐scale, pattern‐to‐process approach is being applied to refine hypotheses for process‐based modeling of DOC dynamics in forested headwater streams at catchment to site scales.
