

Title: Robust Inverse Framework using Knowledge-guided Self-Supervised Learning: An application to Hydrology
Machine learning is beginning to provide state-of-the-art performance in a range of environmental applications such as streamflow prediction in a hydrologic basin. However, building accurate broad-scale models for streamflow remains challenging in practice due to the variability in the dominant hydrologic processes, which are best captured by sets of process-related basin characteristics. Existing basin characteristics suffer from noise and uncertainty, among other issues, which adversely affect model performance. To tackle these challenges, we propose a novel Knowledge-guided Self-Supervised Learning (KGSSL) inverse framework to extract system characteristics from driver (input) and response (output) data. This first-of-its-kind framework achieves robust performance even when characteristics are corrupted or missing. We evaluate the KGSSL framework in the context of streamflow modeling using CAMELS (Catchment Attributes and MEteorology for Large-sample Studies), a widely used hydrology benchmark dataset. Specifically, KGSSL outperforms the baseline by 16% in predicting missing characteristics. Furthermore, in the context of forward modeling, KGSSL-inferred characteristics provide a 35% improvement in performance over a standard baseline when the static characteristics are unknown.
Award ID(s):
1934721 1934548 2147195
NSF-PAR ID:
10354259
Journal Name:
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Page Range / eLocation ID:
465 to 474
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In hydrology, modeling streamflow remains a challenging task due to the limited availability of basin characteristics information such as soil geology and geomorphology. These characteristics may be noisy due to measurement errors or may be missing altogether. To overcome this challenge, we propose a knowledge-guided, probabilistic inverse modeling method for recovering physical characteristics from streamflow and weather data, which are more readily available. We compare our framework with state-of-the-art inverse models for estimating river basin characteristics. We also show that these estimates improve streamflow modeling compared with using the original basin characteristic values. Our inverse model offers a 3% improvement in R² for the inverse model (basin characteristic estimation) and 6% for the forward model (streamflow prediction). Our framework also offers improved explainability since it can quantify uncertainty in both the inverse and the forward model. Uncertainty quantification plays a pivotal role in improving the explainability of machine learning models by providing additional insights into the reliability and limitations of model predictions. In our analysis, we assess the quality of the uncertainty estimates. Compared to baseline uncertainty quantification methods, our framework offers a 10% improvement in the dispersion of epistemic uncertainty and a 13% improvement in coverage rate. This information can help stakeholders understand the level of uncertainty associated with the predictions and provide a more comprehensive view of the potential outcomes.
  2. Abstract

    Aim

    Streamflow and water temperature are primary variables influencing the distribution of freshwater taxa. Climate‐induced changes in these variables are already causing shifts in species distributions, with continued changes projected in the coming decades. The Mobile River Basin (MRB), located in the southeastern United States, contains some of the highest levels of temperate freshwater biodiversity in North America. We integrated species distribution data with contemporary and future streamflow and water temperature data as well as other physical habitat data to characterize occurrence probabilities of fish species in the MRB with the goal of identifying current and future areas of high conservation value.

    Location

    Mobile River Basin, southeastern United States.

    Methods

    We used a maximum entropy approach to estimate baseline and future occurrence probability distributions for 88 fish species in the MRB based on model‐generated streamflow and water temperature as well as geologic, topographic and land cover data. Areas of conservation prioritization were identified based on regions that contain suitable habitat for high levels of biodiversity according to baseline and future conditions while accounting for uncertainty associated with multiple future climate projections.

    Results

    On average, flow (28%), water temperature (28%) and geology (30%) contribute evenly to determining suitable habitat for fish species in the MRB. Based on baseline and future species distribution model estimates, high priority streams (best 10%) are largely concentrated in the eastern portion of the MRB, with a majority (51%) located within the Coosa and Tallapoosa River systems.

    Main conclusion

    We provide a framework that uses relevant hydrologic and environmental data in the context of future climatic uncertainty to estimate areas of freshwater conservation opportunity in the coming decades. While streamflow and water temperature represent important habitat for freshwater fishes in the MRB, distributions are also constrained by other aspects of the physical environment.

     
  3. Shekhar, Shashi; Zhou, Zhi-Hua; Chiang, Yao-Yi; Stiglic, Gregor (Eds.)
    Rapid advances in inverse modeling methods have brought to light their susceptibility to imperfect data. This has made it imperative to obtain more explainable and trustworthy estimates from these models. In hydrology, basin characteristics can be noisy or missing, impacting streamflow prediction. We propose a probabilistic inverse model framework that can reconstruct robust hydrology basin characteristics from dynamic input weather driver and streamflow response data. We address two aspects of building more explainable inverse models: uncertainty estimation (uncertainty due to imperfect data and an imperfect model) and robustness. This can help improve the trust of water managers, improve the handling of noisy data, and reduce costs. We also propose an uncertainty-based loss regularization that removes 17% of temporal artifacts in reconstructions, reduces uncertainty by 36%, and yields a 4% higher coverage rate for basin characteristics. The forward model performance (streamflow estimation) is also improved by 6% using these uncertainty-learning-based reconstructions.
  4. Abstract

    The Amazon River basin contains a vast diversity of lotic habitats and accompanying hydrological regimes. Further understanding the spatial distribution of flow regimes across the Amazon can be useful for recognizing riverine ecohydrological processes and informing river management and conservation, especially in areas with limited or inconsistent streamflow monitoring.

    This study compares four inductive approaches for classifying streamflow regimes across the Amazon using an unprecedented compilation of streamflow records from Bolivia, Brazil, Colombia, Ecuador, and Peru.

    Inductive classification schemes use attributes of streamflow data to categorize river reaches into similar classes, which then may be generalized to understand streamflow behaviour at the basin scale. In this study, classification was accomplished through hierarchical clustering of 67 flow metrics calculated using indicators of hydrologic alteration (IHA) and daily streamflow data from median annual hydrographs (MAHs) for 404 stations (representing >7,000 station‐years) across five Amazonian countries.

    Classification was performed using both flow magnitude‐inclusive and flow magnitude‐independent datasets. For flow magnitude‐independent methods, optimal solutions included six or seven primary hydrological classes for IHA and MAH datasets; for approaches that retained magnitude, variance was sufficiently large to prevent convergence to a specific number of classes.

    Across methods, class membership was strongly associated with the timing, frequency, and rate of change of flow, and spatially coherent clusters were associated with seasonal, elevational, and stream‐order gradients. These results highlight the diversity of flow regimes across the Amazon and provide a framework for studying relationships between hydrological regimes and ecological responses in the context of changing climate, land use, and human‐induced hydrological alteration.

    The methodology applied provides a data‐driven approach for classifying flow regimes based on observed data. When coupled with ecological knowledge and expertise, these classifications can be used to develop ecohydrologically informed and management‐relevant conservation practices.

     
  5. Abstract

    Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable in certain cases, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves a predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and deep learning model selections. Furthermore, based on the use of random vectors, high‐dimensional characterization improves regionalization performance in gauged basin scenarios when physical descriptors are uncertain or insufficient.

     
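The random vector idea described in item 5 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the vector dimension, the standard normal distribution, and the choice to tile the vector across time steps are all assumptions made for illustration. The sketch only shows how a fixed per-basin random vector might stand in for physical descriptors when assembling inputs for a sequence model.

```python
import numpy as np


def make_random_descriptors(basin_ids, dim=16, seed=0):
    """Assign each basin one fixed random vector (hypothetical helper).

    The distribution (standard normal) and dimension are assumptions;
    the abstract only says "a vector of random values".
    """
    rng = np.random.default_rng(seed)
    return {basin: rng.standard_normal(dim) for basin in basin_ids}


def build_model_inputs(forcings, descriptor):
    """Append the basin's static vector to every time step of its forcings.

    forcings:   array of shape (time_steps, n_forcing_variables)
    descriptor: array of shape (dim,)
    returns:    array of shape (time_steps, n_forcing_variables + dim)
    """
    forcings = np.asarray(forcings, dtype=float)
    tiled = np.tile(descriptor, (forcings.shape[0], 1))
    return np.concatenate([forcings, tiled], axis=1)


# Example with placeholder data: two basins, one year of five forcing variables.
basins = ["basin_a", "basin_b"]
descriptors = make_random_descriptors(basins, dim=8)
forcings = np.zeros((365, 5))
inputs = build_model_inputs(forcings, descriptors["basin_a"])
```

Each basin keeps the same vector across all time steps and training epochs, so a model can in principle learn to associate the vector with that basin's behavior. Because the vectors carry no physical meaning, this sketch applies only to basins seen during training, which matches the gauged scenario the abstract describes.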