

Title: Using Machine Learning to Identify Hydrologic Signatures With an Encoder–Decoder Framework
Abstract

Hydrologic signatures are quantitative metrics that describe a streamflow time series. Examples include annual maximum flow, baseflow index and recession shape descriptors. In this paper, we use machine learning (ML) to learn encodings that are optimal ML equivalents of hydrologic signatures, and that are derived directly from the data. We compare the learned signatures to classical signatures, interpret their meaning, and use them to build rainfall‐runoff models in otherwise ungauged watersheds. Our model has an encoder–decoder structure. The encoder is a convolutional neural net mapping historical flow and climate data to a low‐dimensional vector encoding, analogous to hydrological signatures. The decoder structure includes stores and fluxes similar to a classical hydrologic model. For each timestep, the decoder uses current climate data, watershed attributes and the encoding to predict coefficients that distribute precipitation between stores and store outflow coefficients. The model is trained end‐to‐end on the U.S. CAMELS watershed data set to minimize streamflow error. We show that learned signatures can extract new information from streamflow series, because using learned signatures as input to the process‐informed model improves prediction accuracy over benchmark configurations that use classical signatures or no signatures. We interpret learned signatures by correlation with classical signatures, and by using sensitivity analysis to assess their impact on modeled store dynamics. Learned signatures are spatially correlated and relate to streamflow dynamics including seasonality, high and low extremes, baseflow and recessions. We conclude that process‐informed ML models and other applications using hydrologic signatures may benefit from replacing expert‐selected signatures with learned signatures.
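
The store-and-flux decoder described above can be illustrated with a toy single-timestep water balance. This is only a minimal sketch under stated assumptions, not the authors' implementation: in the paper the split and outflow coefficients are predicted by the model at each timestep, whereas here they are passed in as fixed numbers, evapotranspiration is ignored, and the function name `step_stores` and its arguments are hypothetical.

```python
def step_stores(stores, precip, split, outflow):
    """Advance a toy multi-store water balance by one timestep.

    stores  -- water currently held in each store (mm)
    precip  -- precipitation for this timestep (mm)
    split   -- fraction of precipitation routed to each store (must sum to 1)
    outflow -- fraction of each store released as streamflow (each in [0, 1])

    All names are illustrative assumptions; a learned model would supply
    `split` and `outflow` instead of the caller.
    """
    if abs(sum(split) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    if any(k < 0.0 or k > 1.0 for k in outflow):
        raise ValueError("outflow coefficients must lie in [0, 1]")

    # Distribute precipitation between the stores.
    filled = [s + w * precip for s, w in zip(stores, split)]
    # Each store releases a fraction of its contents as streamflow.
    released = [k * s for s, k in zip(filled, outflow)]
    new_stores = [s - q for s, q in zip(filled, released)]
    return new_stores, sum(released)


# Two empty stores: a fast one (drains 50% per step) and a slow one (10%).
stores, flow = step_stores([0.0, 0.0], precip=10.0,
                           split=[0.25, 0.75], outflow=[0.5, 0.1])
```

Because every unit of precipitation either stays in a store or leaves as streamflow, the step conserves mass by construction, which is the property that makes this kind of structure resemble a classical hydrologic model.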

 
Award ID(s):
2124923
NSF-PAR ID:
10399796
Author(s) / Creator(s):
 ;  
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Water Resources Research
Volume:
59
Issue:
3
ISSN:
0043-1397
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Streamflow regimes are rapidly changing in many regions of the world. Attribution of these changes to specific hydrological processes and their underlying climatic and anthropogenic drivers is essential to formulate an effective water policy. Traditional approaches to hydrologic attribution rely on the ability to infer hydrological processes through the development of catchment-scale hydrological models. However, such approaches are challenging to implement in practice due to limitations in using models to accurately associate changes in observed outcomes with corresponding drivers. Here we present an alternative approach that leverages the method of multiple hypotheses to attribute changes in streamflow in the Upper Jhelum watershed, an important tributary headwater region of the Indus basin, where a dramatic decline in streamflow since 2000 has yet to be adequately attributed to its corresponding drivers. We generate and empirically evaluate a series of alternative and complementary hypotheses concerning distinct components of the water balance. This process allows a holistic understanding of watershed-scale processes to be developed, even though the catchment-scale water balance remains open. Using remote sensing and secondary data, we explore changes in climate, surface water, and groundwater. The evidence reveals that climate, rather than land use, had a considerably stronger influence on reductions in streamflow, both through reduced precipitation and increased evapotranspiration. Baseflow analyses suggest different mechanisms affecting streamflow decline in upstream and downstream regions, respectively. These findings offer promising avenues for future research in the Upper Jhelum watershed, and an alternative approach to hydrological attribution in data-scarce regions. 
  2. Abstract

    Hydrologic signatures are quantitative metrics that describe streamflow statistics and dynamics. Signatures have many applications, including assessing habitat suitability and hydrologic alteration, calibrating and evaluating hydrologic models, defining similarity between watersheds and investigating watershed processes. Increasingly, signatures are being used in large sample studies to guide flow management and modelling at continental scales. Using signatures in studies involving 1000s of watersheds brings new challenges as it becomes impractical to examine signature parameters and behaviour in each watershed. For example, we might wish to check that signatures describing flood event characteristics have correctly identified event periods, that signature values have not been biassed by data errors, or that human and natural influences on signature values have been correctly interpreted. In this commentary, we draw from our collective experience to present case studies where naïve application of signatures fails to correctly identify streamflow dynamics. These include unusual precipitation or flow regimes, data quality issues, and signature use in human‐influenced watersheds. We conclude by providing guidance and recommendations on applying signatures in large sample studies.

     
  3. Abstract

    Despite a multitude of small catchment studies, we lack a deep understanding of how variations in critical zone architecture lead to variations in hydrologic states and fluxes. This study characterizes hydrologic dynamics of 15 catchments of the U.S. Critical Zone Observatory (CZO) network where we hypothesized that our understanding of subsurface structure would illuminate patterns of hydrologic partitioning. The CZOs collect data sets that characterize the physical, chemical, and biological architecture of the subsurface, while also monitoring hydrologic fluxes such as streamflow, precipitation, and evapotranspiration. For the first time, we collate time series of hydrologic variables across the CZO network and begin the process of examining hydrologic signatures across sites. We find that catchments with low baseflow indices and high runoff sensitivity to storage receive most of their precipitation as rain and contain clay‐rich regolith profiles, prominent argillic horizons, and/or anthropogenic modifications. In contrast, sites with high baseflow indices and low runoff sensitivity to storage receive the majority of precipitation as snow and have more permeable regolith profiles. The seasonal variability of water balance components is a key control on the dynamic range of hydraulically connected water in the critical zone. These findings lead us to posit that water balance partitioning and streamflow hydraulics are linked through the coevolution of critical zone architecture but that much work remains to parse these controls out quantitatively.

     
  4. Abstract

    How precipitation (P) is translated into streamflow (Q) and over what timescales (i.e., “memory”) is difficult to predict without calibration of site‐specific models or using geochemical approaches, posing barriers to prediction in ungauged basins or advancement of general theories. Here, we used a data‐driven approach to identify regional patterns and exogenous controls on P–Q interactions. We applied an information flow analysis, which quantifies uncertainty reduction, to a daily time series of P and Q from 671 watersheds across the conterminous United States. We first demonstrated that information transfer from P to Q primarily reflects the quickflow component of water‐budgets, based on a watershed model. Readily quantifiable information flows show a functional relationship with model parameters, suggesting utility for model calibration. Second, applied to real watersheds, P–Q information flows exhibit seasonally varying behavior within regions in a manner consistent with dominant runoff generation mechanisms. However, the timing and the magnitude of information flows also reflect considerable subregional heterogeneity, likely attributable to differences in watershed size, baseflow contributions, and variation in areal coverage of preferential flow paths. A regression analysis showed that a combination of climate and watershed characteristics is predictive of P–Q information flows. Though information flows cannot, in most cases, uniquely determine dominant runoff mechanisms, they provide a means to quantify the heterogeneous outcomes of those mechanisms within regions, thereby serving as a benchmarking tool for models developed at the regional scale. Last, information flows characterize regionally specific ways in which catchment connectivity changes from the wet to dry season.

     
  5. Thenkabail, Prasad S. (Ed.)

    Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of the random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning analysis (RFML) and RStudio for data processing and synthesizing. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods, across 1991–2010, 1996–2010, and 2001–2010; the results indicated that the model’s accuracy improved with longer training periods, implying that the model trained on a more extended period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance (i.e., IncNodePurity) measure of the RFML model revealed that the snow depth and the minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model performs in reproducing streamflow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate, delivering acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R2), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. 
    The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.

     