Title: Stuck at Home: Machine‐Learning Models Predicting Solute Concentrations of One Stream Failed to Predict Solute Concentrations in Other Streams
Machine‐learning models have been surprisingly successful at predicting stream solute concentrations, even for solutes without dedicated sensors. It would be extremely valuable if these models could predict solute concentrations in streams beyond the one in which they were trained. We assessed the generalizability of random forest models by training them in one or more streams and testing them in another. Models were made using grab sample and sensor data from 10 New Hampshire streams and rivers. As observed in previous studies, models trained in one stream were capable of accurately predicting solute concentrations in that stream. However, models trained on one stream produced inaccurate predictions of solute concentrations in other streams, with the exception of solutes measured by dedicated sensors (i.e., nitrate and dissolved organic carbon). Using data from multiple watersheds improved model results, but model performance was still worse than using the mean of the training dataset (Nash–Sutcliffe Efficiency < 0). Our results demonstrate that machine‐learning models thus far reliably predict solute concentrations only where trained, as differences in solute concentration patterns and sensor‐solute relationships limit their broader applicability.
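To make the evaluation described in the abstract concrete, the following is a minimal sketch, not the authors' code, of two pieces it relies on: a train-elsewhere, test-here split over streams, and the Nash–Sutcliffe Efficiency (NSE) score. The function names, the stream labels "A", "B", "C", and all concentration values are hypothetical and chosen only for illustration. The sketch uses the conventional NSE definition, in which the benchmark is the mean of the observations being evaluated; the optional `benchmark_mean` argument is an assumption added here so that a different constant (for example, a training-set mean) can be supplied instead.

```python
def leave_one_stream_out(stream_ids):
    """Yield (training streams, held-out test stream) for every stream."""
    for held_out in stream_ids:
        train = [s for s in stream_ids if s != held_out]
        yield train, held_out


def nash_sutcliffe(observed, predicted, benchmark_mean=None):
    """NSE = 1 - SSE(model) / SSE(constant benchmark).

    NSE = 1 is a perfect fit, NSE = 0 matches the constant benchmark,
    and NSE < 0 means the model is worse than that constant.
    """
    if benchmark_mean is None:
        # Conventional choice: mean of the observations being evaluated.
        benchmark_mean = sum(observed) / len(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_bench = sum((o - benchmark_mean) ** 2 for o in observed)
    return 1.0 - sse_model / sse_bench


# Hypothetical stream labels; each split trains on the others, tests on one.
splits = list(leave_one_stream_out(["A", "B", "C"]))

# Made-up concentrations: the predictions follow the right pattern but sit
# at the wrong level, as might happen when a model is moved between streams.
observed_test = [2.0, 3.0, 4.0]
predicted_transfer = [6.0, 7.0, 8.0]
nse_transfer = nash_sutcliffe(observed_test, predicted_transfer)
print(splits)
print(nse_transfer)  # negative: worse than a constant-mean prediction
```

In this toy case the transferred predictions track the shape of the observations but are offset in magnitude, which is enough to push NSE well below zero. This illustrates why a negative NSE is read as "worse than predicting a constant mean".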
Award ID(s):
2401760 2215300 2129383 2224545
PAR ID:
10599354
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
John Wiley & Sons Ltd
Date Published:
Journal Name:
Hydrological Processes
Volume:
39
Issue:
5
ISSN:
0885-6087
Subject(s) / Keyword(s):
Biogeochemistry Data interpretation High-frequency environmental data Machine learning Model transferability Random forest models Stream solutes Water quality
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Surface runoff and infiltrated water en route to the stream interact with dynamic landscape properties, ranging from vegetation and microbial activities to soil and geological attributes. Stream solute concentrations are highly variable and interconnected due to these interactions, flow paths, and residence times, and often exhibit hysteresis with flow. Significant unknowns remain about how point measurements of stream solute chemistry reflect interdependent hydrobiogeochemical and physical processes, and how signatures are encapsulated as nonlinear dynamical relationships between variables. We take a Machine Learning (ML) approach to understand and capture these dynamical relationships and improve predictions of solutes at short and long time scales. We introduce a physical process‐based “flow‐gate” into a Long Short‐Term Memory (LSTM) model, which enables the model to learn hysteresis behaviors if they exist. Further, we use information‐theoretic metrics to detect how solutes are interdependent and iteratively select source solutes that best predict a given target solute concentration. The “flow‐gate LSTM” model improves model predictions (1%–32% decreases in RMSE) relative to the standard LSTM model for all nine solutes included in the study. The predictive improvements from the flow‐gate LSTM model highlight the importance of lagged concentration and discharge relationships for certain solutes. They also indicate a potential limitation in the traditional LSTM model approach since flow rates are always provided as input sources, but this information is not fully utilized. This work provides a starting point for a predictive understanding of geochemical interdependencies using machine‐learning approaches and highlights potential improvements in model architecture.
  2. Solute concentrations in stream water vary with discharge in patterns that record complex feedbacks between hydrologic and biogeochemical processes. In a comparison of headwater catchments underlain by shale in Pennsylvania, USA (Shale Hills) and Wales, UK (Plynlimon), dissimilar concentration-discharge behaviors are best explained by contrasting landscape distributions of soil solution chemistry – especially dissolved organic carbon (DOC) – that have been established by patterns of vegetation. Specifically, elements that are concentrated in organic-rich soils due to biotic cycling (Mn, Ca, K) or that form strong complexes with DOC (Fe, Al) are spatially heterogeneous in pore waters because organic matter is heterogeneously distributed across the catchments. These solutes exhibit non-chemostatic "bioactive" behavior in the streams, and solute concentrations either decrease (Shale Hills) or increase (Plynlimon) with increasing discharge. In contrast, solutes that are concentrated in soil minerals and form only weak complexes with DOC (Na, Mg, Si) are spatially homogeneous in pore waters across each catchment. These solutes are chemostatic in that their stream concentrations vary little with stream discharge, likely because these solutes are released quickly from exchange sites in the soils during rainfall events. Differences in the hydrologic connectivity of organic-rich soils to the stream drive differences in concentration behavior between catchments. As such, in catchments where soil organic matter (SOM) is dominantly in lowlands (e.g., Shale Hills), bioactive elements are released to the stream early during rainfall events, whereas in catchments where SOM is dominantly in uplands (e.g., Plynlimon), bioactive elements are released later during rainfall events. The distribution of vegetation and SOM across the landscape is thus a key component for predictive models of solute transport in headwater catchments. 
  3. Abstract Understanding controls on solute export to streams is challenging because heterogeneous catchments can respond uniquely to drivers of environmental change. To understand general solute export patterns, we used a large‐scale inductive approach to evaluate concentration–discharge (C–Q) metrics across catchments spanning a broad range of catchment attributes and hydroclimatic drivers. We leveraged paired C–Q data for 11 solutes from CAMELS‐Chem, a database built upon an existing dataset of catchment and hydroclimatic attributes from relatively undisturbed catchments across the contiguous USA. Because C–Q relationships with Q thresholds reflect a shift in solute export dynamics and are poorly characterized across solutes and diverse catchments, we analysed C–Q relationships using Bayesian segmented regression to quantify Q thresholds in the C–Q relationship. Threshold responses were rare, representing only 12% of C–Q relationships, 56% of which occurred for solutes predominantly sourced from bedrock. Further, solutes were dominated by one or two C–Q patterns that reflected vertical solute–source distributions. Specifically, solutes predominantly sourced from bedrock had diluting C–Q responses in 43%–70% of catchments, and solutes predominantly sourced from soils had more enrichment responses in 35%–51% of catchments. We also linked C–Q relationships to catchment and hydroclimatic attributes to understand controls on export patterns. The relationships were generally weak despite the diversity of solutes and attribute types considered. However, catchment and hydroclimatic attributes in the central USA typically drove the most divergent export behaviour for solutes. Further, we illustrate how our inductive approach generated new hypotheses that can be tested at discrete, representative catchments using deductive approaches to better understand the processes underlying solute export patterns. 
    Finally, given these long‐term C–Q relationships are from minimally disturbed catchments, our findings can be used as benchmarks for change in more disturbed catchments.
  4. Abstract Stream fluxes are commonly reported without a complete accounting for uncertainty in the estimates, which makes it difficult to evaluate the significance of findings or to identify where to direct efforts to improve monitoring programs. At the Hubbard Brook Experimental Forest in the White Mountains of New Hampshire, USA, stream flow has been monitored continuously and solute concentrations have been sampled approximately weekly in small, gaged headwater streams since 1963, yet comprehensive uncertainty analyses have not been reported. We propagated uncertainty in the stage height–discharge relationship, watershed area, analytical chemistry, the concentration–discharge relationship used to interpolate solute concentrations, and the streamflow gap‐filling procedure to estimate uncertainty for both streamflow and solute fluxes for a recent 6‐year period (2013–2018) using a Monte Carlo approach. As a percentage of solute fluxes, uncertainty was highest for NH₄⁺ (34%), total dissolved nitrogen (8.8%), NO₃⁻ (8.1%), and K⁺ (7.4%), and lowest for dissolved organic carbon (3.7%), SO₄²⁻ (4.0%), and Mg²⁺ (4.4%). In units of flux, uncertainties were highest for solutes in highest concentration (Si, DOC, SO₄²⁻, and Na⁺) and lowest for those lowest in concentration (H⁺ and NH₄⁺). Laboratory analysis of solute concentration was a greater source of uncertainty than streamflow for solute flux, with the exception of DOC. Our results suggest that uncertainty in solute fluxes could be reduced with more precise measurements of solute concentrations. Additionally, more discharge measurements during high flows are needed to better characterize the stage‐discharge relationship. Quantifying uncertainty in streamflow and element export is important because it allows for determination of significance of differences in fluxes, which can be used to assess watershed response to disturbance and environmental change.
  5. Abstract. Solute concentrations in stream water vary with discharge in patterns that record complex feedbacks between hydrologic and biogeochemical processes. In a comparison of three shale-underlain headwater catchments located in Pennsylvania, USA (the forested Shale Hills Critical Zone Observatory), and Wales, UK (the peatland-dominated Upper Hafren and forest-dominated Upper Hore catchments in the Plynlimon forest), dissimilar concentration–discharge (CQ) behaviors are best explained by contrasting landscape distributions of soil solution chemistry – especially dissolved organic carbon (DOC) – that have been established by patterns of vegetation and soil organic matter (SOM). Specifically, elements that are concentrated in organic-rich soils due to biotic cycling (Mn, Ca, K) or that form strong complexes with DOC (Fe, Al) are spatially heterogeneous in pore waters because organic matter is heterogeneously distributed across the catchments. These solutes exhibit non-chemostatic behavior in the streams, and solute concentrations either decrease (Shale Hills) or increase (Plynlimon) with increasing discharge. In contrast, solutes that are concentrated in soil minerals and form only weak complexes with DOC (Na, Mg, Si) are spatially homogeneous in pore waters across each catchment. These solutes are chemostatic in that their stream concentrations vary little with stream discharge, likely because these solutes are released quickly from exchange sites in the soils during rainfall events. Furthermore, concentration–discharge relationships of non-chemostatic solutes changed following tree harvest in the Upper Hore catchment in Plynlimon, while no changes were observed for chemostatic solutes, underscoring the role of vegetation in regulating the concentrations of certain elements in the stream. These results indicate that differences in the hydrologic connectivity of organic-rich soils to the stream drive differences in concentration behavior between catchments. 
    As such, in catchments where SOM is dominantly in lowlands (e.g., Shale Hills), we infer that non-chemostatic elements associated with organic matter are released to the stream early during rainfall events, whereas in catchments where SOM is dominantly in uplands (e.g., Plynlimon), these non-chemostatic elements are released later during rainfall events. The distribution of SOM across the landscape is thus a key component for predictive models of solute transport in headwater catchments.