

Title: Improving Stream Solute Predictions With a Modified LSTM Model Incorporating Solute Interdependences and Hysteresis Patterns
Abstract Surface runoff and infiltrated water en route to the stream interact with dynamic landscape properties, ranging from vegetation and microbial activities to soil and geological attributes. Because of these interactions, flow paths, and residence times, stream solute concentrations are highly variable and interconnected, and they often exhibit hysteresis with flow. Significant unknowns remain about how point measurements of stream solute chemistry reflect interdependent hydrobiogeochemical and physical processes, and how these signatures are encapsulated in nonlinear dynamical relationships between variables. We take a Machine Learning (ML) approach to understand and capture these dynamical relationships and to improve solute predictions at short and long time scales. We introduce a physical process‐based “flow‐gate” into a Long Short‐Term Memory (LSTM) model, which enables the model to learn hysteresis behaviors if they exist. Further, we use information‐theoretic metrics to detect how solutes are interdependent and to iteratively select the source solutes that best predict a given target solute concentration. The “flow‐gate LSTM” model improves predictions relative to the standard LSTM model for all nine solutes included in the study, decreasing RMSE by 1%–32%. These improvements highlight the importance of lagged concentration–discharge relationships for certain solutes. They also point to a potential limitation of the traditional LSTM approach: flow rates are always provided as input sources, but this information is not fully utilized. This work provides a starting point for a predictive understanding of geochemical interdependencies using machine‐learning approaches and highlights potential improvements in model architecture.
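The abstract does not specify which information‐theoretic metric drives the iterative selection of source solutes. A minimal sketch, assuming joint mutual information (MI) on equal‐width binned series as the criterion, is greedy forward selection: at each step, add the candidate whose inclusion most increases the MI between the selected sources and the target. The function names, the bin count, and the `min_gain` stopping rule are hypothetical choices for illustration, not the authors' method, and the flow‐gate LSTM itself is not shown.

```python
import math
from collections import Counter

# Hypothetical sketch: greedy source selection by joint mutual information.
# The MI criterion, binning, and stopping rule are assumptions, not the paper's code.

def discretize(values, n_bins=5):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(x_states, y_states):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x_states) + entropy(y_states) - entropy(list(zip(x_states, y_states)))

def select_sources(candidates, target, max_sources=3, n_bins=5, min_gain=1e-3):
    """Greedily add the source that most increases I(selected sources; target)."""
    t = discretize(target, n_bins)
    binned = {name: discretize(vals, n_bins) for name, vals in candidates.items()}
    selected, current_mi = [], 0.0
    while len(selected) < max_sources:
        best_name, best_mi = None, current_mi
        for name in binned:
            if name in selected:
                continue
            # Joint state of the already-selected sources plus this candidate.
            joint_state = list(zip(*[binned[s] for s in selected + [name]]))
            mi = mutual_information(joint_state, t)
            if mi > best_mi:
                best_name, best_mi = name, mi
        if best_name is None or best_mi - current_mi < min_gain:
            break  # no remaining candidate adds meaningful information
        selected.append(best_name)
        current_mi = best_mi
    return selected, current_mi

# Synthetic example: "Ca" is a deterministic function of the target, "noise" is not.
target = [i % 10 for i in range(200)]
candidates = {
    "Ca": [2 * v + 1 for v in target],
    "noise": [(i * 7) % 13 for i in range(200)],
}
selected, mi = select_sources(candidates, target, max_sources=2)
```

Here `selected` is `["Ca"]`: once `Ca` is chosen, it already determines the binned target, so `noise` adds no information and the search stops. With real solute records, MI estimates are sensitive to the bin count and record length, so the stopping threshold would need tuning.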
Award ID(s):
2012850
PAR ID:
10582156
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Journal of Geophysical Research: Machine Learning and Computation
Volume:
2
Issue:
1
ISSN:
2993-5210
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Machine‐learning models have been surprisingly successful at predicting stream solute concentrations, even for solutes without dedicated sensors. It would be extremely valuable if these models could predict solute concentrations in streams beyond the one in which they were trained. We assessed the generalisability of random forest models by training them in one or more streams and testing them in another. Models were made using grab sample and sensor data from 10 New Hampshire streams and rivers. As observed in previous studies, models trained in one stream were capable of accurately predicting solute concentrations in that stream. However, models trained on one stream produced inaccurate predictions of solute concentrations in other streams, with the exception of solutes measured by dedicated sensors (i.e., nitrate and dissolved organic carbon). Using data from multiple watersheds improved model results, but model performance was still worse than using the mean of the training dataset (Nash–Sutcliffe Efficiency < 0). Our results demonstrate that machine‐learning models thus far reliably predict solute concentrations only where trained, as differences in solute concentration patterns and sensor‐solute relationships limit their broader applicability. 
  2. Abstract Stream solute monitoring has produced many insights into ecosystem and Earth system functions. Although new sensors have provided novel information about the fine‐scale temporal variation of some stream water solutes, we lack adequate sensor technology to gain the same insights for many other solutes. We used two machine learning algorithms – Support Vector Machine and Random Forest – to predict concentrations at 15‐min resolution for 10 solutes, of which eight lack specific sensors. The algorithms were trained with data from intensive stream sensing and manual stream sampling (weekly) for four full years in a hydrologic reference stream within the Hubbard Brook Experimental Forest in New Hampshire, USA. The Random Forest algorithm was slightly better at predicting solute concentrations than the Support Vector Machine algorithm (Nash‐Sutcliffe efficiencies ranged from 0.35 to 0.78 for Random Forest compared to 0.29 to 0.79 for Support Vector Machine). Solute predictions were most sensitive to the removal of fluorescent dissolved organic matter, pH and specific conductance as independent variables for both algorithms, and least sensitive to dissolved oxygen and turbidity. The predicted concentrations of calcium and monomeric aluminium were used to estimate catchment solute yield, which changed most dramatically for aluminium because it concentrates with stream discharge. These results show great promise for using a combined approach of stream sensing and intensive stream discrete sampling to build information about the high‐frequency variation of solutes for which an appropriate sensor or proxy is not available. 
  3. Abstract Understanding controls on solute export to streams is challenging because heterogeneous catchments can respond uniquely to drivers of environmental change. To understand general solute export patterns, we used a large‐scale inductive approach to evaluate concentration–discharge (C–Q) metrics across catchments spanning a broad range of catchment attributes and hydroclimatic drivers. We leveraged paired C–Q data for 11 solutes from CAMELS‐Chem, a database built upon an existing dataset of catchment and hydroclimatic attributes from relatively undisturbed catchments across the contiguous USA. Because C–Q relationships with Q thresholds reflect a shift in solute export dynamics and are poorly characterized across solutes and diverse catchments, we analysed C–Q relationships using Bayesian segmented regression to quantify Q thresholds in the C–Q relationship. Threshold responses were rare, representing only 12% of C–Q relationships, 56% of which occurred for solutes predominantly sourced from bedrock. Further, solutes were dominated by one or two C–Q patterns that reflected vertical solute–source distributions. Specifically, solutes predominantly sourced from bedrock had diluting C–Q responses in 43%–70% of catchments, and solutes predominantly sourced from soils had more enrichment responses in 35%–51% of catchments. We also linked C–Q relationships to catchment and hydroclimatic attributes to understand controls on export patterns. The relationships were generally weak despite the diversity of solutes and attribute types considered. However, catchment and hydroclimatic attributes in the central USA typically drove the most divergent export behaviour for solutes. Further, we illustrate how our inductive approach generated new hypotheses that can be tested at discrete, representative catchments using deductive approaches to better understand the processes underlying solute export patterns. 
Finally, given these long‐term C–Q relationships are from minimally disturbed catchments, our findings can be used as benchmarks for change in more disturbed catchments. 
  4. Solute concentrations in stream water vary with discharge in patterns that record complex feedbacks between hydrologic and biogeochemical processes. In a comparison of headwater catchments underlain by shale in Pennsylvania, USA (Shale Hills) and Wales, UK (Plynlimon), dissimilar concentration-discharge behaviors are best explained by contrasting landscape distributions of soil solution chemistry – especially dissolved organic carbon (DOC) – that have been established by patterns of vegetation. Specifically, elements that are concentrated in organic-rich soils due to biotic cycling (Mn, Ca, K) or that form strong complexes with DOC (Fe, Al) are spatially heterogeneous in pore waters because organic matter is heterogeneously distributed across the catchments. These solutes exhibit non-chemostatic "bioactive" behavior in the streams, and solute concentrations either decrease (Shale Hills) or increase (Plynlimon) with increasing discharge. In contrast, solutes that are concentrated in soil minerals and form only weak complexes with DOC (Na, Mg, Si) are spatially homogeneous in pore waters across each catchment. These solutes are chemostatic in that their stream concentrations vary little with stream discharge, likely because these solutes are released quickly from exchange sites in the soils during rainfall events. Differences in the hydrologic connectivity of organic-rich soils to the stream drive differences in concentration behavior between catchments. As such, in catchments where soil organic matter (SOM) is dominantly in lowlands (e.g., Shale Hills), bioactive elements are released to the stream early during rainfall events, whereas in catchments where SOM is dominantly in uplands (e.g., Plynlimon), bioactive elements are released later during rainfall events. The distribution of vegetation and SOM across the landscape is thus a key component for predictive models of solute transport in headwater catchments. 
  5. Stream channel burial drastically alters watershed flowpaths by routing surface waters underground and increasing the potential for interactions between stream water and urban infrastructure such as storm and sanitary sewers. While numerous studies have investigated storm event solute loads from urban watersheds, the influences of stream channel burial and sewer overflows are often overlooked. This study uses grab samples and natural abundance stable isotope tracers to quantify the event dynamics of solute concentration-discharge relationships as well as cumulative loads in a buried urban stream. Our results demonstrate that different solutes, as well as different sources of the same solute (atmospheric NO₃⁻ and sewer-derived NO₃⁻, differentiated by the Δ¹⁷O tracer), are delivered via separate watershed flowpaths and thus have different timings within the event and contrasting relationships to flow. This inter-event variability reveals dynamics that result from temporal and spatial heterogeneity in infiltration, exfiltration, and pipe overflows. These results can help guide system-wide infrastructure maintenance as cities seek to meet challenges in sustaining and improving water quality as infrastructural systems age. 
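Several of the related records above report Nash–Sutcliffe efficiency (NSE), where NSE < 0 means a model predicts worse than the mean of the observations, or describe concentration–discharge (C–Q) behavior as diluting, enriching, or chemostatic. A minimal sketch of both quantities follows, assuming a single power‐law fit C = aQ^b in log–log space. This is a common summary, and simpler than the Bayesian segmented regression used in record 3. The function names and the ±0.1 chemostatic band are illustrative assumptions.

```python
import math

def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / SST; 1 is perfect, 0 equals the mean, < 0 is worse than the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def cq_slope(concentration, discharge):
    """Ordinary least-squares slope b of log(C) = log(a) + b * log(Q)."""
    x = [math.log(q) for q in discharge]
    y = [math.log(c) for c in concentration]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def classify_cq(b, tol=0.1):
    """Label a C-Q slope; the +/- tol chemostatic band is an illustrative assumption."""
    if b < -tol:
        return "dilution"
    if b > tol:
        return "enrichment"
    return "chemostatic"

# Example: a solute that dilutes as discharge rises, C = 10 * Q**-0.5.
discharge = [1.0, 2.0, 4.0, 8.0]
concentration = [10.0 * q ** -0.5 for q in discharge]
b = cq_slope(concentration, discharge)
label = classify_cq(b)

obs = [1.0, 2.0, 3.0, 4.0]
nse_mean = nash_sutcliffe(obs, [2.5] * 4)           # predicting the mean gives 0
nse_bad = nash_sutcliffe(obs, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

In this example, `b` is -0.5 and `label` is `"dilution"`. `nse_mean` is 0.0 and `nse_bad` is -3.0, which illustrates the "worse than the training mean" threshold cited in record 1. A single log–log slope cannot represent the discharge thresholds that segmented regression detects, nor event-scale hysteresis.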