

Title: Predicting high‐frequency variation in stream solute concentrations with water quality sensors and machine learning
Abstract Stream solute monitoring has produced many insights into ecosystem and Earth system functions. Although new sensors have provided novel information about the fine‐scale temporal variation of some stream water solutes, we lack adequate sensor technology to gain the same insights for many other solutes. We used two machine learning algorithms, Support Vector Machine and Random Forest, to predict concentrations at 15‐min resolution for 10 solutes, eight of which lack specific sensors. The algorithms were trained with data from intensive stream sensing and weekly manual stream sampling over four full years in a hydrologic reference stream within the Hubbard Brook Experimental Forest in New Hampshire, USA. The Random Forest algorithm was slightly better at predicting solute concentrations than the Support Vector Machine algorithm (Nash‐Sutcliffe efficiencies ranged from 0.35 to 0.78 for Random Forest compared to 0.29 to 0.79 for Support Vector Machine). For both algorithms, solute predictions were most sensitive to removing fluorescent dissolved organic matter, pH and specific conductance as independent variables, and least sensitive to removing dissolved oxygen and turbidity. The predicted concentrations of calcium and monomeric aluminium were used to estimate catchment solute yield; the estimate changed most dramatically for aluminium because its concentration increases with stream discharge. These results show great promise for combining stream sensing with intensive discrete stream sampling to build information about the high‐frequency variation of solutes for which an appropriate sensor or proxy is not available.
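The abstract reports model skill as Nash‐Sutcliffe efficiency (NSE) and uses the 15‐min predictions to estimate catchment yield. A minimal Python sketch of both calculations follows. It assumes aligned series of concentration (mg/L) and discharge (L/s) at a fixed 15‐min step. The function names, units and toy numbers are illustrative assumptions, not the authors' code or data.

```python
"""Sketch: scoring 15-min solute predictions with NSE and turning them into a
catchment yield. Names, units and example values are hypothetical."""

from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means the model does no better than always
    predicting the mean of the observations; negative values are worse.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - sse / sst


def solute_yield_kg_per_ha(
    conc_mg_per_l: Sequence[float],
    discharge_l_per_s: Sequence[float],
    area_ha: float,
    step_s: float = 900.0,  # 15-min interval, matching the prediction resolution
) -> float:
    """Sum concentration x discharge x time step over all intervals,
    convert mg to kg (x 1e-6), then normalise by catchment area."""
    if len(conc_mg_per_l) != len(discharge_l_per_s):
        raise ValueError("concentration and discharge series must align")
    load_mg = sum(c * q * step_s for c, q in zip(conc_mg_per_l, discharge_l_per_s))
    return load_mg * 1e-6 / area_ha


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(nash_sutcliffe(obs, obs))        # 1.0
    print(nash_sutcliffe(obs, [2.5] * 4))  # 0.0: same as predicting the mean
    print(solute_yield_kg_per_ha([1.0] * 4, [1.0] * 4, area_ha=1.0))
```

Because an NSE of 0 corresponds to always predicting the observed mean, a negative NSE, as reported for cross-stream transfer in the first related abstract, means the model performs worse than that trivial baseline.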
Award ID(s):
1637685 1907683
PAR ID:
10452592
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Hydrological Processes
Volume:
35
Issue:
1
ISSN:
0885-6087
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Machine‐learning models have been surprisingly successful at predicting stream solute concentrations, even for solutes without dedicated sensors. It would be extremely valuable if these models could predict solute concentrations in streams beyond the one in which they were trained. We assessed the generalisability of random forest models by training them in one or more streams and testing them in another. Models were made using grab sample and sensor data from 10 New Hampshire streams and rivers. As observed in previous studies, models trained in one stream were capable of accurately predicting solute concentrations in that stream. However, models trained on one stream produced inaccurate predictions of solute concentrations in other streams, with the exception of solutes measured by dedicated sensors (i.e., nitrate and dissolved organic carbon). Using data from multiple watersheds improved model results, but model performance was still worse than using the mean of the training dataset (Nash–Sutcliffe Efficiency < 0). Our results demonstrate that machine‐learning models thus far reliably predict solute concentrations only where trained, as differences in solute concentration patterns and sensor‐solute relationships limit their broader applicability. 
  2. Abstract. Solute concentrations in stream water vary with discharge in patterns that record complex feedbacks between hydrologic and biogeochemical processes. In a comparison of three shale-underlain headwater catchments located in Pennsylvania, USA (the forested Shale Hills Critical Zone Observatory), and Wales, UK (the peatland-dominated Upper Hafren and forest-dominated Upper Hore catchments in the Plynlimon forest), dissimilar concentration–discharge (CQ) behaviors are best explained by contrasting landscape distributions of soil solution chemistry – especially dissolved organic carbon (DOC) – that have been established by patterns of vegetation and soil organic matter (SOM). Specifically, elements that are concentrated in organic-rich soils due to biotic cycling (Mn, Ca, K) or that form strong complexes with DOC (Fe, Al) are spatially heterogeneous in pore waters because organic matter is heterogeneously distributed across the catchments. These solutes exhibit non-chemostatic behavior in the streams, and solute concentrations either decrease (Shale Hills) or increase (Plynlimon) with increasing discharge. In contrast, solutes that are concentrated in soil minerals and form only weak complexes with DOC (Na, Mg, Si) are spatially homogeneous in pore waters across each catchment. These solutes are chemostatic in that their stream concentrations vary little with stream discharge, likely because these solutes are released quickly from exchange sites in the soils during rainfall events. Furthermore, concentration–discharge relationships of non-chemostatic solutes changed following tree harvest in the Upper Hore catchment in Plynlimon, while no changes were observed for chemostatic solutes, underscoring the role of vegetation in regulating the concentrations of certain elements in the stream. These results indicate that differences in the hydrologic connectivity of organic-rich soils to the stream drive differences in concentration behavior between catchments. 
As such, in catchments where SOM is dominantly in lowlands (e.g., Shale Hills), we infer that non-chemostatic elements associated with organic matter are released to the stream early during rainfall events, whereas in catchments where SOM is dominantly in uplands (e.g., Plynlimon), these non-chemostatic elements are released later during rainfall events. The distribution of SOM across the landscape is thus a key component for predictive models of solute transport in headwater catchments. 
  3. Abstract Synoptic sampling of streams is an inexpensive way to gain insight into the spatial distribution of dissolved constituents in the subsurface critical zone. Few spatial synoptics have focused on urban watersheds, although this approach is useful in urban areas where monitoring wells are uncommon. Baseflow stream sampling was used to quantify spatial variability of water chemistry in a highly developed Piedmont watershed in suburban Baltimore, MD, that has no permitted point discharges. Six synoptic surveys were conducted from 2014 to 2016 after an average of 10 days of no rain, when stream discharge was composed of baseflow from groundwater. Samples collected every 50 m over 5 km were analyzed for nitrate, sulfate, chloride, fluoride, and water stable isotopes. Longitudinal spatial patterns differed across constituents for each survey, but the pattern for each constituent varied little across synoptics. Results suggest a spatially heterogeneous, three‐dimensional pattern of localized groundwater contaminant zones steadily contributing solutes to the stream network, where high concentrations result from current and legacy land use practices. By contrast, observations from 35 point piezometers indicate that sparse groundwater measurements are not a good predictor of baseflow stream chemistry in this geologic setting. Cross‐covariance analysis of stream solute concentrations with groundwater model/backward particle tracking results suggests that spatial changes in baseflow solute concentrations are associated with urban features such as impervious surface area, fill, and leaking potable water and sanitary sewer pipes. Predicted subsurface residence times suggest that legacy solute sources drive baseflow stream chemistry in the urban critical zone. 
  4. Solute concentrations in stream water vary with discharge in patterns that record complex feedbacks between hydrologic and biogeochemical processes. In a comparison of headwater catchments underlain by shale in Pennsylvania, USA (Shale Hills) and Wales, UK (Plynlimon), dissimilar concentration-discharge behaviors are best explained by contrasting landscape distributions of soil solution chemistry – especially dissolved organic carbon (DOC) – that have been established by patterns of vegetation. Specifically, elements that are concentrated in organic-rich soils due to biotic cycling (Mn, Ca, K) or that form strong complexes with DOC (Fe, Al) are spatially heterogeneous in pore waters because organic matter is heterogeneously distributed across the catchments. These solutes exhibit non-chemostatic "bioactive" behavior in the streams, and solute concentrations either decrease (Shale Hills) or increase (Plynlimon) with increasing discharge. In contrast, solutes that are concentrated in soil minerals and form only weak complexes with DOC (Na, Mg, Si) are spatially homogeneous in pore waters across each catchment. These solutes are chemostatic in that their stream concentrations vary little with stream discharge, likely because these solutes are released quickly from exchange sites in the soils during rainfall events. Differences in the hydrologic connectivity of organic-rich soils to the stream drive differences in concentration behavior between catchments. As such, in catchments where soil organic matter (SOM) is dominantly in lowlands (e.g., Shale Hills), bioactive elements are released to the stream early during rainfall events, whereas in catchments where SOM is dominantly in uplands (e.g., Plynlimon), bioactive elements are released later during rainfall events. The distribution of vegetation and SOM across the landscape is thus a key component for predictive models of solute transport in headwater catchments. 
  5. Abstract Stream fluxes are commonly reported without a complete accounting for uncertainty in the estimates, which makes it difficult to evaluate the significance of findings or to identify where to direct efforts to improve monitoring programs. At the Hubbard Brook Experimental Forest in the White Mountains of New Hampshire, USA, stream flow has been monitored continuously and solute concentrations have been sampled approximately weekly in small, gaged headwater streams since 1963, yet comprehensive uncertainty analyses have not been reported. We propagated uncertainty in the stage height–discharge relationship, watershed area, analytical chemistry, the concentration–discharge relationship used to interpolate solute concentrations, and the streamflow gap‐filling procedure to estimate uncertainty for both streamflow and solute fluxes for a recent 6‐year period (2013–2018) using a Monte Carlo approach. As a percentage of solute fluxes, uncertainty was highest for NH4+ (34%), total dissolved nitrogen (8.8%), NO3− (8.1%), and K+ (7.4%), and lowest for dissolved organic carbon (3.7%), SO42− (4.0%), and Mg2+ (4.4%). In units of flux, uncertainties were highest for solutes in highest concentration (Si, DOC, SO42−, and Na+) and lowest for those lowest in concentration (H+ and NH4+). Laboratory analysis of solute concentration was a greater source of uncertainty than streamflow for solute flux, with the exception of DOC. Our results suggest that uncertainty in solute fluxes could be reduced with more precise measurements of solute concentrations. Additionally, more discharge measurements during high flows are needed to better characterize the stage‐discharge relationship. Quantifying uncertainty in streamflow and element export is important because it allows the significance of differences in fluxes to be determined, which can be used to assess watershed response to disturbance and environmental change. 
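The last related abstract describes propagating several sources of uncertainty through a solute-flux calculation with a Monte Carlo approach. The Python sketch below shows the general idea under deliberately simplified, hypothetical assumptions: independent, normally distributed relative errors for each concentration sample (analytical error), a single multiplicative discharge error per iteration (a stand-in for stage–discharge uncertainty), and a watershed-area error. It omits gap-filling and concentration–discharge interpolation, and every name and number is illustrative rather than taken from the study.

```python
"""Monte Carlo sketch of solute-flux uncertainty (hypothetical error model)."""

import random
import statistics
from typing import Sequence, Tuple


def flux_kg_per_ha(
    conc_mg_per_l: Sequence[float],
    discharge_l_per_s: Sequence[float],
    area_ha: float,
    step_s: float = 86400.0,  # assumed daily time step
) -> float:
    """Flux = sum(C * Q * dt) / area, converted from mg to kg."""
    load_mg = sum(c * q * step_s for c, q in zip(conc_mg_per_l, discharge_l_per_s))
    return load_mg * 1e-6 / area_ha


def monte_carlo_flux(
    conc: Sequence[float],
    discharge: Sequence[float],
    area_ha: float,
    *,
    rel_sd_conc: float,  # assumed analytical error, drawn per sample
    rel_sd_q: float,     # assumed discharge error, shared by all time steps
    rel_sd_area: float,  # assumed watershed-area error
    n_iter: int = 5000,
    seed: int = 42,
) -> Tuple[float, float, float]:
    """Return (mean flux, standard deviation, SD as % of mean) over n_iter draws."""
    rng = random.Random(seed)  # seeded so results are reproducible
    draws = []
    for _ in range(n_iter):
        q_factor = rng.gauss(1.0, rel_sd_q)
        area = area_ha * rng.gauss(1.0, rel_sd_area)
        c_draw = [c * rng.gauss(1.0, rel_sd_conc) for c in conc]
        q_draw = [q * q_factor for q in discharge]
        draws.append(flux_kg_per_ha(c_draw, q_draw, area))
    mean = statistics.fmean(draws)
    sd = statistics.stdev(draws)
    return mean, sd, 100.0 * sd / mean


if __name__ == "__main__":
    # Toy inputs: 30 daily values of 0.5 mg/L and 10 L/s on a 12 ha catchment.
    mean, sd, pct = monte_carlo_flux(
        [0.5] * 30, [10.0] * 30, 12.0,
        rel_sd_conc=0.05, rel_sd_q=0.05, rel_sd_area=0.02,
    )
    print(f"flux = {mean:.3f} +/- {sd:.3f} kg/ha ({pct:.1f}%)")
```

Re-running with one error source set to zero gives a rough sense of which source dominates. This is the general kind of attribution behind a finding such as laboratory analysis outweighing streamflow as a source of flux uncertainty, although the study's actual error models are more detailed than this sketch.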