Abstract Surface runoff and infiltrated water en route to the stream interact with dynamic landscape properties, ranging from vegetation and microbial activities to soil and geological attributes. Stream solute concentrations are highly variable and interconnected due to these interactions, flow paths, and residence times, and often exhibit hysteresis with flow. Significant unknowns remain about how point measurements of stream solute chemistry reflect interdependent hydrobiogeochemical and physical processes, and how signatures are encapsulated as nonlinear dynamical relationships between variables. We take a Machine Learning (ML) approach to understand and capture these dynamical relationships and improve predictions of solutes at short and long time scales. We introduce a physical process‐based “flow‐gate” into a Long Short‐Term Memory (LSTM) model, which enables the model to learn hysteresis behaviors if they exist. Further, we use information‐theoretic metrics to detect how solutes are interdependent and iteratively select source solutes that best predict a given target solute concentration. The “flow‐gate LSTM” model improves model predictions (1%–32% decreases in RMSE) relative to the standard LSTM model for all nine solutes included in the study. The predictive improvements from the flow‐gate LSTM model highlight the importance of lagged concentration and discharge relationships for certain solutes. They also indicate a potential limitation in the traditional LSTM model approach: flow rates are always provided as input sources, but this information is not fully utilized. This work provides a starting point for a predictive understanding of geochemical interdependencies using machine‐learning approaches and highlights potential improvements in model architecture.
                    
                            
                            Multi‐Model Machine Learning Approach Accurately Predicts Lake Dissolved Oxygen With Multiple Environmental Inputs
                        
                    
    
            Abstract As a key water quality parameter, dissolved oxygen (DO) concentration, and particularly changes in bottom water DO, is fundamental for understanding the biogeochemical processes in lake ecosystems. Based on two machine learning (ML) models, Gradient Boost Regressor (GBR) and long‐short‐term‐memory (LSTM) network, this study developed three ML approaches: direct GBR; direct LSTM; and a 2‐step mixed ML workflow combining both GBR and LSTM. They were used to simulate multi‐year surface and bottom DO concentrations in five lakes. All approaches were trained with readily available environmental data as predictors. Indices of lake thermal structure and mixing provided by a one‐dimensional (1‐D) hydrodynamic model were also included as predictors in the ML models. The advantages of each ML approach were not consistent across the tested lakes, but the best approach in each lake estimated DO concentration with a coefficient of determination (R2) of up to 0.6–0.7. All three approaches had a normalized mean absolute error (NMAE) under 0.15. In a polymictic lake, the 2‐step mixed model workflow represented bottom DO concentrations better, with the highest true positive rate (TPR) of hypolimnetic hypoxia detection at over 90%, while the other workflows yielded TPRs of around 50%. In most of the tested lakes, the predicted surface DO concentrations and variables indicating stratified conditions (i.e., Wedderburn number and the temperature difference between surface and bottom water) are essential for simulating bottom DO. The ML approaches showed promising results and could be used to support short‐ and long‐term water management plans.
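The three scores reported in the abstract (R2, NMAE, and the TPR of hypoxia detection) can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say how NMAE is normalized or which DO concentration defines hypoxia, so the sketch assumes normalization by the observed range and a hypothetical threshold of 2.0 mg/L; the function names are likewise illustrative.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def nmae(obs, pred):
    """Mean absolute error divided by the observed range.

    Assumption: range normalization. The abstract does not state
    which normalization the study used.
    """
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)
    return mae / (max(obs) - min(obs))


def hypoxia_tpr(obs, pred, threshold=2.0):
    """Share of observed hypoxic samples that are also predicted hypoxic.

    Assumption: hypoxia means DO below `threshold` (mg/L); 2.0 is a
    placeholder value, not a figure taken from the study.
    """
    hypoxic = [(o, p) for o, p in zip(obs, pred) if o < threshold]
    if not hypoxic:
        return math.nan  # TPR is undefined without observed hypoxia
    hits = sum(1 for _, p in hypoxic if p < threshold)
    return hits / len(hypoxic)


# Made-up bottom-water DO values (mg/L), for illustration only.
observed = [8.0, 6.0, 1.0, 0.5]
predicted = [7.0, 6.0, 1.5, 2.5]

print("R2  :", r_squared(observed, predicted))
print("NMAE:", nmae(observed, predicted))
print("TPR :", hypoxia_tpr(observed, predicted))
```

A model can score well on R2 and NMAE while still missing hypoxic events, which is why a detection rate such as TPR is reported separately for bottom water.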
        
    
- Award ID(s): 2025982
- PAR ID: 10571747
- Publisher / Repository: American Geophysical Union
- Date Published:
- Journal Name: Earth and Space Science
- Volume: 11
- Issue: 7
- ISSN: 2333-5084
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract The concentration of dissolved oxygen (DO) is an important attribute of aquatic ecosystems, influencing habitat, drinking water quality, biodiversity, nutrient biogeochemistry, and greenhouse gas emissions. While average summer DO concentrations are declining in lakes across the temperate zone, much remains unknown about seasonal factors contributing to deepwater DO losses. It is unclear whether declines are related to increasing rates of seasonal DO depletion or changes in seasonal stratification that limit re‐oxygenation of deep waters. Furthermore, despite the presence of important biological and ecological DO thresholds, there has been no large‐scale assessment of changes in the amount of habitat crossing these thresholds, limiting the ability to understand the consequences of observed DO losses. We used a dataset from >400 widely distributed lakes to identify the drivers of DO losses and quantify the frequency and volume of lake water crossing biologically and ecologically important threshold concentrations ranging from 5 to 0.5 mg/L. Our results show that while there were no consistent changes over time in seasonal DO depletion rates, over three‐quarters of lakes exhibited an increase in the duration of stratification, providing more time for seasonal deepwater DO depletion to occur. As a result, most lakes have experienced summertime increases in the amount of water below all examined thresholds in deepwater DO concentration, with increases in the proportion of the water column below thresholds ranging between 0.9% and 1.7% per decade. In the 30‐day period preceding the end of stratification, increases were greater at >2.2% per decade and >70% of analyzed lakes experienced increases in the amount of oxygen‐depleted water. 
These results indicate ongoing climate‐induced increases in the duration of stratification have already contributed to reduction of habitat for many species, likely increased internal nutrient loading, and otherwise altered lake chemistry. Future warming is likely to exacerbate these trends.
- 
            Abstract Land surface temperature (LST) is crucial for understanding earth system processes. We expanded the Advanced Baseline Imager Live Imaging of Vegetated Ecosystems (ALIVE) framework to estimate LST in near‐real‐time for both cloudy and clear sky conditions at a five‐minute resolution. We compared two machine learning (ML) models, Long Short‐Term Memory (LSTM) networks and Gradient Boosting Regressor (GBR), using top‐of‐atmosphere observations from the Advanced Baseline Imager (ABI) on the GOES‐16 satellite against observations from hundreds of observation sites for a five‐year period. Long Short‐Term Memory outperformed GBR, especially at coarser resolutions and under challenging conditions, with a clear sky R2 of 0.96 (RMSE 2.31 K) and a cloudy sky R2 of 0.83 (RMSE 4.10 K) across CONUS, based on 10‐repeat Leave‐One‐Out Cross‐Validation (LOOCV). GBR maintained high accuracy and ran 5.3 times faster, with only a 0.01–0.02 R2 drop. Feature importance revealed infrared bands were key in both models, with LSTM adapting dynamically to atmospheric changes, while GBR utilized more time information in cloudy conditions. A comparative analysis against the physically based ABI LST product showed strong agreement in winter, particularly under clear sky conditions, while also highlighting the challenges of summer LST estimation due to increased thermal variability. This study underscores the strengths and limitations of data‐driven models for LST estimation and suggests potential pathways for integrating ML models to enhance the accuracy and coverage of LST products.
- 
            Lake depth is an important characteristic for understanding many lake processes, yet it is unknown for the vast majority of lakes globally. Our objective was to develop a model that predicts lake depth using map-derived metrics of lake and terrestrial geomorphic features. Building on previous models that use local topography to predict lake depth, we hypothesized that regional differences in topography, lake shape, or sedimentation processes could lead to region-specific relationships between lake depth and the mapped features. We therefore used a mixed modeling approach that included region-specific model parameters. We built models using lake and map data from LAGOS, which includes 8164 lakes with maximum depth (Zmax) observations. The model was used to predict depth for all lakes ≥4 ha (n = 42 443) in the study extent. Lake surface area and maximum slope in a 100 m buffer were the best predictors of Zmax. Interactions between surface area and topography occurred at both the local and regional scale; surface area had a larger effect in steep terrain, so large lakes embedded in steep terrain were much deeper than those in flat terrain. Despite a large sample size and inclusion of regional variability, model performance (R2 = 0.29, RMSE = 7.1 m) was similar to other published models. The relative error varied by region, however, highlighting the importance of taking a regional approach to lake depth modeling. Additionally, we provide the largest known collection of observed and predicted lake depth values in the United States.
- 
            Abstract Declining oxygen concentrations in the deep waters of lakes worldwide pose a pressing environmental and societal challenge. Existing theory suggests that low deep‐water dissolved oxygen (DO) concentrations could trigger a positive feedback through which anoxia (i.e., very low DO) during a given summer begets increasingly severe occurrences of anoxia in following summers. Specifically, anoxic conditions can promote nutrient release from sediments, thereby stimulating phytoplankton growth, and subsequent phytoplankton decomposition can fuel heterotrophic respiration, resulting in increased spatial extent and duration of anoxia. However, while the individual relationships in this feedback are well established, to our knowledge, there has not been a systematic analysis within or across lakes that simultaneously demonstrates all of the mechanisms necessary to produce a positive feedback that reinforces anoxia. Here, we compiled data from 656 widespread temperate lakes and reservoirs to analyze the proposed anoxia begets anoxia feedback. Lakes in the dataset span a broad range of surface area (1–126,909 ha), maximum depth (6–370 m), and morphometry, with a median time‐series duration of 30 years at each lake. Using linear mixed models, we found support for each of the positive feedback relationships between anoxia, phosphorus concentrations, chlorophyll a concentrations, and oxygen demand across the 656‐lake dataset. Likewise, we found further support for these relationships by analyzing time‐series data from individual lakes. Our results indicate that the strength of these feedback relationships may vary with lake‐specific characteristics: For example, we found that surface phosphorus concentrations were more positively associated with chlorophyll a in high‐phosphorus lakes, and oxygen demand had a stronger influence on the extent of anoxia in deep lakes.
Taken together, these results support the existence of a positive feedback that could magnify the effects of climate change and other anthropogenic pressures driving the development of anoxia in lakes around the world.
 An official website of the United States government