This content will become publicly available on November 1, 2026

Title: Advancing Machine Learning-Based Streamflow Prediction Through Event Greedy Selection, Asymmetric Loss Function, and Rainfall Forecasting Uncertainty
This paper advances machine learning (ML)-based streamflow prediction by strategically selecting rainfall events, introducing a new loss function, and addressing rainfall forecast uncertainties. Focusing on the Iowa River Basin, we applied the stochastic storm transposition (SST) method to create realistic rainfall events, which were input into a hydrological model to generate corresponding streamflow data for training and testing deterministic and probabilistic ML models. Long short-term memory (LSTM) networks were employed to predict streamflow up to 12 h ahead. An active learning approach was used to identify the most informative rainfall events, reducing data generation effort. Additionally, we introduced a novel asymmetric peak loss function to improve peak streamflow prediction accuracy. Incorporating rainfall forecast uncertainties, our probabilistic LSTM model provided uncertainty quantification for streamflow predictions. Evaluation with multiple performance metrics confirmed the accuracy and reliability of our models. These contributions enhance flood forecasting and decision-making while significantly reducing computational time and costs.
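The abstract does not state the exact form of the asymmetric peak loss, so the following is a minimal sketch of one plausible formulation in PyTorch, assuming a quantile-derived peak threshold and a weight `alpha` that penalizes underprediction of high flows more heavily than overprediction; both `peak_quantile` and `alpha` are illustrative assumptions, not values from the paper.

```python
import torch

def asymmetric_peak_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                         peak_quantile: float = 0.9, alpha: float = 3.0) -> torch.Tensor:
    """Hypothetical asymmetric peak loss: squared error that up-weights
    underprediction of high flows. The paper's exact loss is not given here."""
    # Treat flows above this quantile of the observed series as "peaks".
    threshold = torch.quantile(y_true, peak_quantile)
    err = y_pred - y_true
    # Underpredicting a peak (negative error on a high flow) costs alpha times more.
    under_peak = (y_true >= threshold) & (err < 0)
    weights = torch.where(under_peak, torch.full_like(err, alpha), torch.ones_like(err))
    return torch.mean(weights * err ** 2)
```

A loss of this shape trades a small bias toward overprediction for better peak capture, which matches the flood-oriented behavior the abstract describes.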
Award ID(s):
2226936
PAR ID:
10646466
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Applied Sciences
Volume:
15
Issue:
21
ISSN:
2076-3417
Page Range / eLocation ID:
11656
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Abstract: Applications of process-based models (PBMs) for prediction are confounded by multiple uncertainties and computational burdens, resulting in appreciable errors. A novel modeling framework combining a high-fidelity PBM with surrogate and machine learning (ML) models is developed to tackle these challenges and applied to streamflow prediction. A surrogate model retains the high computational efficiency of a PBM solution at a minimal loss of accuracy. A novel probabilistic ML model partitions the PBM-surrogate prediction errors into reducible and irreducible types, quantifying the distributions that arise from both explicitly perceived uncertainties (such as parametric uncertainty) and those entirely hidden from the modeler (not included or unexpected). Using this approach, we demonstrate a substantial improvement in streamflow predictive accuracy for a case-study urbanized watershed. Such a framework provides an efficient solution combining the strengths of high-fidelity and physics-agnostic models for a wide range of prediction problems in the geosciences.
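The abstract above does not detail the probabilistic ML model, but a common way to realize the reducible/irreducible split is a heteroscedastic Gaussian head trained with a negative log-likelihood: the learned variance absorbs irreducible data noise, while the spread across an ensemble of such models estimates the reducible (model) part. A minimal PyTorch sketch under those assumptions:

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Predicts a mean and log-variance per target (illustrative, not
    the paper's architecture)."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, 1)
        self.log_var = nn.Linear(in_dim, 1)

    def forward(self, h: torch.Tensor):
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, y):
    # Negative log-likelihood of y under N(mu, exp(log_var)); constant dropped.
    return torch.mean(0.5 * (log_var + (y - mu) ** 2 / torch.exp(log_var)))
```

Training several such heads from different initializations and comparing the within-model variance (irreducible) against the between-model spread of the means (reducible) yields the kind of error partition the abstract describes.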
  2. Abstract: The Ensemble Streamflow Prediction (ESP) framework combines a probabilistic forecast structure with process-based models for water supply predictions. However, process-based models require computationally intensive parameter estimation, increasing uncertainties and limiting usability. Motivated by the strong performance of deep learning models, we assess whether the Long Short-Term Memory (LSTM) model can provide skillful forecasts and replace process-based models within the ESP framework. Given the challenge of implicitly capturing snowpack dynamics within LSTMs for streamflow prediction, we also evaluated the added skill of explicitly incorporating snowpack information to improve hydrologic memory representation. LSTM-ESPs were evaluated under four scenarios: one excluding snow and three including snow with varied snowpack representations. The LSTM models were trained using information from 664 GAGES-II basins during WY1983–2000. During a testing period, WY2001–2010, 80% of basins exhibited a Nash-Sutcliffe Efficiency (NSE) above 0.5, with a median NSE of about 0.70, indicating satisfactory utility in simulating seasonal water supply. LSTM-ESP forecasts were then tested during WY2011–2020 over 76 western US basins with operational Natural Resources Conservation Service (NRCS) forecasts. A key finding is that in high-snow regions, LSTM-ESP forecasts using simplified ablation assumptions performed worse than those excluding snow, highlighting that snow data do not consistently improve LSTM-ESP performance. However, LSTM-ESP forecasts that explicitly incorporated past years' snow accumulation and ablation performed comparably to NRCS forecasts and better than forecasts excluding snow entirely. Overall, integrating deep learning within an ESP framework shows promise and highlights important considerations for including snowpack information in forecasting.
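The Nash-Sutcliffe Efficiency cited above is a standard skill score: NSE = 1 is a perfect match, and NSE = 0 means the simulation is no better than the observed mean. For reference, a self-contained Python implementation:

```python
import numpy as np

def nse(sim, obs) -> float:
    """Nash-Sutcliffe Efficiency: 1 - SSE(simulation) / SSE(mean of obs)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```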
  3. Demeniconi, Carlotta; Davidson, Ian (Eds.)
    This paper proposes a physics-guided machine learning approach that combines machine learning models and physics-based models to improve the prediction of water flow and temperature in river networks. We first build a recurrent graph network model to capture the interactions among multiple segments in the river network. We then transfer knowledge from physics-based models to guide the learning of the machine learning model. We also propose a new loss function that balances performance across different river segments; one plausible form is sketched below. We demonstrate the effectiveness of the proposed method in predicting temperature and streamflow in a subset of the Delaware River Basin. In particular, the proposed method improves accuracy by 33%/14% over the state-of-the-art physics-based model and by 24%/14% over traditional machine learning models (e.g., LSTM) in temperature/streamflow prediction using very sparse (0.1%) training data. The proposed method also generalizes better to different seasons and to river segments with different streamflow ranges.
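The paper's exact balancing loss is not given in the abstract; the sketch below shows one plausible reading, in which the error is averaged within each river segment before averaging across segments, so segments with many observations or large flows do not dominate training. The function and its arguments are illustrative assumptions.

```python
import torch

def segment_balanced_mse(y_pred: torch.Tensor, y_true: torch.Tensor,
                         segment_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical balanced loss: per-segment MSE, averaged uniformly
    over segments rather than over individual observations."""
    per_segment = []
    for seg in torch.unique(segment_ids):
        mask = segment_ids == seg
        per_segment.append(torch.mean((y_pred[mask] - y_true[mask]) ** 2))
    return torch.stack(per_segment).mean()
```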
  4. Many coastal cities face frequent flooding from storm events, made worse by sea level rise and climate change. The groundwater table level in these low-relief coastal cities is an important, but often overlooked, factor in the recurrent flooding these locations face. Infiltration of stormwater and water intrusion due to tidal forcing can cause already shallow groundwater tables to rise quickly toward the land surface. This decreases available storage, which increases runoff, stormwater system loads, and flooding. Groundwater table forecasts, which could help inform the modeling and management of coastal flooding, are generally unavailable. This study explores two machine learning models, Long Short-Term Memory (LSTM) networks and Recurrent Neural Networks (RNNs), to model and forecast groundwater table response to storm events in the flood-prone coastal city of Norfolk, Virginia. To determine the effect of training data type on model accuracy, two types of datasets, (i) the continuous time series and (ii) a dataset of only storm events, created from observed groundwater table, rainfall, and sea level data from 2010–2018, were used to train and test the models. Additionally, a real-time groundwater table forecasting scenario was carried out to compare the models' abilities to predict groundwater table levels given forecast rainfall and sea level as input data. When modeling the groundwater table with observed data, LSTM networks were found to have more predictive skill than RNNs (root mean squared error (RMSE) of 0.09 m versus 0.14 m, respectively). The real-time forecast scenario showed that models trained only on storm event data outperformed models trained on the continuous time series data (RMSE of 0.07 m versus 0.66 m, respectively) and that LSTM models outperformed RNN models. Because models trained on the continuous time series data had much higher RMSE values, they were not suitable for predicting the groundwater table in the real-time scenario with forecast input data. These results demonstrate the first use of LSTM networks to create hourly forecasts of the groundwater table in a coastal city and show they are well suited to operational real-time forecasting. As groundwater table levels rise with sea level, groundwater table forecasts will become an increasingly valuable part of coastal flood modeling and management.
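As a rough illustration of the model class used above (not the study's configuration), a minimal PyTorch LSTM that maps a window of hourly forcings such as rainfall and sea level to the next groundwater table level might look like this; the feature count and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GroundwaterLSTM(nn.Module):
    """Minimal sketch: window of (rainfall, sea level) -> next GWT level."""
    def __init__(self, n_features: int = 2, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); regress from the last hidden state.
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])
```

Swapping nn.LSTM for nn.RNN gives the plain recurrent baseline the study compares against.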
  5. Abstract: Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and comprises the resulting complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood, creating a need for skillful models that also report the confidence of their predictions. One dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth's upper atmosphere. Our inability to forecast it has severe repercussions for satellite drag and for computing the probability of collision between space objects in low Earth orbit (LEO), both central to decision making in space operations. Even with an (assumed) perfect forecast of model drivers, our incomplete knowledge of the system often yields inaccurate thermospheric neutral mass density predictions. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in their predictions. In this work, we propose two techniques for developing nonlinear ML regression models that predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities of models trained on both local and global datasets. The NLPD loss provides similar results for both techniques, but the direct probability distribution prediction method has a much lower computational cost. For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed with both techniques provide test errors on the order of 13%. The CHAMP models, on validation and test data, are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch.
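For reference, the Gaussian NLPD loss named above has a standard closed form, and MC dropout amounts to keeping dropout active at inference and reading uncertainty from the sample spread. A minimal sketch, assuming a model that outputs a mean and log-variance (the study's exact networks are not given here):

```python
import math
import torch

def nlpd(mu: torch.Tensor, log_var: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Negative log predictive density of y under N(mu, exp(log_var))."""
    return torch.mean(0.5 * (math.log(2 * math.pi) + log_var
                             + (y - mu) ** 2 / torch.exp(log_var)))

def mc_dropout_predict(model: torch.nn.Module, x: torch.Tensor, n_samples: int = 50):
    """Keep dropout stochastic at inference; mean/std over samples give the
    prediction and its uncertainty. Generic sketch, not the paper's code."""
    model.train()  # leaves nn.Dropout layers active during the forward passes
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)
```

Direct distribution prediction trains one network to emit (mu, log_var) under the nlpd loss; MC dropout instead recovers the spread from repeated stochastic passes, which is why the abstract reports it as the more computationally expensive of the two.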