Streamflow prediction is vital for effective water resource management, enabling a better understanding of hydrological variability and its response to environmental factors. This study presents a spatio-temporal graph neural network (STGNN) model for streamflow prediction in the Upper Colorado River Basin (UCRB), integrating graph convolutional networks (GCNs) to model spatial connectivity and long short-term memory (LSTM) networks to capture temporal dynamics. Using 30 years of monthly streamflow data from 20 monitoring stations, the STGNN predicted streamflow over a 36-month horizon and was evaluated against traditional models, including random forest regression (RFR), LSTM, gated recurrent units (GRU), and seasonal auto-regressive integrated moving average (SARIMA). The STGNN outperformed these models across multiple metrics, achieving an R² of 0.78, an RMSE of 0.81 mm/month, and a KGE of 0.79 at critical locations like Lees Ferry. A sequential analysis of input–output configurations identified the (36, 36) setup as optimal for balancing historical context and forecasting accuracy. Additionally, the STGNN showed strong generalizability when applied to other locations within the UCRB. These results underscore the importance of integrating spatial dependencies and temporal dynamics in hydrological forecasting, offering a scalable and adaptable framework to improve predictive accuracy and support adaptive water resource management in river basins.
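For readers unfamiliar with the reported scores, the following is a minimal sketch of how RMSE and the Kling-Gupta Efficiency (KGE) are commonly computed. It assumes the widely used three-component KGE form (correlation, variability ratio, bias ratio); the abstract does not say which KGE variant the study used, and the function names `rmse` and `kge` are illustrative only.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated flow."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def kge(obs, sim):
    """Kling-Gupta Efficiency in its common three-component form.

    Assumption: this is the standard formulation, not necessarily
    the exact variant used in the study.
    """
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]        # linear correlation
    alpha = np.std(sim) / np.std(obs)      # variability ratio
    beta = np.mean(sim) / np.mean(obs)     # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```

A KGE of 1 indicates a perfect match; a simulation that doubles every observed value keeps a correlation of 1 but is penalized through the variability and bias terms.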
Enhancing Monthly Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin
Streamflow prediction is crucial for planning future developments and safety measures along river basins, especially in the face of changing climate patterns. In this study, we used monthly streamflow data from the United States Bureau of Reclamation and meteorological data (snow water equivalent, temperature, and precipitation) from weather monitoring stations of the Snow Telemetry Network within the Upper Colorado River Basin to forecast monthly streamflow at Lees Ferry, a location on the Colorado River in the basin. Four machine learning models (Random Forest Regression, Long Short-Term Memory, Gated Recurrent Unit, and Seasonal AutoRegressive Integrated Moving Average) were trained using 30 years of monthly data (1991–2020), split into 80% for training (1991–2014) and 20% for testing (2015–2020). Initially, only historical streamflow data were used for predictions; meteorological factors were then added to assess their impact on streamflow. Subsequently, sequence analysis was conducted to explore various input-output sequence window combinations. We then evaluated the influence of each factor on streamflow by testing all possible combinations to identify the optimal feature combination for prediction. Our results indicate that the Random Forest Regression model consistently outperformed the others, especially after integrating all meteorological factors with historical streamflow data. The best performance was achieved with a 24-month look-back period to predict 12 months of streamflow, yielding a Root Mean Square Error of 2.25 and an R-squared (R²) of 0.80. Finally, to assess model generalizability, we tested the best model at other locations in the basin: Greenwood Springs (Colorado River), Maybell (Yampa River), and Archuleta (San Juan River).
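The input-output windowing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a single univariate series, uses a synthetic placeholder in place of real streamflow, and windows the training and test periods separately. The helper name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(series, n_in=24, n_out=12):
    """Slice a 1-D monthly series into (look-back, horizon) pairs.

    Each sample uses n_in consecutive months as input and the
    following n_out months as the prediction target.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - n_in - n_out + 1
    if n <= 0:
        raise ValueError("series too short for the requested windows")
    X = np.stack([series[i:i + n_in] for i in range(n)])
    y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n)])
    return X, y

# Synthetic stand-in for 30 years of monthly flow (360 values); not real data.
flow = np.arange(360, dtype=float)

# Chronological 80/20 split: first 288 months for training, last 72 for testing.
split = len(flow) * 8 // 10
X_train, y_train = make_windows(flow[:split])
X_test, y_test = make_windows(flow[split:])
```

Splitting chronologically before windowing keeps every test target strictly later than all training data, which avoids leaking future months into training. The resulting `X_train` and `y_train` arrays could then be passed to any regressor that supports multi-output targets.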
- PAR ID:
- 10509313
- Publisher / Repository:
- MDPI
- Date Published:
- Journal Name:
- Hydrology
- Volume:
- 11
- Issue:
- 5
- ISSN:
- 2306-5338
- Page Range / eLocation ID:
- 66
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Streamflow prediction plays a vital role in water resources planning, helping to understand the dramatic change of climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located at the bottom part of the Upper Colorado River Basin (UCRB) of the US. First, we used only historic streamflow data to predict 24 months ahead. Second, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. Later, we analyzed the effects of considering different span window sizes on the quality of predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The predictions of the multivariate time-series models were found to be more accurate, with RMSE less than 0.84 mm per month, R-squared more than 0.8, and MAPE less than 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
-
Streamflow forecasting in snowmelt-dominated basins is essential for water resource planning, flood mitigation, and ecological sustainability. This study presents a comparative evaluation of statistical, machine learning (Random Forest), and deep learning models (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Spatio-Temporal Graph Neural Network (STGNN)) using 30 years of data from 20 monitoring stations across the Upper Colorado River Basin (UCRB). We assess the impact of integrating meteorological variables, particularly the Snow Water Equivalent (SWE), and spatial dependencies on predictive performance. Among all models, the STGNN achieved the highest accuracy, with a Nash–Sutcliffe Efficiency (NSE) of 0.84 and Kling–Gupta Efficiency (KGE) of 0.84 in the multivariate setting at the critical downstream node, Lees Ferry. Compared to the univariate setup, SWE-enhanced predictions reduced Root Mean Square Error (RMSE) by 12.8%. Seasonal and spatial analyses showed the greatest improvements at high-elevation and mid-network stations, where snowmelt dynamics dominate runoff. These findings demonstrate that spatio-temporal learning frameworks, especially STGNNs, provide a scalable and physically consistent approach to streamflow forecasting under variable climatic conditions.
-
Thenkabail, Prasad S. (Ed.) Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning analysis (RFML), and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods, 1991–2010, 1996–2010, and 2001–2010; the results indicated that the model’s accuracy improved with longer training periods, implying that a model trained on a more extended period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance measure (i.e., IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model performs in reproducing streamflow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate, delivering acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments.
The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.
-
Abstract: Snowpack provides the majority of predictive information for water supply forecasts (WSFs) in snow-dominated basins across the western United States. Drought conditions typically accompany decreased snowpack and lowered runoff efficiency, negatively impacting WSFs. Here, we investigate the relationship between snow water equivalent (SWE) and April–July streamflow volume (AMJJ-V) during drought in small headwater catchments, using observations from 31 USGS streamflow gauges and 54 SNOTEL stations. A linear regression approach is used to evaluate forecast skill under different historical climatologies used for model fitting, as well as with different forecast dates. Experiments are constructed in which extreme hydrological drought years, that is, years with AMJJ-V below the 15th percentile, are withheld from model training. Subsets of the remaining years are used for model fitting to understand how the climatology of different training subsets impacts forecasts of extreme drought years. We generally report overprediction in drought years. However, training the forecast model on drier years, that is, below-median years (P15, P57.5], reduces residuals by an average of 10% in drought year forecasts relative to a baseline case, with the highest median skill obtained in mid- to late April for colder regions. We report similar findings using a modified Natural Resources Conservation Service (NRCS) procedure in nine large basins of the Upper Colorado River Basin (UCRB), highlighting the importance of the snowpack–streamflow relationship in streamflow predictability. We propose an “adaptive sampling” approach of dynamically selecting training years based on antecedent SWE conditions, showing error reductions of up to 20% in historical drought years relative to the period of record. These alternate training protocols provide opportunities for addressing the challenges of future drought risk to water supply planning.
Significance Statement: Seasonal water supply forecasts based on the relationship between peak snowpack and water supply exhibit unique errors in drought years due to low snow and streamflow variability, presenting a major challenge for water supply prediction. Here, we assess the reliability of snow-based streamflow predictability in drought years using a fixed forecast date or fixed model training period. We critically evaluate different training protocols, assessing their predictive performance and identifying sources of error during historical drought years. We also propose and test an “adaptive sampling” application that dynamically selects training years based on antecedent SWE conditions, providing a way to overcome persistent errors and offering new insights and strategies for snow-guided forecasts.
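One way to picture the “adaptive sampling” idea is the sketch below. The abstract does not state how training years are chosen, so the rule used here (take the k historical years whose SWE is closest to the current SWE, then fit a linear regression on those years only) is an assumption made for illustration. The function name `adaptive_forecast` and the parameter `k` are hypothetical.

```python
import numpy as np

def adaptive_forecast(swe_hist, vol_hist, swe_now, k=10):
    """Forecast seasonal volume from SWE using only similar years.

    Assumed selection rule (not taken from the paper): keep the k
    historical years with SWE nearest to swe_now, then fit an
    ordinary least-squares line on that subset.
    """
    swe_hist = np.asarray(swe_hist, dtype=float)
    vol_hist = np.asarray(vol_hist, dtype=float)
    k = min(k, len(swe_hist))
    if k < 2:
        raise ValueError("need at least two training years")
    # Indices of the k years most similar to current conditions.
    idx = np.argsort(np.abs(swe_hist - swe_now))[:k]
    slope, intercept = np.polyfit(swe_hist[idx], vol_hist[idx], 1)
    return float(slope * swe_now + intercept)
```

Under this rule, a low-snow year is forecast from a regression fitted mostly on other low-snow years, which is the intuition behind training on drier subsets.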

