This content will become publicly available on May 1, 2025

Title: Enhancing Monthly Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin

Streamflow prediction is crucial for planning future development and safety measures along river basins, especially under a changing climate. In this study, we used monthly streamflow data from the United States Bureau of Reclamation and meteorological data (snow water equivalent, temperature, and precipitation) from Snow Telemetry Network weather monitoring stations within the Upper Colorado River Basin to forecast monthly streamflow at Lees Ferry, a site on the Colorado River in the basin. Four machine learning models (Random Forest Regression, Long Short-Term Memory, Gated Recurrent Unit, and Seasonal AutoRegressive Integrated Moving Average) were trained using 30 years of monthly data (1991–2020), split chronologically into 80% for training (1991–2014) and 20% for testing (2015–2020). Initially, only historical streamflow data were used for predictions; meteorological factors were then added to assess their impact on streamflow. Subsequently, sequence analysis was conducted to explore various input-output sequence window combinations. We then evaluated the influence of each factor on streamflow by testing all possible combinations to identify the optimal feature combination for prediction. Our results indicate that the Random Forest Regression model consistently outperformed the others, especially after integrating all meteorological factors with historical streamflow data. The best performance was achieved with a 24-month look-back period to predict 12 months of streamflow, yielding a Root Mean Square Error of 2.25 and an R-squared (R²) of 0.80. Finally, to assess model generalizability, we tested the best model at three other locations in the basin: Greenwood Springs (Colorado River), Maybell (Yampa River), and Archuleta (San Juan River).
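The data preparation described in the abstract (a chronological 80/20 split, a 24-month look-back window predicting the next 12 months, an exhaustive search over feature combinations, and RMSE and R² scoring) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the synthetic sine-wave series, and the feature labels are all assumptions made for the example, and no model is fitted here.

```python
import math
from itertools import combinations


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling (earliest part trains)."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def make_windows(series, look_back=24, horizon=12):
    """Pair each run of `look_back` months with the `horizon` months after it."""
    pairs = []
    last_start = len(series) - look_back - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + look_back]
        targets = series[start + look_back:start + look_back + horizon]
        pairs.append((inputs, targets))
    return pairs


def feature_subsets(features):
    """List every non-empty combination of candidate meteorological features."""
    subsets = []
    for size in range(1, len(features) + 1):
        subsets.extend(combinations(features, size))
    return subsets


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for 30 years (360 months) of monthly streamflow;
# the real study uses Bureau of Reclamation records instead.
monthly_flow = [10.0 + 5.0 * math.sin(2 * math.pi * m / 12) for m in range(360)]

train, test = chronological_split(monthly_flow)
train_pairs = make_windows(train)
test_pairs = make_windows(test)
```

Each `(inputs, targets)` pair could then be handed to any multi-output regressor, and the loop over `feature_subsets(...)` would repeat training once per combination. Windows are built separately for the training and testing portions so that no test month appears in a training window.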

 
Award ID(s): 2305781, 2153379, 2204363, 2240022
PAR ID: 10509313
Publisher / Repository: MDPI
Journal Name: Hydrology
Volume: 11
Issue: 5
ISSN: 2306-5338
Page Range / eLocation ID: 66
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Streamflow prediction plays a vital role in water resources planning in order to understand the dramatic changes in climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located in the lower part of the Upper Colorado River Basin (UCRB) of the US. First, we used only historic streamflow data to predict 24 months ahead. Second, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. Later, we analyzed the effects of different span window sizes on the quality of predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The predictions of the multivariate time-series models were found to be more accurate, with RMSE less than 0.84 mm per month, R-squared more than 0.8, and MAPE less than 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
  2. Thenkabail, Prasad S. (Ed.)

    Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess its performance for three training periods: 1991–2010, 1996–2010, and 2001–2010. The results indicated that the model’s accuracy improved with longer training periods, implying that a model trained on a more extended period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance measure (i.e., IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model performs in reproducing streamflow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.

     
  3. Abstract

    In the Colorado River Basin (CRB), ensemble streamflow prediction (ESP) forecasts drive operational planning models that project future reservoir system conditions. CRB operational seasonal streamflow forecasts are produced using ESP, which represents climate using an ensemble of meteorological sequences of historical temperature and precipitation, but do not typically leverage additional real‐time subseasonal‐to‐seasonal climate forecasts. Any improvements to streamflow forecasts would help stakeholders who depend on operational projections for decision making. We explore incorporating climate forecasts into ESP through variations on an ESP trace weighting approach, focusing on Colorado River unregulated inflow forecasts to Lake Powell. The k‐nearest neighbors (kNN) technique is employed using North American Multi‐Model Ensemble one‐ and three‐month temperature and precipitation forecasts, and preceding three‐month historical streamflow, as weighting factors. The benefit of disaggregated climate forecast information is assessed through the comparison of two kNN weighting strategies: a basin‐wide kNN uses the same ESP weights over the entire basin, and a disaggregated‐basin kNN applies ESP weights separately to four subbasins. We find in general that climate‐informed forecasts add greater marginal skill in late winter and early spring, and that more spatially granular disaggregated‐basin use of climate forecasts slightly improves skill over the basin‐wide method at most lead times.

     
  4. Regional climate models (RCMs) produced at ~0.5° (available in the NA-CORDEX database esgf-node.ipsl.upmc.fr/search/cordex-ipsl/) address issues related to the coarse resolution of global climate models (GCMs, produced at 2° to 4°). Nevertheless, due to systematic and random model errors, bias correction is needed for regional study applications. However, an acceptable threshold for the magnitude of bias correction that will not affect future RCM projection behavior is unknown. The goal of this study is to evaluate the implications of a bias correction technique (distribution mapping) for four GCM-RCM combinations for simulating regional precipitation and, subsequently, streamflow, surface runoff, and water yield when integrated into Soil and Water Assessment Tool (SWAT) applications for the Des Moines River basin (31,893 km²) in Iowa-Minnesota, U.S. The climate projections tested in this study are an ensemble of 2 GCMs (MPI-ESM-MR and GFDL-ESM2M) and 2 RCMs (WRF and RegCM4) for historical (1981-2005) and future (2030-2050) projections in the NA-CORDEX CMIP5 archive. The PRISM dataset was used for bias correction of GCM-RCM historical precipitation and for SWAT baseline simulations. We found that bias correction improves historical total annual volumes for precipitation, seasonality, spatial distribution, and mean error for all GCM-RCM combinations. However, the correlation coefficient improved only for the RegCM4 simulations. Monthly precipitation was overestimated by all raw models from January to April, and WRF overestimated monthly precipitation from January to August. The bias correction method improved monthly average precipitation for all four GCM-RCM combinations. The ability to detect the occurrence of precipitation events was slightly better for the raw models, especially for the GCM-WRF combinations. Simulated historical streamflow was compared across 26 monitoring stations: historical GCM-RCM outputs were unable to replicate PRISM KGE statistical results (KGE>0.5). However, the Pbias streamflow results matched the PRISM simulation for all bias-corrected models and for the raw GFDL-RegCM4 combination. For future scenarios there was no change in the annual trend, except for raw WRF models, which estimated an increase of about 35% in annual precipitation. Seasonal variability remained the same, indicating wetter summers and drier winters. However, most models predicted an increase in monthly precipitation from January to March and a reduction in June and July (except for raw WRF models). The impact on hydrological simulations based on future projected conditions was observed for surface runoff and water yield. Both variables were characterized by monthly volume overestimation; the raw WRF models predicted up to three times greater volume compared to the historical run. RegCM4 projected a twofold increase in surface runoff and water yield for winter and spring, and a slight volume reduction in summer and autumn. Meanwhile, the bias-corrected models showed changes in prediction signals: in some cases, raw models projected an increase in surface runoff and water yield but the bias-corrected models projected a reduction of these variables. These findings underscore the need for more extended research on bias correction and transposition between historical and future data.
  5. Abstract

    Snowpack provides the majority of predictive information for water supply forecasts (WSFs) in snow-dominated basins across the western United States. Drought conditions typically accompany decreased snowpack and lowered runoff efficiency, negatively impacting WSFs. Here, we investigate the relationship between snow water equivalent (SWE) and April–July streamflow volume (AMJJ-V) during drought in small headwater catchments, using observations from 31 USGS streamflow gauges and 54 SNOTEL stations. A linear regression approach is used to evaluate forecast skill under different historical climatologies used for model fitting, as well as with different forecast dates. Experiments are constructed in which extreme hydrological drought years, that is, years with AMJJ-V below the 15th percentile, are withheld from model training. Subsets of the remaining years are used for model fitting to understand how the climatology of different training subsets impacts forecasts of extreme drought years. We generally report overprediction in drought years. However, training the forecast model on drier years, that is, below-median years (P15,P57.5], reduces residuals by an average of 10% in drought year forecasts, relative to a baseline case, with the highest median skill obtained in mid- to late April for colder regions. We report similar findings using a modified Natural Resources Conservation Service (NRCS) procedure in nine large basins of the Upper Colorado River basin (UCRB), highlighting the importance of the snowpack–streamflow relationship in streamflow predictability. We propose an “adaptive sampling” approach of dynamically selecting training years based on antecedent SWE conditions, showing error reductions of up to 20% in historical drought years relative to the period of record. These alternate training protocols provide opportunities for addressing the challenges of future drought risk to water supply planning.

    Significance Statement

    Seasonal water supply forecasts based on the relationship between peak snowpack and water supply exhibit unique errors in drought years due to low snow and streamflow variability, presenting a major challenge for water supply prediction. Here, we assess the reliability of snow-based streamflow predictability in drought years using a fixed forecast date or fixed model training period. We critically evaluate different training protocols, assessing their predictive performance and identifying sources of error during historical drought years. We also propose and test an “adaptive sampling” application that dynamically selects training years based on antecedent SWE conditions to overcome persistent errors and provide new insights and strategies for snow-guided forecasts.

     
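The “adaptive sampling” idea in the last item above (fit a linear SWE-to-streamflow regression on training years chosen according to antecedent SWE) might look something like the sketch below. The abstract does not say how the years are selected, so the rule used here, taking the `k` historical years whose SWE is closest to the current year's SWE, is purely an assumption for illustration, as are the function names and the synthetic data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adaptive_forecast(swe_history, volume_history, current_swe, k=10):
    """Forecast seasonal volume from the k years most similar in SWE.

    Assumed selection rule: rank historical years by |SWE - current_swe|
    and train only on the k closest ones.
    """
    ranked = sorted(
        range(len(swe_history)),
        key=lambda i: abs(swe_history[i] - current_swe),
    )
    chosen = ranked[:k]
    slope, intercept = fit_line(
        [swe_history[i] for i in chosen],
        [volume_history[i] for i in chosen],
    )
    return slope * current_swe + intercept


# Synthetic, perfectly linear record (volume = 2 * SWE + 1) for a sanity check.
swe_record = [float(year) for year in range(1, 21)]
volume_record = [2.0 * swe + 1.0 for swe in swe_record]
low_snow_forecast = adaptive_forecast(swe_record, volume_record, 5.0, k=6)
```

Restricting the fit to years that resemble the current one is one simple way to keep wet-year behavior from dominating a dry-year forecast; other selection rules, such as percentile bands of SWE, would fit the same description.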