Title: ML-Based Streamflow Prediction in the Upper Colorado River Basin Using Climate Variables Time Series Data
Streamflow prediction plays a vital role in water resources planning and in understanding the dramatic changes in climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal AutoRegressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located at the lower end of the Upper Colorado River Basin (UCRB) in the US. First, we used only historical streamflow data to predict 24 months ahead. Second, we added meteorological variables, namely temperature and precipitation, as features. We tested the models on a monthly test dataset spanning 6 years, repeating the 24-month predictions 50 times to ensure the consistency of the results. We then performed a sensitivity analysis to identify the best-performing model and analyzed how different span window sizes affect the quality of its predictions. Finally, we applied the best-performing model, RFR, to two more rivers in different states in the UCRB to test its generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The multivariate time-series models produced more accurate predictions, with RMSE below 0.84 mm per month, R-squared above 0.8, and MAPE below 0.25. We therefore conclude that including the temperature and precipitation of the UCRB as features increases prediction accuracy. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
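The abstract reports RMSE, R-squared, and MAPE but does not define them. The sketch below uses the standard textbook definitions, assuming that R-squared is the coefficient of determination (1 − SS_res/SS_tot) and that MAPE is expressed as a fraction rather than a percentage, which is consistent with the reported threshold of 0.25; the paper's exact conventions may differ. The sample values are hypothetical, not data from the study.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    # Guard against zip() silently truncating mismatched series.
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g. mm per month)."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(obs, pred)
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.25 means 25 %).

    Undefined if any observed value is zero.
    """
    _check(obs, pred)
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical monthly values (mm per month), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(rmse(observed, predicted), r_squared(observed, predicted), mape(observed, predicted))
```

Every prediction here is off by 0.5, so RMSE is 0.5, R-squared is 1 − 1/20 = 0.95, and MAPE is about 0.130. MAPE weights errors in low-flow months more heavily, because the same absolute error is divided by a smaller observed value.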
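The abstract mentions using historical streamflow alone, then adding temperature and precipitation, and varying the span window size, but does not describe how the monthly series are arranged for the models. A minimal sketch of one common arrangement follows, assuming a direct multi-output setup: each input row is the flattened last `window` months of every feature, and each target is the next `horizon` months of streamflow. `make_windows` and the toy series are hypothetical, not the authors' code or data.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
    horizon: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (X, Y) pairs for direct multi-step forecasting from monthly series.

    features: one row per month, e.g. [streamflow, temperature, precipitation].
    target:   monthly streamflow aligned with `features`.
    window:   number of past months used as input (the look-back span).
    horizon:  number of future months predicted at once (24 in the abstract).
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    X: List[List[float]] = []
    Y: List[List[float]] = []
    # Each sample needs `window` input months followed by `horizon` target months.
    for start in range(len(target) - window - horizon + 1):
        past = features[start : start + window]
        # Flatten window x n_features into a single row for a tabular model such as RFR.
        X.append([value for month in past for value in month])
        Y.append(list(target[start + window : start + window + horizon]))
    return X, Y


# Toy series: 6 months of [streamflow, temperature]; the numbers are made up.
flow = [float(m) for m in range(6)]
feats = [[flow[m], 10.0 * m] for m in range(6)]
X, Y = make_windows(feats, flow, window=3, horizon=2)
print(X[0], Y[0])
```

For the univariate experiment, `features` would hold only streamflow (`[[q] for q in flow]`); for the multivariate one, each row would also carry temperature and precipitation. Because scikit-learn's `RandomForestRegressor` accepts a two-dimensional target, `X` and `Y` could be passed to its `fit` method directly, and rerunning with different `window` values is one way to study window size. Whether the authors used this direct multi-output strategy or a recursive one is not stated in the abstract.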
Award ID(s): 2305781, 2153379, 2204363, 2240022
PAR ID: 10404770
Author(s) / Creator(s):
Date Published:
Journal Name: Hydrology
Volume: 10
Issue: 2
ISSN: 2306-5338
Page Range / eLocation ID: 29
Sponsoring Org: National Science Foundation
More Like this
  1. Streamflow prediction is crucial for planning future developments and safety measures along river basins, especially in the face of changing climate patterns. In this study, we utilized monthly streamflow data from the United States Bureau of Reclamation and meteorological data (snow water equivalent, temperature, and precipitation) from various weather monitoring stations of the Snow Telemetry Network within the Upper Colorado River Basin to forecast monthly streamflow at Lees Ferry, a specific location along the Colorado River in the basin. Four machine learning models (Random Forest Regression, Long Short-Term Memory, Gated Recurrent Unit, and Seasonal AutoRegressive Integrated Moving Average) were trained using 30 years of monthly data (1991–2020), split into 80% for training (1991–2014) and 20% for testing (2015–2020). Initially, only historical streamflow data were used for predictions; meteorological factors were then included to assess their impact on streamflow. Subsequently, sequence analysis was conducted to explore various input-output sequence window combinations. We then evaluated the influence of each factor on streamflow by testing all possible feature combinations to identify the optimal one for prediction. Our results indicate that the Random Forest Regression model consistently outperformed the other models, especially after all meteorological factors were integrated with historical streamflow data. The best performance was achieved with a 24-month look-back period to predict 12 months of streamflow, yielding a Root Mean Square Error of 2.25 and an R-squared (R²) of 0.80. Finally, to assess model generalizability, we tested the best model at other locations in the basin: Greenwood Springs (Colorado River), Maybell (Yampa River), and Archuleta (San Juan River).
  2. Can we predict the words a child is going to learn next given information about the words that a child knows now? Do different representations of a child’s vocabulary knowledge affect our ability to predict the acquisition of lexical items for individual children? Past research has often focused on population statistics of vocabulary growth rather than on predicting the words an individual child is likely to learn next. We consider a neural network approach to predicting vocabulary acquisition. Specifically, we investigate how best to represent the child’s current vocabulary in order to accurately predict future learning. The models we consider draw on qualitatively different sources of information: descriptive information about the child, the specific words a child knows, and representations that aim to capture the child’s aggregate lexical knowledge. Using longitudinal vocabulary data from children aged 15–36 months, we construct neural network models to predict which words a particular child is likely to learn in the coming month. Many models based on child-specific vocabulary information outperform models that use only descriptive information about the child, suggesting that the words a child knows influence prediction of future language learning. These models provide insight into the role of current vocabulary knowledge in future lexical growth.
  3. Precipitation occurs in two basic forms: liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is vital in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir that stores water as snow in winter and releases it in spring and summer. The warmer climate of recent decades has led to earlier snowmelt, a decline in snowpack, and changes in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because it simultaneously predicts the streamflow generated by rainfall and by snowmelt. The choice of parameters, reference data, and calibration strategy can significantly affect the SWAT calibration outcome and, in turn, prediction accuracy. In this study, SWAT models are implemented in four upland watersheds of the Tulare Lake Basin (TLB) in the southern Sierra Nevada. Three calibration scenarios with different calibration parameters and reference datasets are applied to investigate the impact of Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the accuracy of streamflow and snow water equivalent (SWE) predictions. The equifinality induced by watershed parameters and lapse rate parameters is also evaluated. The results indicate that calibrating the SWAT model against both streamflow and SWE reference data could generally improve the reliability of SWE predictions. By comparison, streamflow predictions are not significantly affected by the different lumped calibration schemes. The default snow parameter values capture extreme high flows better than the other two calibration scenarios, whereas the three calibration schemes show no remarkable difference in capturing extreme low flows.
The equifinality induced by watershed and lapse rate parameters affects the flow prediction more (Nash–Sutcliffe Efficiency (NSE) varies by 0.2–0.3) than the SWE prediction (NSE varies by less than 0.1). This study identifies remote-sensing-based SWE reconstruction products as a promising alternative for model calibration in ungauged snow-influenced watersheds. A model calibrated against both streamflow and reconstructed SWE (bi-objective calibration) could improve the reliability of predictions of surface water supply change for the downstream agricultural region under a changing climate.
  4. Seasonal climate forecasts have socioeconomic value, and the quality of the forecasts is important to various societal applications. Here we evaluate seasonal forecasts of three climate variables, namely vapor pressure deficit (VPD), temperature, and precipitation, from operational dynamical models over the major cropland areas of South America; analyze their predictability from global and local circulation patterns, such as the El Niño–Southern Oscillation (ENSO); and attribute the sources of prediction errors. We show that the European Centre for Medium-Range Weather Forecasts (ECMWF) model has the highest quality among the models evaluated. Forecasts of VPD and temperature agree better with observations (average Pearson correlations of 0.65 and 0.70, respectively, across all months for 1-month-lead predictions from the ECMWF) than those of precipitation (0.40). Forecasts degrade with increasing lead time for two reasons: 1) failure to capture local circulation patterns and the linkages between those patterns and local climate; and 2) overestimation of ENSO’s influence on regions not affected by ENSO. For regions affected by ENSO, the three climate variables and their extremes are well predicted up to 6 months ahead, providing valuable lead time for risk preparedness and management. The results provide useful information for further development of dynamical models and for those who use seasonal climate forecasts for planning and management.
Significance Statement: Seasonal climate forecasts have socioeconomic value, and the quality of the forecasts is important to their applications. This study evaluated the quality of monthly forecasts of three climate variables that are critical to agricultural management, risk assessment, and natural hazard warning. The findings provide useful information for those who use seasonal climate forecasts for planning and management. This study also analyzed the predictability of the climate variables and the attribution of prediction errors, and thus provides insights for understanding the varying performance of models and for future improvement of seasonal climate forecasts from dynamical models.
  5. Streamflow forecasting at a subseasonal time scale (10–30 days into the future) is important for various human activities. Ensemble streamflow prediction (ESP) is a widely applied technique for subseasonal streamflow forecasting. However, ESP’s reliance on randomly resampled historical precipitation limits its predictive capability. Available dynamical subseasonal precipitation forecasts provide an alternative to the randomly resampled precipitation in ESP. Prior studies found that the predictive performance of raw subseasonal precipitation forecasts is limited in many regions, such as the south-central United States, which raises questions about their effectiveness in assisting streamflow forecasting. To further assess the hydrologic applicability of dynamical subseasonal precipitation forecasts, we test the subseasonal precipitation forecasts from the North American Multi-Model Ensemble Phase II (NMME-2) in four watersheds in the south-central United States. The subseasonal precipitation forecasts are postprocessed with bias correction and spatial disaggregation (BCSD) to correct bias and improve spatial resolution before they replace the randomly resampled precipitation in ESP for streamflow prediction. The resulting streamflow predictions are benchmarked against ESP. Evaluation is conducted using the Kling–Gupta Efficiency (KGE), continuous ranked probability score (CRPS), probability of detection (POD), false alarm ratio (FAR), and reliability diagrams. Our results suggest that BCSD-corrected subseasonal precipitation forecasts lead to overall improved streamflow predictions owing to added skill in winter and spring. They also lead to improved predictions of the occurrence of high streamflow values above the 75th percentile.
Overall, BCSD-corrected subseasonal precipitation forecasts have shown promising performance, highlighting their potential for broader applications in river and flood forecasting.
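The related abstracts numbered 3, 4, and 5 rely on the Nash–Sutcliffe Efficiency (NSE), the Pearson correlation, and the Kling–Gupta Efficiency (KGE). A minimal sketch of these scores follows, assuming the common 2009 form of KGE, which combines the correlation r, the variability ratio alpha = σ_sim/σ_obs, and the bias ratio beta = μ_sim/μ_obs; the individual studies may use variants such as the 2012 modified KGE. The function names and sample series are hypothetical.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _pop_std(xs: Sequence[float]) -> float:
    # Population standard deviation; only its ratio is used in KGE.
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (2009 form): 1 is perfect."""
    r = pearson_r(obs, sim)
    alpha = _pop_std(sim) / _pop_std(obs)  # variability ratio
    beta = _mean(sim) / _mean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical flows: the simulation is the observation shifted up by 1.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(nse(obs, sim), kge(obs, sim))
```

With a constant offset, correlation and variability are perfect, so KGE (about 0.6) reflects only the 40 % mean bias, while NSE drops to about 0.2 because it compares squared errors with the variance of the observations. NSE computed this way is numerically identical to R-squared defined as 1 − SS_res/SS_tot.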
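The fifth related abstract also evaluates forecasts with the probability of detection (POD), the false alarm ratio (FAR), and the continuous ranked probability score (CRPS). The sketch below assumes a binary event defined as flow above a fixed threshold (for example, the 75th percentile of observed flows) and a single-valued forecast for POD and FAR, such as an ensemble median; for CRPS it uses the standard empirical ensemble estimator, mean |x_i − y| − ½ mean |x_i − x_j|. The function names, threshold, and values are hypothetical.

```python
from typing import Sequence, Tuple


def pod_far(
    obs: Sequence[float], fcst: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Probability of detection and false alarm ratio for the event `value > threshold`."""
    hits = misses = false_alarms = 0
    for o, f in zip(obs, fcst):
        observed_event, forecast_event = o > threshold, f > threshold
        if observed_event and forecast_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    # NaN when a score is undefined (no observed events, or no forecast events).
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


def crps_ensemble(members: Sequence[float], y: float) -> float:
    """Empirical CRPS of an ensemble forecast for one observation (lower is better)."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - spread_within


# Hypothetical flows and forecasts; the threshold stands in for a 75th-percentile value.
obs = [10.0, 50.0, 80.0, 90.0, 20.0]
fcst = [60.0, 40.0, 85.0, 95.0, 10.0]
print(pod_far(obs, fcst, threshold=45.0))  # two hits, one miss, one false alarm
print(crps_ensemble([1.0, 2.0, 3.0], 2.0))
```

With a single-member ensemble, the CRPS reduces to the absolute error, which is one reason it is used to compare probabilistic and deterministic streamflow forecasts on the same scale.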