Title: ML-Based Streamflow Prediction in the Upper Colorado River Basin Using Climate Variables Time Series Data
Streamflow prediction plays a vital role in water resources planning, helping to understand dramatic changes in climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located at the bottom part of the Upper Colorado River Basin (UCRB) of the US. First, we used only historic streamflow data to predict 24 months ahead. Second, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. We then analyzed how different span window sizes affect the quality of the predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The multivariate time-series models produced more accurate predictions, with RMSE below 0.84 mm per month, R-squared above 0.8, and MAPE below 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB as features increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
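The three evaluation measures named in the abstract (RMSE, R-squared, MAPE) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: the function names and the sample values are hypothetical, R-squared is taken here as the coefficient of determination, and MAPE is expressed as a fraction rather than a percentage.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error, in the same units as the inputs."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction (0.25 means 25%).

    Undefined when any observed value is zero.
    """
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return sum(ratios) / len(observed)


# Hypothetical monthly streamflow values (mm per month), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(mape(observed, predicted))
```

In a repeated-forecast setting such as the one described above, these functions would be applied to each 24-month prediction window and the scores averaged across repetitions; that aggregation step is an assumption about the workflow, not something the abstract specifies.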
Award ID(s):
2305781 2153379 2204363
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Seasonal climate forecasts have socioeconomic value, and the quality of the forecasts is important to various societal applications. Here we evaluate seasonal forecasts of three climate variables, vapor pressure deficit (VPD), temperature, and precipitation, from operational dynamical models over the major cropland areas of South America; analyze their predictability from global and local circulation patterns, such as El Niño–Southern Oscillation (ENSO); and attribute the source of prediction errors. We show that the European Centre for Medium-Range Weather Forecasts (ECMWF) model has the highest quality among the models evaluated. Forecasts of VPD and temperature have better agreement with observations (average Pearson correlation of 0.65 and 0.70, respectively, among all months for 1-month-lead predictions from the ECMWF) than those of precipitation (0.40). Forecasts degrade with increasing lead times, and the degradation is due to the following reasons: 1) the failure of capturing local circulation patterns and capturing the linkages between the patterns and local climate; and 2) the overestimation of ENSO’s influence on regions not affected by ENSO. For regions affected by ENSO, forecasts of the three climate variables as well as their extremes are well predicted up to 6 months ahead, providing valuable lead time for risk preparedness and management. The results provide useful information for further development of dynamical models and for those who use seasonal climate forecasts for planning and management. Significance Statement Seasonal climate forecasts have socioeconomic value, and the quality of the forecasts is important to their applications. This study evaluated the quality of monthly forecasts of three important climate variables that are critical to agricultural management, risk assessment, and natural hazards warning. The findings provide useful information for those who use seasonal climate forecasts for planning and management. 
This study also analyzed the predictability of the climate variables and the attribution of prediction errors and thus provides insights for understanding models’ varying performance and for future improvement of seasonal climate forecasts from dynamical models. 
  2. Abstract
    Precipitation occurs in two basic forms, liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir, which stores snow water in winter and releases it in spring and summer. The warmer climate in recent decades has led to earlier snowmelt, a decline in snowpack, and a change in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because of its ability to simultaneously predict the streamflow generated from rainfall and from the melting of snow. The choice of parameters, reference data, and calibration strategy could significantly affect the SWAT model calibration outcome and further affect the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB) located across the Southern Sierra Nevada Mountains. Three calibration scenarios considering different calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the streamflow and snow water-equivalent (SWE) prediction accuracy. In addition, the equifinality induced by the watershed parameters and lapse rate parameters is evaluated. The results indicate that calibration of the SWAT model with respect to both streamflow and SWE reference data could improve the model SWE prediction reliability in general. By comparison, the streamflow predictions are not significantly affected by the differently lumped calibration schemes. The default snow parameter values capture the extreme high flows better than the other two calibration scenarios, whereas there is no remarkable difference among the three calibration schemes for capturing the extreme low flows. 
The watershed and lapse rate parameters-induced equifinality affects the flow prediction more (Nash-Sutcliffe Efficiency (NSE) varies between 0.2–0.3) than the SWE prediction (NSE varies less than 0.1). This study points out the remote-sensing-based SWE reconstruction product as a promising alternative choice for model calibration in ungauged snow-influenced watersheds. The streamflow-reconstructed SWE bi-objective calibrated model could improve the prediction reliability of surface water supply change for the downstream agricultural region under the changing climate. 
  3. Abstract

    Extreme weather, including heat waves, droughts, and high rainfall, is becoming more common and affecting a diversity of species and taxa. However, researchers lack a framework that can anticipate how diverse species will respond to weather extremes spanning weeks to months. Here we used high‐resolution occurrence data from eBird, a global citizen science initiative, and dynamic species distribution models to examine how 109 North American bird species ranging in migration distance, diet, body size, habitat preference, and prevalence (commonness) respond to extreme heat, drought, and rainfall across a wide range of temporal scales. Across species, temperature influenced species’ distributions more than precipitation at weekly and monthly scales, while precipitation was more important at seasonal scales. Phylogenetically controlled multivariate models revealed that migration distance was the most important factor mediating responses to extremely hot or dry weeks; residents and short‐distance migrants occurred less often following extreme heat. At monthly or seasonal scales, less common birds experienced decreases in occurrence following drought‐like conditions, while widespread species were unaffected. Spatial predictions demonstrated variation in responses to extreme weather across species’ ranges, with predicted decreases in occurrence up to 40% in parts of ranges. Our results highlight that extreme weather has variable and potentially strong implications for birds at different time scales, but these responses are mediated by life‐history characteristics. As weather once considered extreme occurs more frequently, researchers and managers require a better understanding of how diverse species respond to extreme conditions.

  4. Abstract

    Forecasting the El Niño-Southern Oscillation (ENSO) has been a subject of vigorous research due to the important role of the phenomenon in climate dynamics and its worldwide socioeconomic impacts. Over the past decades, numerous models for ENSO prediction have been developed, among which statistical models approximating ENSO evolution by linear dynamics have received significant attention owing to their simplicity and comparable forecast skill to first-principles models at short lead times. Yet, due to highly nonlinear and chaotic dynamics (particularly during ENSO initiation), such models have limited skill for longer-term forecasts beyond half a year. To resolve this limitation, here we employ a new nonparametric statistical approach based on analog forecasting, called kernel analog forecasting (KAF), which avoids assumptions on the underlying dynamics through the use of nonlinear kernel methods for machine learning and dimension reduction of high-dimensional datasets. Through a rigorous connection with Koopman operator theory for dynamical systems, KAF yields statistically optimal predictions of future ENSO states as conditional expectations, given noisy and potentially incomplete data at forecast initialization. Here, using industrial-era Indo-Pacific sea surface temperature (SST) as training data, the method is shown to successfully predict the Niño 3.4 index in a 1998–2017 verification period out to a 10-month lead, which corresponds to an increase of 3–8 months (depending on the decade) over a benchmark linear inverse model (LIM), while significantly improving upon the ENSO predictability “spring barrier”. In particular, KAF successfully predicts the historic 2015/16 El Niño at initialization times as early as June 2015, which is comparable to the skill of current dynamical models. 
An analysis of a 1300-yr control integration of a comprehensive climate model (CCSM4) further demonstrates that the enhanced predictability afforded by KAF holds over potentially much longer leads, extending to 24 months versus 18 months in the benchmark LIM. Probabilistic forecasts for the occurrence of El Niño/La Niña events are also performed and assessed via information-theoretic metrics, showing an improvement of skill over LIM approaches, thus opening an avenue for environmental risk assessment relevant in a variety of contexts.

  5. Abstract. We present a simple method that allows snow depth measurements to be converted to snow water equivalent (SWE) estimates. These estimates are useful to individuals interested in water resources, ecological function, and avalanche forecasting. They can also be assimilated into models to help improve predictions of total water volumes over large regions. The conversion of depth to SWE is particularly valuable since snow depth measurements are far more numerous than costlier and more complex SWE measurements. Our model regresses SWE against snow depth (h), day of water year (DOY) and climatological (30-year normal) values for winter (December, January, February) precipitation (PPTWT), and the difference (TD) between mean temperature of the warmest month and mean temperature of the coldest month, producing a power-law relationship. Relying on climatological normals rather than weather data for a given year allows our model to be applied at measurement sites lacking a weather station. Separate equations are obtained for the accumulation and the ablation phases of the snowpack. The model is validated against a large database of snow pillow measurements and yields a bias in SWE of less than 2 mm and a root-mean-squared error (RMSE) in SWE of less than 60 mm. The model is additionally validated against two completely independent sets of data: one from western North America and one from the northeastern United States. Finally, the results are compared with three other models for bulk density that have varying degrees of complexity and that were built in multiple geographic regions. The results show that the model described in this paper has the best performance for the validation datasets. 