Title: Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds
Abstract. In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. The popularity of ML models for such applications stems from, among other qualities, their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of the center of annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds than in rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.
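As an illustration of the four evaluation criteria named in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes the coefficient of determination is taken as the squared Pearson correlation and that KGE follows the common formulation based on correlation, variability ratio, and bias ratio; the abstract does not state either choice. The function names `evaluation_scores` and `naive_forecast` are hypothetical, and the persistence rule used for the naïve benchmark (tomorrow's flow equals today's flow) is likewise an assumption.

```python
import numpy as np


def evaluation_scores(obs, sim):
    """Return R2, RMSE, MAE, and KGE for observed and simulated flows."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs

    rmse = float(np.sqrt(np.mean(err ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(err)))         # mean absolute error

    # Pearson correlation between observations and simulations.
    r = float(np.corrcoef(obs, sim)[0, 1])
    r2 = r ** 2  # assumption: R2 defined as squared correlation

    # KGE components: variability ratio and bias ratio.
    alpha = float(np.std(sim) / np.std(obs))
    beta = float(np.mean(sim) / np.mean(obs))
    kge = 1.0 - float(
        np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    )

    return {"r2": r2, "rmse": rmse, "mae": mae, "kge": kge}


def naive_forecast(flow):
    """Assumed persistence benchmark: the 1 d ahead forecast is today's flow.

    Returns (forecast, observed) arrays aligned on the forecast day.
    """
    flow = np.asarray(flow, dtype=float)
    return flow[:-1], flow[1:]
```

A forecast that matches the observations exactly gives RMSE and MAE of 0 and a KGE of 1; scoring the output of `naive_forecast` with `evaluation_scores` gives a baseline against which a model's scores can be compared.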
Award ID(s):
2006633 1827093
NSF-PAR ID:
10276435
Journal Name:
Hydrology and Earth System Sciences
Volume:
25
Issue:
6
ISSN:
1607-7938
Page Range / eLocation ID:
2997 to 3015
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Thenkabail, Prasad S. (Ed.)

    Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods: 1991–2010, 1996–2010, and 2001–2010. The results indicated that the model’s accuracy improved with longer training periods, implying that a model trained on a longer period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance measure (i.e., IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model reproduces streamflow data for the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.

     
  2.
    Precipitation occurs in two basic forms: liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir, which stores snow water in winter and releases it in spring and summer. The warmer climate in recent decades has led to earlier snowmelt, a decline in snowpack, and changes in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because of its ability to simultaneously predict the streamflow generated from rainfall and from the melting of snow. The choice of parameters, reference data, and calibration strategy can significantly affect the SWAT model calibration outcome and, in turn, the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB) located across the Southern Sierra Nevada Mountains. Three calibration scenarios considering different calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the streamflow and snow water-equivalent (SWE) prediction accuracy. In addition, the equifinality induced by the watershed and lapse rate parameters is evaluated. The results indicate that calibrating the SWAT model with respect to both streamflow and SWE reference data generally improves the reliability of the model's SWE prediction. By comparison, the streamflow predictions are not significantly affected by differently lumped calibration schemes. The default snow parameter values capture the extreme high flows better than the other two calibration scenarios, whereas there is no remarkable difference among the three calibration schemes for capturing the extreme low flows. The equifinality induced by the watershed and lapse rate parameters affects the flow prediction more (Nash–Sutcliffe Efficiency (NSE) varies between 0.2 and 0.3) than the SWE prediction (NSE varies by less than 0.1). This study identifies the remote-sensing-based SWE reconstruction product as a promising alternative for model calibration in ungauged snow-influenced watersheds. The model calibrated bi-objectively against streamflow and reconstructed SWE could improve the reliability of predicted changes in surface water supply for the downstream agricultural region under the changing climate.
  3. Abstract

    Streamflow generation in mountain watersheds is strongly influenced by snow accumulation and melt, and multiple studies have found that snow loss leads to earlier snowmelt timing and declines in annual streamflow. However, hydrologic responses to snow loss are heterogeneous, and not all areas experience streamflow declines. This research examines whether streamflow generation is different for rainfall versus snowmelt inputs. We compiled a sample of 57 small U.S. Geological Survey watersheds in the western United States containing a Natural Resources Conservation Service Snow Telemetry site and having ratios of mean annual peak snow water equivalent to precipitation >0.25. Daily streamflow was separated into quickflow and baseflow using a digital filter, and quickflow was then divided into quickflow response intervals using thresholds in quickflow slope. Each quickflow response interval was categorized by its fraction of input from snowmelt. Most sites exhibited two streamflow generation peaks each year, with one peak in the winter when runoff efficiency is greatest, and the second in the spring during peak snowmelt input. On average, study watersheds were dominated by snowmelt inputs (70%), and snowmelt and mixed inputs usually generated greater streamflow than rainfall because of higher inputs and longer durations. However, rainfall produced high streamflow generation in winter, when watersheds have their highest runoff efficiency (81%) across all input types. We demonstrate that while snowmelt is important for streamflow generation due to high input over long periods, increases in rain and mixed input during wet winter periods can countervail tendencies for reduced streamflow with declining snowpacks.

     
  4. Forecasting the timing and magnitude of snowmelt and runoff is critical to managing mountain water resources. Warming temperatures are increasing the rain–snow transition elevation and are limiting the forecasting skill of statistical models relating historical snow water equivalent to streamflow. While physically based methods are available, they require accurate estimations of the spatial and temporal distribution of meteorological variables in complex terrain. Across many mountainous areas, measurements of precipitation and other meteorological variables are limited to a few reference stations and are not adequate to resolve the complex interactions between topography and atmospheric flow. In this paper, we evaluate the ability of the Weather Research and Forecasting (WRF) Model to approximate the inputs required for a physics-based snow model, iSnobal, instead of using meteorological measurements, for the Boise River Basin (BRB) in Idaho, United States. An iSnobal simulation using station data from 40 locations in and around the BRB resulted in an average root-mean-square error (RMSE) of 4.5 mm compared with 12 SNOTEL measurements. Applying WRF forcings alone was associated with an RMSE of 10.5 mm, while including a simple bias correction to the WRF outputs of temperature and precipitation reduced the RMSE to 6.5 mm. The results highlight the utility of using WRF outputs as input to snowmelt models, as all required input variables are spatiotemporally complete. This will have important benefits in areas with sparse measurement networks and will aid snowmelt and runoff forecasting in mountainous basins.

     
  5.
    Predicting workload behavior during execution is essential for dynamic resource optimization of processor systems. Early studies used simple prediction algorithms such as history tables. More recently, researchers have applied advanced machine learning regression techniques. Workload prediction can be cast as a time series forecasting problem. Time series forecasting is an active research area with recent advances that have not been studied in the context of workload prediction. In this paper, we first perform a comparative study of representative time series forecasting techniques to predict the dynamic workload of applications running on a CPU. We adapt state-of-the-art matrix profile and dynamic linear models (DLMs) not previously applied to workload prediction and compare them against traditional SVM and LSTM models that have been popular for handling non-stationary data. We find that all time series forecasting models struggle to predict abrupt workload changes. These changes occur because workloads go through phases, and prior work has studied workload phase detection, classification, and prediction. We propose a novel approach that combines time series forecasting with phase prediction. We process each phase as a separate time series and train one forecasting model per phase. At runtime, forecasts from phase-specific models are selected and combined based on the predicted phase behavior. We apply our approach to forecasting of SPEC workloads running on a state-of-the-art Intel machine. Our results show that an LSTM-based phase-aware predictor can forecast workload CPI with less than 8% mean absolute error while reducing CPI error by more than 12% on average compared to a non-phase-aware approach.