


Title: Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds
Abstract. Over the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Their popularity for such applications stems, among other things, from their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite these encouraging results, most applications of ML to streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to produce streamflow forecasts at a 1 d lead time for 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of the center of annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that RF performs better in snowmelt-driven watersheds than in rainfall-driven watersheds. The largest forecast improvements relative to the benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increasing catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend interpreting these measures jointly and in light of the physical processes under study. These and other results provide new insights for the effective application of RF-based streamflow forecasting.
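The four evaluation criteria named in the abstract have standard definitions. The minimal Python sketch below computes them for a naïve 1-day-ahead forecast. It assumes that the naïve benchmark is persistence (tomorrow's flow equals today's flow), that the coefficient of determination is computed as 1 − SSE/SST, and that KGE follows the original Gupta et al. (2009) formulation. The study may use other variants, such as the squared Pearson correlation for R² or the modified KGE. The function names and the flow values are hypothetical.

```python
import math
from statistics import mean, pstdev


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean((s - o) ** 2 for o, s in zip(obs, sim)))


def mae(obs, sim):
    """Mean absolute error."""
    return mean(abs(s - o) for o, s in zip(obs, sim))


def r2(obs, sim):
    """Coefficient of determination, assumed here to be 1 - SSE/SST
    (some studies instead report the squared Pearson correlation)."""
    mo = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = sum((o - mo) ** 2 for o in obs)
    ss = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(so * ss)


def kge(obs, sim):
    """Kling-Gupta efficiency, Gupta et al. (2009) form:
    1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    r = pearson(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def persistence_forecast(flow):
    """Assumed naive benchmark: the forecast for day t+1 is the flow on day t."""
    return flow[:-1]


if __name__ == "__main__":
    flow = [10.0, 12.0, 15.0, 14.0, 11.0, 9.0, 8.0]  # hypothetical daily flows
    obs = flow[1:]                      # what actually happened on days 2..7
    sim = persistence_forecast(flow)    # forecasts issued on days 1..6
    for name, metric in [("R2", r2), ("RMSE", rmse), ("MAE", mae), ("KGE", kge)]:
        print(name, round(metric(obs, sim), 3))
```

Because persistence is the simplest skill baseline for 1-day-ahead forecasts, the gap between an RF's scores and these persistence scores is one way to express the "improvement over benchmark models" discussed in the abstract.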
Award ID(s):
2006633; 1827093
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Hydrology and Earth System Sciences
Page Range / eLocation ID:
2997 to 3015
Medium: X
Sponsoring Org:
National Science Foundation
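The abstract benchmarks RF against a multiple linear regression (MLR) model but does not list the MLR predictors. The Python sketch below fits an ordinary least-squares MLR using only the standard library. It assumes, hypothetically, that tomorrow's flow is regressed on today's flow and today's precipitation. The feature choice, the helper names `fit_mlr` and `predict_mlr`, and the sample data are illustrative assumptions and do not reflect the study's configuration.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column is prepended, so the result is [b0, b1, ..., bp].
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * c for a, c in zip(A[k], A[col])]
    # A is now diagonal in its first p columns.
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_mlr(coef, X):
    """Apply fitted coefficients [b0, b1, ..., bp] to feature rows."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], xs)) for xs in X]


if __name__ == "__main__":
    flow = [10.0, 12.0, 15.0, 14.0, 11.0, 9.0, 8.0]  # hypothetical daily flow
    precip = [0.0, 5.0, 8.0, 2.0, 0.0, 0.0, 1.0]     # hypothetical daily precipitation
    # Hypothetical features: (flow_t, precip_t) predicting flow_{t+1}.
    X = [(flow[t], precip[t]) for t in range(len(flow) - 1)]
    y = flow[1:]
    coef = fit_mlr(X, y)
    print("coefficients [intercept, flow_t, precip_t]:", [round(c, 3) for c in coef])
    print("fitted 1-day-ahead flows:", [round(v, 2) for v in predict_mlr(coef, X)])
```

Solving the normal equations directly is adequate for a handful of predictors. For larger or nearly collinear feature sets, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically safer.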
More Like this
  1. Precipitation occurs in two basic forms: liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir that stores water as snow in winter and releases it in spring and summer. The warmer climate of recent decades has led to earlier snowmelt, a decline in snowpack, and changes in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because it simultaneously predicts streamflow generated from rainfall and from snowmelt. The choice of parameters, reference data, and calibration strategy can significantly affect the SWAT calibration outcome and, in turn, the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB), located across the southern Sierra Nevada. Three calibration scenarios with different calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and of snow parameters on the accuracy of streamflow and snow water equivalent (SWE) predictions. In addition, the equifinality arising from watershed and lapse-rate parameters is evaluated. The results indicate that calibrating the SWAT model against both streamflow and SWE reference data generally improves the reliability of SWE predictions. By comparison, streamflow predictions are not significantly affected by the different lumped calibration schemes. The default snow parameter values capture extreme high flows better than the other two calibration scenarios, whereas the three calibration schemes show no remarkable difference in capturing extreme low flows.
The equifinality induced by watershed and lapse-rate parameters affects flow prediction more (Nash–Sutcliffe efficiency (NSE) varies by 0.2–0.3) than SWE prediction (NSE varies by less than 0.1). This study identifies the remote-sensing-based SWE reconstruction product as a promising alternative for model calibration in ungauged snow-influenced watersheds. A model calibrated bi-objectively against streamflow and reconstructed SWE could improve the reliability of predicted surface water supply changes for the downstream agricultural region under a changing climate.
  2. Abstract

    Streamflow generation in mountain watersheds is strongly influenced by snow accumulation and melt, and multiple studies have found that snow loss leads to earlier snowmelt timing and declines in annual streamflow. However, hydrologic responses to snow loss are heterogeneous, and not all areas experience streamflow declines. This research examines whether streamflow generation differs between rainfall and snowmelt inputs. We compiled a sample of 57 small U.S. Geological Survey watersheds in the western United States, each containing a Natural Resources Conservation Service Snow Telemetry site and having a ratio of mean annual peak snow water equivalent to precipitation greater than 0.25. Daily streamflow was separated into quickflow and baseflow using a digital filter, and quickflow was then divided into quickflow response intervals using thresholds in quickflow slope. Each quickflow response interval was categorized by its fraction of input from snowmelt. Most sites exhibited two streamflow generation peaks each year: one in winter, when runoff efficiency is greatest, and a second in spring, during peak snowmelt input. On average, study watersheds were dominated by snowmelt inputs (70%), and snowmelt and mixed inputs usually generated greater streamflow than rainfall because of their higher inputs and longer durations. However, rainfall produced high streamflow generation in winter, when watersheds had their highest runoff efficiency (81%) across all input types. We demonstrate that while snowmelt is important for streamflow generation because of its high input over long periods, increases in rain and mixed input during wet winter periods can counteract tendencies toward reduced streamflow with declining snowpacks.

  3. Forecasting the timing and magnitude of snowmelt and runoff is critical to managing mountain water resources. Warming temperatures are raising the rain–snow transition elevation and limiting the forecasting skill of statistical models that relate historical snow water equivalent to streamflow. While physically based methods are available, they require accurate estimates of the spatial and temporal distribution of meteorological variables in complex terrain. Across many mountainous areas, measurements of precipitation and other meteorological variables are limited to a few reference stations and are not adequate to resolve the complex interactions between topography and atmospheric flow. In this paper, we evaluate the ability of the Weather Research and Forecasting (WRF) Model to approximate the inputs required by a physics-based snow model, iSnobal, in place of meteorological measurements, for the Boise River Basin (BRB) in Idaho, United States. An iSnobal simulation using station data from 40 locations in and around the BRB produced an average root-mean-square error (RMSE) of 4.5 mm relative to measurements at 12 SNOTEL sites. Applying WRF forcings alone was associated with an RMSE of 10.5 mm, while applying a simple bias correction to the WRF temperature and precipitation outputs reduced the RMSE to 6.5 mm. The results highlight the utility of WRF outputs as inputs to snowmelt models, since all required input variables are spatiotemporally complete. This will have important benefits in areas with sparse measurement networks and will aid snowmelt and runoff forecasting in mountainous basins.

  4. Predicting workload behavior during execution is essential for dynamic resource optimization of processor systems. Early studies used simple prediction algorithms such as history tables. More recently, researchers have applied advanced machine learning regression techniques. Workload prediction can be cast as a time series forecasting problem. Time series forecasting is an active research area whose recent advances have not been studied in the context of workload prediction. In this paper, we first perform a comparative study of representative time series forecasting techniques for predicting the dynamic workload of applications running on a CPU. We adapt state-of-the-art matrix profile and dynamic linear models (DLMs), not previously applied to workload prediction, and compare them against traditional SVM and LSTM models, which have been popular for handling non-stationary data. We find that all time series forecasting models struggle to predict abrupt workload changes. These changes occur because workloads go through phases; prior work has studied workload phase detection, classification, and prediction. We propose a novel approach that combines time series forecasting with phase prediction. We process each phase as a separate time series and train one forecasting model per phase. At runtime, forecasts from the phase-specific models are selected and combined based on the predicted phase behavior. We apply our approach to forecasting SPEC workloads running on a state-of-the-art Intel machine. Our results show that an LSTM-based phase-aware predictor can forecast workload CPI with less than 8% mean absolute error while reducing CPI error by more than 12% on average compared to a non-phase-aware approach.
  5. Abstract. Climate warming will cause mountain snowpacks to melt earlier, reducing summer streamflow and threatening water supplies and ecosystems. Quantifying how sensitive streamflow timing is to climate change, and where it is most sensitive, remain key questions. Physically based hydrological models are often used for this purpose; however, they have embedded assumptions that translate into uncertain hydrological projections, which need to be quantified and constrained to provide reliable inferences. The purpose of this study is to evaluate differences in projected end-of-century changes to streamflow timing between a new empirical model based on diel (daily) streamflow cycles and regional land surface simulations across the mountainous western USA. We develop an observational technique for detecting streamflow responses to snowmelt, using diel cycles of incoming solar radiation and streamflow to detect when snowmelt occurs. We measure the date of the 20th percentile of snowmelt days (DOS20) across 31 snow-affected western USA watersheds as a proxy for the beginning of snowmelt-initiated streamflow. Historic DOS20 varies from mid-January to late May among our sites, with warmer basins having earlier snowmelt-mediated streamflow. Mean annual DOS20 strongly correlates with the dates of 25 % and 50 % annual streamflow volume (DOQ25 and DOQ50, both R² = 0.85), suggesting that a 1 d earlier DOS20 corresponds to a 1 d earlier DOQ25 and a 0.7 d earlier DOQ50. Empirical projections of future DOS20, based on a stepwise multiple linear regression across sites and years under the RCP8.5 scenario for the late 21st century, show that DOS20 will occur on average 11 ± 4 d earlier per 1 °C of warming. However, DOS20 in colder watersheds (mean November–February air temperature T_NDJF < −8 °C) is on average 70 % more sensitive to climate change than in warmer watersheds (T_NDJF > 0 °C).
Moreover, empirical projections of DOQ25 and DOQ50 based on DOS20 are about four and two times more sensitive to climate change, respectively, than those simulated by a state-of-the-art land surface model (NoahMP-WRF) under the same scenario. Given the importance of changes in streamflow timing for water resources, and the significant discrepancies found in projected streamflow sensitivity, snowmelt detection methods such as DOS20 based on diel streamflow cycles may help to constrain model parameters, improve hydrological predictions, and inform process understanding. 
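Entry 5 defines DOS20 as the date of the 20th percentile of detected snowmelt days, and DOQ25 and DOQ50 as the dates by which 25 % and 50 % of annual streamflow volume has passed. The Python sketch below shows both date calculations. It assumes that daily flows are indexed from day 1 of the water year and that snowmelt days have already been detected; the diel-cycle detection step itself is not reproduced. The nearest-rank percentile rule, the helper names, and the sample values are assumptions for illustration, and the study may use a different percentile definition. A half-volume date like DOQ50 is also one common way to express the "center of annual flow volume" timing used in the main abstract, although that paper's exact definition is not given here.

```python
import math


def day_of_volume_fraction(daily_flow, fraction):
    """Return the 1-based day on which cumulative flow first reaches
    `fraction` of the annual total (0.25 -> DOQ25, 0.5 -> DOQ50)."""
    target = fraction * sum(daily_flow)
    running = 0.0
    for day, q in enumerate(daily_flow, start=1):
        running += q
        if running >= target:
            return day
    raise ValueError("fraction must be in (0, 1]")


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed definition): the smallest value
    such that at least pct % of the data are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical day-of-water-year indices flagged as snowmelt days.
    snowmelt_days = [120, 95, 130, 101, 150, 88, 140, 110, 125, 99]
    print("DOS20:", nearest_rank_percentile(snowmelt_days, 20))

    # Hypothetical daily flows for a short illustrative "year".
    flow = [1.0, 1.0, 2.0, 4.0, 2.0]
    print("DOQ25:", day_of_volume_fraction(flow, 0.25))
    print("DOQ50:", day_of_volume_fraction(flow, 0.50))
```

Given DOS20, DOQ25, and DOQ50 for many site-years, the reported 1 d and 0.7 d correspondences are regression slopes of DOQ25 and DOQ50 on DOS20.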