Title: Exploring Random Forest Machine Learning and Remote Sensing Data for Streamflow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed
Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. The study explored the use of random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods, 1991–2010, 1996–2010, and 2001–2010; the results indicated that the model’s accuracy improved with longer training periods, implying that a model trained on a longer period is better able to capture the parameters’ variability and reproduce streamflow data more accurately. The variable importance measure (i.e., IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model reproduces streamflow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. 
The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.
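The three goodness-of-fit measures named in the abstract (d, R², PBIAS) can be illustrated with a minimal sketch. The abstract does not give the formulas the authors used, so the definitions below are assumptions: Willmott's index of agreement for d, the squared Pearson correlation for R², and a PBIAS sign convention in which positive values mean the simulation underestimates observed flow. The function names and sample values are hypothetical and are not taken from the study.

```python
def pbias(obs, sim):
    """Percent bias; positive means the simulation underestimates (assumed convention)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def index_of_agreement(obs, sim):
    """Willmott's index of agreement d (1 indicates perfect agreement)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical monthly flows (arbitrary units), for illustration only.
observed = [10.0, 20.0, 30.0]
simulated = [9.0, 18.0, 27.0]

print("PBIAS:", pbias(observed, simulated))
print("d:", index_of_agreement(observed, simulated))
print("R2:", r_squared(observed, simulated))
```

In this example the simulated series is a scaled copy of the observed one, so R² is essentially 1 while PBIAS is clearly nonzero. That is one reason such metrics are usually reported together rather than singly.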
Award ID(s):
2142686
PAR ID:
10479936
Author(s) / Creator(s):
; ; ;
Corporate Creator(s):
Editor(s):
Thenkabail, Prasad S.
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Remote Sensing
Volume:
15
Issue:
16
ISSN:
2072-4292
Page Range / eLocation ID:
3999
Subject(s) / Keyword(s):
streamflow prediction; random forest; machine learning; hydrologic modeling; water resource management; remote sensing data; climate change
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Streamflow prediction plays a vital role in water resources planning in order to understand the dramatic change of climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located in the bottom part of the Upper Colorado River Basin (UCRB) of the US. First, we used only historic streamflow data to predict 24 months ahead. Second, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. We then analyzed the effects of different span window sizes on the quality of predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The predictions of the multivariate time-series models were found to be more accurate, with RMSE less than 0.84 mm per month, R-squared more than 0.8, and MAPE less than 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB. 
  2. Over the last decade, autocalibration routines have become commonplace in watershed modeling. This approach is most often used to simulate streamflow at a basin’s outlet. In alpine settings, spring/early summer snowmelt is by far the dominant signal in this system. Therefore, there is great potential for a modeled watershed to underperform during other times of the year. This tendency has been noted in many prior studies. In this work, the Soil and Water Assessment Tool (SWAT) model was auto-calibrated with the SUFI-2 routine. A mountainous watershed in Idaho (Upper North Fork) was examined. This basin was calibrated using three estimates of evapotranspiration (ET): Moderate Resolution Imaging Spectroradiometer (MODIS), Simplified Surface Energy Balance, and Global Land Evaporation: the Amsterdam Model. The MODIS product, in particular, had the greatest utility in helping to constrain SWAT parameters that have a high sensitivity to ET. Streamflow simulations that utilize these ET parameter values show improved recessional and summertime streamflow performance during the calibration (2007 to 2011) and validation (2012 to 2014) periods. Streamflow performance was monitored with standard objective metrics (bias and Nash-Sutcliffe coefficients) that quantified overall, recessional, and summertime peak flows. This approach yielded dramatic enhancements for all three observations. These results demonstrate the utility of this approach for improving watershed modeling fidelity outside the main snowmelt season. 
  3. Snow-dominated mountainous karst watersheds are the primary source of water supply in many areas in the western U.S. and worldwide. These watersheds are typically characterized by complex terrain, spatiotemporally varying snow accumulation and melt processes, and duality of flow and storage dynamics because of the juxtaposition of matrix (micropores and small fissures) and karst conduits. As a result, predicting streamflow from meteorological inputs has been challenging due to the inability of physically based or conceptual hydrologic models to represent these unique characteristics. We present a hybrid modeling approach that integrates a physically based, spatially distributed, snow model with a deep learning karst model. More specifically, the high‐resolution snow model captures spatiotemporal variability in snowmelt, and the deep learning model simulates the corresponding response of streamflow as influenced by complex surface and subsurface properties. The deep learning model is based on the Convolutional Long Short‐Term Memory (ConvLSTM) architecture capable of handling spatiotemporal recharge patterns and watershed storage dynamics. The hybrid modeling approach is tested on a watershed in northern Utah with seasonal snow cover and variably karstified carbonate bedrock. The hybrid models were able to simulate streamflow at the watershed outlet with high accuracy. The spatial and temporal recharge and discharge patterns learned by the ConvLSTM model were then examined and compared with known hydrogeologic information. Results suggest that ConvLSTM simulates streamflow with higher accuracy than reference models for the study area and provides insight into spatially influenced hydrologic responses that are unavailable within lumped modeling approaches. 
  4. Floodplains are essential ecosystems that provide a variety of economic, hydrologic, and ecologic services. Within floodplains, surface water‐groundwater exchange plays an important role in facilitating biogeochemical processes and can have a strong influence on stream hydrology through infiltration or discharge of water. These functions can be difficult to assess due to the heterogeneity of floodplains and monitoring constraints, so numerical models are useful tools to estimate fluxes, especially at large spatial extents. In this study, we use the SWAT+ (Soil and Water Assessment Tool) ecohydrological model to quantify magnitudes and spatiotemporal patterns of floodplain surface water‐groundwater exchange in a mountainous watershed using an updated version of the gwflow module that directly calculates floodplain‐aquifer exchange rates during periods of floodplain inundation. The gwflow module is a spatially distributed groundwater modelling subroutine within the SWAT+ code that uses a gridded network and physically based equations to predict groundwater storage, groundwater head, and groundwater fluxes. We used SWAT+ to model the 7516 km² Colorado River headwaters watershed and streamflow data from USGS gages for calibration and testing. Models that included floodplain‐groundwater interactions outperformed those without such interactions and provided valuable information about floodplain exchange rates and volumes. 
Our analyses on the location of floodplain fluxes in the watershed also show that wider areas of floodplains, “beads” (e.g., like beads on a necklace), exchanged a higher net and per area volume of water, as well as higher rates of exchange, compared to narrower areas, “strings.” Study results show that floodplain channel‐groundwater exchange is a valuable process to include in hydrologic models, and model outputs could inform land conservation practises by indicating priority locations, such as beads, where substantial hydrologic exchange occurs. 
  5. Precipitation occurs in two basic forms, liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir, which stores snow water in winter and releases it in spring and summer. The warmer climate in recent decades has led to earlier snowmelt, a decline in snowpack, and change in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because of its ability to simultaneously predict the streamflow generated from rainfall and from the melting of snow. The choice of parameters, reference data, and calibration strategy could significantly affect the SWAT model calibration outcome and further affect the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB) located across the Southern Sierra Nevada Mountains. Three calibration scenarios considering different calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the streamflow and snow water-equivalent (SWE) prediction accuracy. In addition, the equifinality induced by the watershed parameters and lapse rate parameters is also evaluated. The results indicate that calibration of the SWAT model with respect to both streamflow and SWE reference data could improve the model SWE prediction reliability in general. Comparatively, the streamflow predictions are not significantly affected by differently lumped calibration schemes. The default snow parameter values capture the extreme high flows better than the other two calibration scenarios, whereas there is no remarkable difference among the three calibration schemes for capturing the extreme low flows. 
The watershed and lapse rate parameters-induced equifinality affects the flow prediction more (Nash-Sutcliffe Efficiency (NSE) varies between 0.2–0.3) than the SWE prediction (NSE varies less than 0.1). This study points out the remote-sensing-based SWE reconstruction product as a promising alternative choice for model calibration in ungauged snow-influenced watersheds. The streamflow-reconstructed SWE bi-objective calibrated model could improve the prediction reliability of surface water supply change for the downstream agricultural region under the changing climate. 