

This content will become publicly available on August 1, 2024

Title: Exploring Random Forest Machine Learning and Remote Sensing Data for Streamflow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed

Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. This study explored the use of random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting streamflow in this region. We implemented a customized approach to the RFR model to assess its performance for three training periods (1991–2010, 1996–2010, and 2001–2010); the results indicated that accuracy improved with longer training periods, implying that a model trained on a longer period better captures the parameters’ variability and reproduces streamflow data more accurately. The variable importance measure (IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model reproduces the watershed’s streamflow data with a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resource management, including in snowmelt-driven semi-arid regions.
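
The training-period comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the study worked in RStudio with remotely sensed inputs, whereas the sketch below uses Python with scikit-learn and purely synthetic data. The predictor set, the 2011–2015 test period, the hyperparameters, and every function name are assumptions made for illustration. scikit-learn's `feature_importances_` is an impurity-based measure that is analogous to, but not the same quantity as, IncNodePurity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Hypothetical predictor names; the study's actual inputs are not listed in the abstract.
FEATURES = ["snow_depth", "tmin", "precip"]


def percent_bias(obs, sim):
    """PBIAS in percent; with this sign convention, positive means underestimation."""
    return 100.0 * np.sum(obs - sim) / np.sum(obs)


def index_of_agreement(obs, sim):
    """Willmott's index of agreement d, bounded between 0 and 1."""
    obs_mean = obs.mean()
    denom = np.sum((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / denom


def make_synthetic_monthly_data(start_year=1991, end_year=2015, seed=0):
    """Synthetic stand-in for monthly predictors and observed streamflow."""
    rng = np.random.default_rng(seed)
    n_years = end_year - start_year + 1
    n = n_years * 12
    years = np.repeat(np.arange(start_year, end_year + 1), 12)
    month = np.tile(np.arange(1, 13), n_years)
    season = np.cos(2 * np.pi * (month - 1) / 12)  # peaks in January
    snow_depth = np.clip(50 * season + rng.normal(0, 10, n), 0, None)
    tmin = -10 * season + rng.normal(0, 2, n)
    precip = rng.gamma(2.0, 20.0, n)
    # Toy response, chosen only so the example has a learnable signal.
    flow = 20 + 0.5 * snow_depth + 2.0 * np.clip(tmin, 0, None) + 0.05 * precip
    flow = flow + rng.normal(0, 2, n)
    X = np.column_stack([snow_depth, tmin, precip])
    return X, flow, years


def evaluate_training_windows(X, y, years, windows, test_start=2011):
    """Fit one forest per training window and score each on the same held-out years."""
    test = years >= test_start  # assumed test period, not taken from the paper
    results = {}
    for start, end in windows:
        train = (years >= start) & (years <= end)
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X[train], y[train])
        sim = model.predict(X[test])
        results[(start, end)] = {
            "r2": r2_score(y[test], sim),
            "pbias": percent_bias(y[test], sim),
            "d": index_of_agreement(y[test], sim),
            "importance": dict(zip(FEATURES, model.feature_importances_)),
        }
    return results


if __name__ == "__main__":
    X, y, years = make_synthetic_monthly_data()
    windows = [(1991, 2010), (1996, 2010), (2001, 2010)]
    for window, scores in evaluate_training_windows(X, y, years, windows).items():
        print(window, scores)
```

Scoring every window on the same held-out years keeps the comparison fair, since only the amount of training data changes. On synthetic data the ranking of windows and predictors carries no hydrologic meaning; the sketch shows only the mechanics of the comparison.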

 
Award ID(s): 2142686
NSF-PAR ID: 10479936
Editor(s): Thenkabail, Prasad S.
Publisher / Repository: MDPI
Journal Name: Remote Sensing
Volume: 15
Issue: 16
ISSN: 2072-4292
Page Range / eLocation ID: 3999
Subject(s) / Keyword(s): streamflow prediction; random forest machine learning; hydrologic modeling; water resource management; remote sensing data; climate change
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Streamflow prediction plays a vital role in water resources planning because it helps in understanding the dramatic changes in climatic and hydrologic variables over different time scales. In this study, we used machine learning (ML)-based prediction models, including Random Forest Regression (RFR), Long Short-Term Memory (LSTM), Seasonal Auto-Regressive Integrated Moving Average (SARIMA), and Facebook Prophet (PROPHET), to predict natural streamflow 24 months ahead at the Lees Ferry site, located at the bottom part of the Upper Colorado River Basin (UCRB) of the US. First, we used only historic streamflow data to predict 24 months ahead. Second, we considered meteorological components such as temperature and precipitation as additional features. We tested the models on a monthly test dataset spanning 6 years, where 24-month predictions were repeated 50 times to ensure the consistency of the results. Moreover, we performed a sensitivity analysis to identify our best-performing model. Later, we analyzed the effects of different span window sizes on the quality of predictions made by our best model. Finally, we applied our best-performing model, RFR, to two more rivers in different states in the UCRB to test the model’s generalizability. We evaluated the performance of the predictive models using multiple evaluation measures. The predictions of the multivariate time-series models were found to be more accurate, with RMSE less than 0.84 mm per month, R-squared more than 0.8, and MAPE less than 0.25. Therefore, we conclude that including the temperature and precipitation of the UCRB increases the accuracy of the predictions. Ultimately, we found that multivariate RFR performs the best among the four models and is generalizable to other rivers in the UCRB.
  2. Snow-dominated mountainous karst watersheds are the primary source of water supply in many areas in the western U.S. and worldwide. These watersheds are typically characterized by complex terrain, spatiotemporally varying snow accumulation and melt processes, and duality of flow and storage dynamics because of the juxtaposition of matrix (micropores and small fissures) and karst conduits. As a result, predicting streamflow from meteorological inputs has been challenging due to the inability of physically based or conceptual hydrologic models to represent these unique characteristics. We present a hybrid modeling approach that integrates a physically based, spatially distributed snow model with a deep learning karst model. More specifically, the high‐resolution snow model captures spatiotemporal variability in snowmelt, and the deep learning model simulates the corresponding response of streamflow as influenced by complex surface and subsurface properties. The deep learning model is based on the Convolutional Long Short‐Term Memory (ConvLSTM) architecture, which is capable of handling spatiotemporal recharge patterns and watershed storage dynamics. The hybrid modeling approach is tested on a watershed in northern Utah with seasonal snow cover and variably karstified carbonate bedrock. The hybrid models were able to simulate streamflow at the watershed outlet with high accuracy. The spatial and temporal recharge and discharge patterns learned by the ConvLSTM model were then examined and compared with known hydrogeologic information. Results suggest that ConvLSTM simulates streamflow with higher accuracy than reference models for the study area and provides insight into spatially influenced hydrologic responses that are unavailable within lumped modeling approaches.

     
  3. Precipitation occurs in two basic forms, liquid and solid. Unlike in rain-fed watersheds, modeling snow processes is of vital importance in snow-dominated watersheds. The seasonal snowpack is a natural water reservoir, which stores snow water in winter and releases it in spring and summer. The warmer climate in recent decades has led to earlier snowmelt, a decline in snowpack, and a change in the seasonality of river flows. The Soil and Water Assessment Tool (SWAT) can be applied in snow-influenced watersheds because of its ability to simultaneously predict the streamflow generated from rainfall and from the melting of snow. The choice of parameters, reference data, and calibration strategy can significantly affect the SWAT model calibration outcome and, in turn, the prediction accuracy. In this study, SWAT models are implemented in four upland watersheds in the Tulare Lake Basin (TLB), located across the Southern Sierra Nevada Mountains. Three calibration scenarios considering different calibration parameters and reference datasets are applied to investigate the impact of the Parallel Energy Balance Model (ParBal) snow reconstruction data and snow parameters on the streamflow and snow water-equivalent (SWE) prediction accuracy. In addition, the equifinality induced by the watershed parameters and lapse rate parameters is evaluated. The results indicate that calibrating the SWAT model with respect to both streamflow and SWE reference data could improve the model's SWE prediction reliability in general. Comparatively, the streamflow predictions are not significantly affected by the different lumped calibration schemes. The default snow parameter values capture the extreme high flows better than the other two calibration scenarios, whereas there is no remarkable difference among the three calibration schemes in capturing the extreme low flows. The equifinality induced by the watershed and lapse rate parameters affects the flow prediction more (Nash-Sutcliffe Efficiency (NSE) varies between 0.2–0.3) than the SWE prediction (NSE varies less than 0.1). This study points to the remote-sensing-based SWE reconstruction product as a promising alternative for model calibration in ungauged snow-influenced watersheds. The model calibrated bi-objectively against streamflow and reconstructed SWE could improve the prediction reliability of surface water supply change for the downstream agricultural region under the changing climate.
  4. This paper presents a top–down approach for soil moisture and sap flux sampling design with the goal of understanding ecohydrologic response to interannual climate variation in rain–snow transition watersheds. The design is based on a priori estimates of soil moisture and transpiration patterns using a physical distributed model, the Regional Hydro‐Ecologic Simulation System (RHESSys). RHESSys was initially calibrated with existing snow depth and streamflow data. Calibrated model estimates of seasonal trajectories of snowmelt, root‐zone soil moisture storage, and transpiration were used to develop five hydrologic similarity indicators and map these at patch scale (30 m) across the study watershed. The partitioning around medoids clustering algorithm was then used to define six distinctive, spatially explicit clusters based on the five hydrologic similarity indicators. A representative site within each cluster was identified for sampling. For each site, soil moisture sensors were installed at the 30‐ and 90‐cm depths and at the five soil pits and a sap flux sensor at the averaged‐size white fir tree for each site. The model‐based cluster analysis suggests that the elevation gradient and topographically driven flow drainage patterns are the dominant drivers of spatial patterns of soil moisture and transpiration. The comparison of model‐based hydrological similarity indicators with values based on measured data shows that spatial patterns of field‐sampled soil moisture data typically fell within uncertainty bounds of model‐based estimates for each cluster. There were, however, several notable exceptions. The model failed to capture the soil moisture and sap flux dynamics in a riparian zone site and in a site where lateral subsurface flow may not follow surface topography. Results highlight the utility of a hypothesis-driven sampling strategy, based on a physically based model, for efficiently providing new information that can drive both future measurements and strategic refinements to model inputs, parameters, or structure that might reduce these errors. Future research will focus on strategies for using finer-scale representations of microclimate, topography, vegetation, and soil properties to improve models.

     
  5. Measurement of light absorption of solar radiation by aerosols is vital for assessing direct aerosol radiative forcing, which affects local and global climate. Low-cost and easy-to-operate filter-based instruments, such as the Particle Soot Absorption Photometer (PSAP), that collect aerosols on a filter and measure light attenuation through the filter are widely used to infer aerosol light absorption. However, filter-based absorption measurements are subject to artifacts that are difficult to quantify. These artifacts are associated with the presence of the filter medium and the complex interactions between the filter fibers and accumulated aerosols. Various correction algorithms have been introduced to correct for the filter-based absorption coefficient measurements toward predicting the particle-phase absorption coefficient (Babs). However, the inability of these algorithms to incorporate into their formulations the complex matrix of influencing parameters such as particle asymmetry parameter, particle size, and particle penetration depth results in prediction of particle-phase absorption coefficients with relatively low accuracy. The analytical forms of corrections also suffer from a lack of universal applicability: different corrections are required for rural and urban sites across the world. In this study, we analyzed and compared 3 months of high-time-resolution ambient aerosol absorption data collected synchronously using a three-wavelength photoacoustic absorption spectrometer (PASS) and PSAP. Both instruments were operated on the same sampling inlet at the Department of Energy's Atmospheric Radiation Measurement program's Southern Great Plains (SGP) user facility in Oklahoma. We implemented the two most commonly used analytical correction algorithms, namely, Virkkula (2010) and the average of Virkkula (2010) and Ogren (2010)–Bond et al. (1999), as well as a random forest regression (RFR) machine learning algorithm to predict Babs values from the PSAP's filter-based measurements. The predicted Babs was compared against the reference Babs measured by the PASS. The RFR algorithm performed the best by yielding the lowest root mean square error of prediction. The algorithm was trained using input datasets from the PSAP (transmission and uncorrected absorption coefficient), a co-located nephelometer (scattering coefficients), and the Aerosol Chemical Speciation Monitor (mass concentration of non-refractory aerosol particles). A revised form of the Virkkula (2010) algorithm suitable for the SGP site has been proposed; however, its performance yields approximately 2-fold errors when compared to the RFR algorithm. To generalize the accuracy and applicability of our proposed RFR algorithm, we trained and tested it on a dataset of laboratory measurements of combustion aerosols. Input variables to the algorithm included the aerosol number size distribution from the Scanning Mobility Particle Sizer, absorption coefficients from the filter-based Tricolor Absorption Photometer, and scattering coefficients from a multiwavelength nephelometer. The RFR algorithm predicted Babs values within 5 % of the reference Babs measured by the multiwavelength PASS during the laboratory experiments. Thus, we show that machine learning approaches offer a promising path to correct for biases in long-term filter-based absorption datasets and accurately quantify their variability and trends needed for robust radiative forcing determination.