Floods are among the most destructive natural hazards, with damages expected to intensify under climate change and socio-economic pressures. Effective reservoir operation remains a critical yet challenging strategy for mitigating downstream impacts, as operators must navigate nonlinear system dynamics, uncertain inflow forecasts, and trade-offs between competing objectives. This study proposes a novel end-to-end data-driven framework that integrates process-based hydraulic simulations, a Transformer-based surrogate model for flood damage prediction, and reinforcement learning (RL) for reservoir gate operation optimization. The framework is demonstrated using the Coralville Reservoir (Iowa, USA) and two major historical flood events (2008 and 2013). Hydraulic and impact simulations with HEC-RAS and HEC-FIA were used to generate training data, enabling the development of a Transformer model that accurately predicts time-varying flood damages. This surrogate is coupled with a Transformer-enhanced Deep Q-Network (DQN) to derive adaptive gate operation strategies. Results show that the RL-derived optimal policy reduces both peak and time-integrated damages compared to expert and zero-opening benchmarks, while maintaining smooth and feasible operations. Comparative analysis with a genetic algorithm (GA) highlights the robustness of the RL framework, particularly its ability to generalize across uncertain inflows and varying initial storage conditions. Importantly, the adaptive RL policy trained on perturbed synthetic inflows transferred effectively to the hydrologically distinct 2013 event, and fine-tuning achieved near-identical performance to the event-specific optimal policy. These findings indicate that the proposed framework can provide adaptive, transferable, and computationally efficient tools for flood-resilient reservoir operation.
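The DQN named above learns action values by regressing toward the temporal-difference target r + γ·max Q(s′, a′). The sketch below shows that same update in tabular form on a hypothetical toy reservoir: discrete storage levels, four gate settings, constant inflow, a quadratic stand-in for the damage surrogate, and a spill penalty. None of these constants come from the study, and this is not the authors' Transformer-enhanced DQN; a DQN would replace the table `q` with a neural network trained toward the same target.

```python
import numpy as np

# Hypothetical toy reservoir (assumed constants, not from the study).
N_LEVELS = 10                            # storage discretized into levels 0..9
GATE_OPENINGS = np.array([0, 1, 2, 3])   # gate release per step, in storage units
INFLOW = 2                               # constant inflow per step
CHANNEL_CAPACITY = 2                     # release that causes no downstream damage
SPILL_PENALTY = 10.0                     # cost per unit of uncontrolled spill


def step(storage, action):
    """Advance the toy reservoir one step; return (next_storage, reward)."""
    # The gate cannot release more water than is available.
    release = min(int(GATE_OPENINGS[action]), storage + INFLOW)
    raw = storage + INFLOW - release
    spill = max(0, raw - (N_LEVELS - 1))  # water above the top level spills
    next_storage = min(raw, N_LEVELS - 1)
    # Stand-in for the damage surrogate: quadratic in release above capacity.
    damage = max(0, release - CHANNEL_CAPACITY) ** 2 + SPILL_PENALTY * spill
    return next_storage, -float(damage)


def q_learning(episodes=500, horizon=20, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; a DQN swaps the table for a network."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_LEVELS, len(GATE_OPENINGS)))
    for _ in range(episodes):
        s = int(rng.integers(N_LEVELS))
        for _ in range(horizon):
            if rng.random() < eps:
                a = int(rng.integers(len(GATE_OPENINGS)))  # explore
            else:
                a = int(np.argmax(q[s]))                   # exploit
            s_next, r = step(s, a)
            # Temporal-difference target that a DQN regresses its Q-network toward.
            target = r + gamma * np.max(q[s_next])
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q


q = q_learning()
greedy = GATE_OPENINGS[np.argmax(q, axis=1)]
print("greedy gate opening per storage level:", greedy)
```

Because every reward is a negative damage, the learned values are all non-positive, and at full storage the greedy policy opens the gate at least enough to pass the inflow rather than spill.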
Bootstrap Aggregation and Cross‐Validation Methods to Reduce Overfitting in Reservoir Control Policy Search
Abstract: Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used in training, particularly the sequence of flood and drought events. This technical note develops an extension of bootstrap aggregation (bagging) and cross‐validation techniques, inspired by the machine learning literature, to improve reservoir control policy performance on out‐of‐sample hydrological sequences. We explore these methods in a case study of Folsom Reservoir, California, using control policies structured as binary trees and streamflow resampling based on the paleo‐inflow record. Results show that calibration‐validation strategies for policy selection, coupled with certain ensemble aggregation methods, can improve out‐of‐sample performance in water supply and flood risk objectives relative to baseline performance at a fixed computational cost. Our findings highlight the potential to improve policy search methodologies by leveraging these well‐established model training strategies from machine learning.
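The bagging and calibration-validation ideas in this abstract can be sketched on a much simpler policy class. The example below is a minimal illustration under stated assumptions: a hypothetical one-parameter hedging rule stands in for the binary-tree policies, synthetic lognormal inflows stand in for the paleo-based resampling, and "training" is a grid search. Bagging refits the policy on bootstrap resamples of whole years, and one candidate is then selected by its cost on held-out validation years; the median threshold is shown as just one possible aggregation, not necessarily one the note evaluates.

```python
import numpy as np


def simulate(threshold, inflows, capacity=100.0, demand=10.0):
    """Hypothetical hedging rule (stand-in for a binary-tree policy).

    Release the full demand while storage >= threshold, otherwise half of it.
    Returns the sum of squared shortages (lower is better).
    """
    storage, cost = capacity / 2, 0.0
    for q in inflows:
        target = demand if storage >= threshold else 0.5 * demand
        release = min(target, storage + q)              # cannot release more than available
        storage = min(capacity, storage + q - release)  # excess above capacity spills
        cost += (demand - release) ** 2
    return cost


def fit(years, grid):
    """'Train' a policy: the grid threshold with the lowest cost on these years."""
    seq = np.concatenate(years)
    return float(grid[int(np.argmin([simulate(t, seq) for t in grid]))])


def bagged_candidates(years, grid, n_boot=20, seed=0):
    """Bagging: refit on bootstrap resamples of whole years (keeps within-year order)."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_boot):
        idx = rng.integers(len(years), size=len(years))  # sample years with replacement
        fits.append(fit([years[i] for i in idx], grid))
    return fits


def select_by_validation(candidates, val_years):
    """Calibration-validation: keep the candidate with the lowest held-out cost."""
    val_seq = np.concatenate(val_years)
    costs = [simulate(t, val_seq) for t in candidates]
    return candidates[int(np.argmin(costs))]


# Synthetic monthly inflows for 30 hypothetical years (not the paleo record).
rng = np.random.default_rng(42)
years = [rng.lognormal(mean=2.0, sigma=0.8, size=12) for _ in range(30)]
calib, valid = years[:20], years[20:]
grid = np.linspace(0.0, 100.0, 21)

candidates = bagged_candidates(calib, grid)
bagged = float(np.median(candidates))  # one aggregation choice: median threshold
selected = select_by_validation(candidates, valid)
print(f"bagged median threshold: {bagged:.1f}, validation-selected: {selected:.1f}")
```

Resampling whole years rather than individual months is a design choice that preserves the within-year sequence of wet and dry months, which is the kind of event ordering the abstract says policies tend to overfit.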
- Award ID(s):
- 1803589
- PAR ID:
- 10449291
- Publisher / Repository:
- DOI PREFIX: 10.1029
- Date Published:
- Journal Name:
- Water Resources Research
- Volume:
- 56
- Issue:
- 8
- ISSN:
- 0043-1397
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract: Reservoirs are designed and operated to mitigate hydroclimatic variability and extremes to fulfill various beneficial purposes. Existing reservoir infrastructure capacity and operation policies derived from historical records are challenged by hydrologic regime change and by storage reduction from sedimentation. Furthermore, climate change could amplify the water footprint of reservoir operation (i.e., non-beneficial evaporative loss), further influencing the complex interactions among hydrologic variability, reservoir characteristics, and operation decisions. Disentangling and quantifying these impacts is essential to assess the effectiveness of reservoir operation under future climate and to identify opportunities for adaptive reservoir management (e.g., storage reallocation). Using reservoirs in Texas as a test case, this study develops data-driven models to represent current reservoir operation policies and assesses the challenges and opportunities in flood control and water supply under dynamically downscaled climate projections from the Coupled Model Intercomparison Project Phase 6. We find that current policies are robust in reducing future flood risks by eliminating small floods, reducing peak magnitudes, and extending the durations of large floods. Current operation strategies can effectively reduce the risk of storage shortage for many of the reservoirs investigated, but reservoir evaporation and sedimentation create an urgent need to revise the current guidelines to enhance system resilience. We also identify opportunities for reservoir storage reallocation through seasonally varying conservation pool levels to improve water supply reliability with a negligible increase in flood risk. This study provides a framework for stakeholders to evaluate the effectiveness of current reservoir operation policy under future climate through the interactions among hydroclimatology, reservoir infrastructure, and operation policy.
-
Abstract
Background: Human-human (HH) interaction mediated by machines (e.g., robots or passive sensorized devices), which we call human-machine-human (HMH) interaction, has been studied with increasing interest in the last decade. The use of machines allows the implementation of different forms of audiovisual and/or physical interaction in dyadic tasks. HMH interaction between two partners can improve the dyad's ability to accomplish a joint motor task (task performance) beyond either partner's ability to perform the task solo. It can also be used to more efficiently train an individual to improve their solo task performance (individual motor learning). We review recent research on the impact of HMH interaction on task performance and individual motor learning in the context of motor control and rehabilitation, and we propose future research directions in this area.
Methods: A systematic search was performed on the Scopus, IEEE Xplore, and PubMed databases. The search query was designed to find studies that involve HMH interaction in motor control and rehabilitation settings. Studies that do not investigate the effect of changing the interaction conditions were filtered out. Thirty-one studies met our inclusion criteria and were used in the qualitative synthesis.
Results: Studies are analyzed based on their results related to the effects of interaction type (e.g., audiovisual communication and/or physical interaction), interaction mode (collaborative, cooperative, co-active, and competitive), and partner characteristics. Visuo-physical interaction generally results in better dyadic task performance than visual interaction alone. In cases where the physical interaction between humans is described by a spring, there are conflicting results as to the effect of the stiffness of the spring. In terms of partner characteristics, having a more skilled partner improves dyadic task performance more than having a less skilled partner. However, conflicting results were observed in terms of individual motor learning.
Conclusions: Although it is difficult to draw clear conclusions as to which interaction type, mode, or partner characteristic may lead to optimal task performance or individual motor learning, these results show the possibility for improved outcomes through HMH interaction. Future work that focuses on selecting the optimal personalized interaction conditions and exploring their impact on rehabilitation settings may facilitate the transition of HMH training protocols to clinical implementations.
-
Many machine learning models have tuning parameters that must be determined from the training data, and cross‐validation (CV) is perhaps the most commonly used method for selecting them. This work concerns the problem of estimating the generalization error of a CV‐tuned predictive model. We propose an honest leave‐one‐out cross‐validation framework that produces a nearly unbiased estimator of the post‐tuning generalization error. Using the kernel support vector machine and kernel logistic regression as examples, we demonstrate that honest leave‐one‐out cross‐validation performs very competitively, even against the state‐of‐the‐art .632+ estimator.
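The key idea of an honest leave‐one‐out estimate is that the whole tuning procedure is repeated inside every outer fold, so a held-out point never influences the tuning parameter used to predict it. The sketch below illustrates that structure under stated assumptions: closed-form ridge regression is swapped in for the kernel SVM and kernel logistic regression to keep it short, the inner tuning step is itself plain LOO-CV over a small hypothetical grid, and the data are synthetic. It is an illustration of the nesting, not the authors' estimator.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights (swapped in for the kernel models)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def loo_error(X, y, lam):
    """Plain leave-one-out squared error for a fixed tuning parameter."""
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        w = ridge_fit(X[keep], y[keep], lam)
        errs.append((y[i] - X[i] @ w) ** 2)
    return float(np.mean(errs))


def tune(X, y, grid):
    """Select the tuning parameter by LOO-CV on exactly the data it is given."""
    return float(grid[int(np.argmin([loo_error(X, y, lam) for lam in grid]))])


def honest_loo(X, y, grid):
    """Honest LOO: re-run the entire tuning step without the held-out point."""
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        lam = tune(X[keep], y[keep], grid)  # point i never influences its own lambda
        w = ridge_fit(X[keep], y[keep], lam)
        errs.append((y[i] - X[i] @ w) ** 2)
    return float(np.mean(errs))


rng = np.random.default_rng(0)
X = rng.standard_normal((25, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(25)
grid = np.array([0.01, 0.1, 1.0, 10.0])

# Reporting the winning CV score reuses the data that picked the winner,
# so it tends to be optimistic as an estimate of post-tuning error.
naive = min(loo_error(X, y, lam) for lam in grid)
honest = honest_loo(X, y, grid)
print(f"naive post-tuning LOO: {naive:.3f}, honest LOO: {honest:.3f}")
```

The cost of honesty is visible in the loop structure: the outer loop multiplies the tuning cost by n, which is why closed-form or otherwise cheap inner fits matter in practice.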
-
Actor-critic methods such as Twin Delayed Deep Deterministic Policy Gradient (TD3) rely on simple noise-based exploration, which can lead to convergence to suboptimal policies. In this study, we introduce Monte Carlo Beam Search (MCBS), a new hybrid method that combines beam search and Monte Carlo rollouts with TD3 to improve exploration and action selection. MCBS generates several candidate actions around the policy's output and evaluates them through short-horizon rollouts, enabling the agent to make better-informed choices. We test MCBS on several continuous-control benchmarks, including HalfCheetah-v4, Walker2d-v5, and Swimmer-v5, and show improved sample efficiency and performance compared with standard TD3 and other baselines such as SAC, PPO, and A2C. Our findings show that MCBS can enhance policy learning through structured look-ahead search while remaining computationally efficient. Additionally, we provide a detailed analysis of key hyperparameters, such as beam width and rollout depth, and explore adaptive strategies to optimize MCBS for complex control tasks. MCBS converges faster than TD3, SAC, PPO, and A2C across environments; for instance, it reached 90% of the maximum achievable reward within about 200,000 timesteps, compared with 400,000 timesteps for the second-best method.
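The candidate-and-rollout step described above can be sketched as a small beam search over perturbations of a policy's actions. This is a simplified illustration, not the authors' MCBS: the 1-D point-mass environment, the proportional-controller "actor", the noise scale `sigma`, and the scoring by summed rollout reward (with no TD3 critic bootstrap at the rollout horizon) are all assumptions made to keep it self-contained. `beam_width` and `depth` play the roles of the beam width and rollout depth hyperparameters the abstract mentions.

```python
import numpy as np


def toy_step(state, action):
    """Hypothetical 1-D point mass: push it toward the origin with bounded force."""
    pos, vel = state
    a = float(np.clip(action, -1.0, 1.0))
    vel = vel + 0.1 * a
    pos = pos + 0.1 * vel
    reward = -pos ** 2 - 0.01 * a ** 2  # distance cost plus a small effort cost
    return (pos, vel), reward


def policy(state):
    """Stand-in for a trained TD3 actor (a simple proportional controller)."""
    pos, vel = state
    return float(np.clip(-pos - vel, -1.0, 1.0))


def mcbs_action(state, beam_width=3, n_candidates=5, depth=4, sigma=0.3, rng=None):
    """Beam search over Monte Carlo perturbations of the policy's actions.

    Returns the first action of the highest-return partial rollout.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    beams = [(0.0, state, None)]  # (cumulative reward, state, first action)
    for _ in range(depth):
        expanded = []
        for ret, s, first in beams:
            base = policy(s)
            # Candidates: the policy's own action plus Gaussian perturbations of it.
            noise = sigma * rng.standard_normal(n_candidates)
            cands = np.clip(np.append(base, base + noise), -1.0, 1.0)
            for a in cands:
                s_next, r = toy_step(s, a)
                expanded.append((ret + r, s_next, float(a) if first is None else first))
        # Keep only the top-scoring partial rollouts (the beam).
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][2]


start = (1.0, 0.0)
print("policy action:", policy(start), "MCBS action:", mcbs_action(start))
```

Including the unperturbed policy action among the candidates means the search always considers what the actor would have done, so look-ahead only replaces it when a perturbed action scores higher over the rollout horizon. The cost per decision grows roughly as beam width × candidates × depth environment steps.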
