Compared with capital improvement projects, real-time control of stormwater systems may be a more effective and efficient approach to addressing the increasing risk of flooding in urban areas. One way to automate the design of control policies is through reinforcement learning (RL). Recently, RL methods have been applied to small stormwater systems and have demonstrated better performance than passive systems and simple rule-based strategies. However, it remains unclear how effective RL methods are for larger and more complex systems. Current RL-based control policies also suffer from poor convergence and stability, which may be due to overly large updates made by the underlying RL algorithm. In this study, we use the Proximal Policy Optimization (PPO) algorithm to develop control policies for a medium-sized stormwater system that significantly mitigate flooding during large storm events. Our approach demonstrates good convergence behavior and stability, and achieves robust out-of-sample performance.
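To make the control setup concrete, here is a minimal sketch of training a PPO agent on a toy two-pond stormwater environment written against the Gymnasium API. The pond dynamics, depth limits, and flooding penalty are illustrative placeholders rather than the study's simulator, and stable-baselines3 is assumed only as one readily available PPO implementation.

```python
# Minimal sketch: PPO on a toy two-pond stormwater environment.
# The dynamics and reward below are illustrative placeholders, not the
# study's stormwater simulator.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO  # assumes stable-baselines3 >= 2.0

class ToyStormwaterEnv(gym.Env):
    """Two storage ponds; actions set outlet-valve openings in [0, 1]."""

    def __init__(self, horizon=96):
        self.horizon = horizon
        self.observation_space = spaces.Box(0.0, 10.0, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.depth = np.array([1.0, 1.0], dtype=np.float32)  # pond depths (m)
        return self.depth.copy(), {}

    def step(self, action):
        inflow = self.np_random.uniform(0.0, 0.5, size=2)        # toy storm runoff
        outflow = 0.6 * action * np.sqrt(self.depth)              # toy valve release
        self.depth = np.clip(self.depth + inflow - outflow, 0.0, 10.0).astype(np.float32)
        flooding = float(np.maximum(self.depth - 8.0, 0.0).sum())  # depth above crest
        self.t += 1
        return self.depth.copy(), -flooding, False, self.t >= self.horizon, {}

env = ToyStormwaterEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)  # clipped updates keep each policy step small
```

PPO's clipped surrogate objective is what bounds the size of each policy update, which is the property credited above for the improved convergence and stability.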
MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety
While robust optimal control theory provides a rigorous framework to compute robot control policies that are provably safe, it struggles to scale to high-dimensional problems, leading to increased use of deep learning for tractable synthesis of robot safety. Unfortunately, existing neural safety synthesis methods often lack convergence guarantees and solution interpretability. In this paper, we present Minimax Actors Guided by Implicit Critic Stackelberg (MAGICS), a novel adversarial reinforcement learning (RL) algorithm that guarantees local convergence to a minimax equilibrium solution. We then build on this approach to provide local convergence guarantees for a general deep RL-based robot safety synthesis algorithm. Through both simulation studies on OpenAI Gym environments and hardware experiments with a 36-dimensional quadruped robot, we show that MAGICS can yield robust control policies outperforming the state-of-the-art neural safety synthesis methods.
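As a rough intuition for the minimax structure referred to above, the toy value iteration below computes the max-min (Stackelberg-style) value of a small zero-sum Markov game: the controller commits to an action and the disturbance best-responds. The states, rewards, and transitions are arbitrary placeholders; this tabular backup only stands in for what MAGICS approximates with neural actors and an implicit critic in high dimensions.

```python
# Toy illustration of the zero-sum minimax backup that adversarial safety RL
# approximates with neural networks. The 4-state game, rewards, and
# transitions are arbitrary placeholders, not the robot setting.
import numpy as np

n_states, n_ctrl, n_dstb, gamma = 4, 3, 3, 0.95
rng = np.random.default_rng(0)
reward = rng.uniform(-1.0, 1.0, size=(n_states, n_ctrl, n_dstb))         # stand-in safety margin
next_state = rng.integers(0, n_states, size=(n_states, n_ctrl, n_dstb))  # deterministic transitions

V = np.zeros(n_states)
for _ in range(500):
    # Controller commits first (max over u); disturbance best-responds (min over d).
    Q = reward + gamma * V[next_state]      # shape: (state, ctrl, dstb)
    V_new = Q.min(axis=2).max(axis=1)       # max_u min_d Q(s, u, d)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = (reward + gamma * V[next_state]).min(axis=2).argmax(axis=1)  # greedy minimax control
print("max-min value per state:", np.round(V, 3))
```

In the safety-synthesis setting, the reward would encode a safety margin and the two players would be the robot controller and an adversarial disturbance.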
- Award ID(s): 2340851
- PAR ID: 10621715
- Publisher / Repository: Springer Proceedings in Advanced Robotics (SPAR)
- Date Published:
- Volume: XVI
- Subject(s) / Keyword(s): adversarial reinforcement learning; robot safety; game theory
- Format(s): Medium: X
- Location: Chicago, IL
- Sponsoring Org: National Science Foundation
More Like this
- We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements. (A minimal sketch of this composition idea appears after this list.)
- Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logical deduction and generalization in the representation of a deep RL model makes it less effective at exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome these exploration challenges. We develop a novel hierarchical synthesis algorithm with a decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address with conventional RL techniques. Experimental results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming.
- The ability to reuse trained models in reinforcement learning (RL) holds substantial practical value, in particular for complex tasks. While model reusability is widely studied for supervised models in data management, to the best of our knowledge this is the first principled study of model reusability for RL. To capture trained policies, we develop a framework based on an expressive and lossless graph data model that accommodates Temporal Difference Learning and Deep-RL based algorithms. Our framework can capture arbitrary reward functions that can be composed at inference time. The framework comes with theoretical guarantees, showing that it yields the same result as policies trained from scratch. We design a parameterized algorithm that strikes a balance between efficiency and quality w.r.t. cumulative reward. Our experiments with two common RL tasks (query refinement and robot movement) corroborate our theory and show the effectiveness and efficiency of our algorithms.
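As referenced in the deductive-synthesis item above, the sketch below illustrates the subtask-composition idea: a temporal reach-avoid specification is split into finite-horizon subtasks, each handled by its own policy, and a simple supervisor chains them so the behavior continues over an infinite horizon. The Subtask fields, goal predicates, and the 1-D example are hypothetical stand-ins for formally verified subtask policies.

```python
# Minimal sketch of composing finite-horizon subtask policies to satisfy a
# reach-avoid specification over an infinite horizon. The Subtask fields,
# predicates, and policies are hypothetical stand-ins for verified policies.
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Subtask:
    policy: Callable[[Any], Any]    # maps observation -> action
    reached: Callable[[Any], bool]  # goal predicate for this subtask
    horizon: int                    # verified finite-horizon budget

class SequentialSupervisor:
    """Runs subtask policies in order; cycles to cover an infinite horizon."""

    def __init__(self, subtasks: List[Subtask]):
        self.subtasks, self.idx, self.steps = subtasks, 0, 0

    def act(self, obs):
        task = self.subtasks[self.idx]
        if task.reached(obs):                            # subtask goal met: advance
            self.idx = (self.idx + 1) % len(self.subtasks)
            self.steps = 0
            task = self.subtasks[self.idx]
        assert self.steps < task.horizon, "finite-horizon budget exceeded"
        self.steps += 1
        return task.policy(obs)

# Hypothetical usage: two trivial subtask policies on a 1-D state.
go_right = Subtask(policy=lambda s: +1, reached=lambda s: s >= 5, horizon=20)
go_left  = Subtask(policy=lambda s: -1, reached=lambda s: s <= 0, horizon=20)
agent = SequentialSupervisor([go_right, go_left])

s = 0
for _ in range(40):
    s += agent.act(s)  # alternates between reaching s >= 5 and returning to s <= 0
```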