Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms: it enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration, enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore "infinitely often." To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the shortcoming of optimistic initialization, namely its sensitivity to environment noise. We provide a rigorous theoretical comparison of COIN with existing popular exploration strategies and prove that it provides a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a designed toy domain and present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies on four of six benchmark tasks while performing on par with the best baseline on the other two.
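The abstract does not give COIN's update rule, so the following is only a minimal, hypothetical sketch of the general idea: tabular Q-learning with optimistic initial values plus a small, continually re-injected optimism bonus, so that exploration recurs instead of being washed out after the first visits. A discrete Gymnasium-style environment is assumed; the bonus form and the constants `q_max` and `beta` are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def recurring_optimistic_q_learning(env, episodes=500, alpha=0.1, gamma=0.99,
                                    q_max=1.0, beta=0.05):
    """Tabular Q-learning with optimistic initialization plus a recurring
    optimism bonus.  With beta = 0 this reduces to plain optimistic
    initialization, whose exploration stops once the optimism is washed out;
    a small beta keeps under-visited actions attractive, so exploration
    recurs indefinitely.  The bonus schedule is an illustrative assumption,
    not the COIN update from the paper."""
    n_s, n_a = env.observation_space.n, env.action_space.n
    q = np.full((n_s, n_a), q_max)                 # optimistic initial values

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = int(np.argmax(q[s]))               # greedy w.r.t. optimistic Q
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            target = r + (0.0 if terminated else gamma * np.max(q[s2]))
            q[s, a] += alpha * (target - q[s, a])  # standard TD update
            q[s, a] += beta * (q_max - q[s, a])    # recurring optimism bonus
            s = s2
    return q
```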
Chapter 14 - Adaptive H∞ Tracking Control of Nonlinear Systems Using Reinforcement Learning
This chapter presents online solutions to the optimal H∞ tracking problem for nonlinear systems to attenuate the effect of disturbances on system performance. To obviate the requirement of complete knowledge of the system dynamics, reinforcement learning (RL) is used to learn the solutions to the Hamilton–Jacobi–Isaacs (HJI) equations arising from the tracking problem. Off-policy RL algorithms are designed for continuous-time systems, which allows the reuse of data for learning and consequently leads to data-efficient RL algorithms. A solution is first presented for the H∞ optimal tracking control of affine nonlinear systems; it is then extended to a special class of nonlinear nonaffine systems. It is shown that, for the nonaffine systems, the existence of a stabilizing solution depends on the performance function. A performance function is designed to assure the existence of the solution for a class of nonaffine systems while taking into account input constraints.
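For context, one standard form of the Hamilton–Jacobi–Isaacs equation for H∞ control of an input-affine system ẋ = f(x) + g(x)u + k(x)d with cost ∫(Q(x) + uᵀRu − γ²dᵀd) dt is sketched below. This is the regulation form; the chapter's tracking formulation is typically posed on an augmented system of tracking error and reference dynamics, which is omitted here.

```latex
% One standard (regulation) form of the HJI equation and the associated
% saddle-point policies; the tracking version in the chapter is typically
% written for an augmented error/reference system.
\begin{align}
  0 &= \nabla V^{\top} f(x) + Q(x)
       - \tfrac{1}{4}\,\nabla V^{\top} g(x) R^{-1} g(x)^{\top} \nabla V
       + \tfrac{1}{4\gamma^{2}}\,\nabla V^{\top} k(x) k(x)^{\top} \nabla V, \\
  u^{*}(x) &= -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V, \qquad
  d^{*}(x) = \tfrac{1}{2\gamma^{2}} k(x)^{\top} \nabla V.
\end{align}
```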
- Award ID(s): 1851588
- PAR ID: 10078414
- Date Published:
- Journal Name: Adaptive Learning Methods for Nonlinear System Modeling (Book)
- Page Range / eLocation ID: 313-333
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper proposes a data-driven optimal tracking control scheme for unknown general nonlinear systems using neural networks. First, a new neural-network structure is established to reconstruct the unknown system dynamics of the form ẋ(t) = f(x(t)) + g(x(t))u(t). Two networks in parallel are designed to approximate the functions f(x) and g(x). The obtained data-driven models are then used to build the optimal tracking control, which consists of two parts: a feed-forward control and an optimal feedback control. The optimal feedback control is developed by approximating the solution of the Hamilton-Jacobi-Bellman equation with neural networks. Unlike other studies, the Hamilton-Jacobi-Bellman solution is found by estimating the value-function derivative using neural networks. Finally, the proposed control scheme is tested on a delta robot; two trajectory-tracking examples are provided to verify the effectiveness of the proposed optimal control approach. (A hedged sketch of the two-network dynamics reconstruction appears after this list.)
- A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback means that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame, because of the large number of exploration actions the policy must perform before it gets any useful feedback to learn from. In this work, we address this challenging problem by developing an algorithm that exploits offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse-reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step that uses the offline demonstration data. The key idea is that by obtaining guidance from, rather than imitating, the offline data, LOGO orients its policy toward the sub-optimal policy while remaining able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, including a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete-observation setting, where the demonstration data contain only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored states. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance. (A hedged sketch of the improve-plus-guide idea appears after this list.)
- This paper addresses the distributed tracking control of multiple uncertain high-order nonlinear systems with prescribed performance requirements. By introducing a performance function and a nonlinear transformation, the prescribed fixed-time performance tracking control problem is reformulated as a distributed tracking control problem for multiple special nonlinear systems. With the aid of the universal approximation theorem for continuous functions and algebraic graph theory, distributed robust adaptive controllers are designed using the backstepping technique. Simulation results are presented to demonstrate the effectiveness of the proposed algorithms. (A representative performance function and error transformation are sketched after this list.)
- The lack of standardized benchmarks for reinforcement learning (RL) in sustainability applications has made it difficult to both track progress on specific domains and identify bottlenecks for researchers to focus their efforts. In this paper, we present SustainGym, a suite of five environments designed to test the performance of RL algorithms on realistic sustainable energy system tasks, ranging from electric vehicle charging to carbon-aware data center job scheduling. The environments test RL algorithms under realistic distribution shifts as well as in multi-agent settings. We show that standard off-the-shelf RL algorithms leave significant room for improving performance and highlight the challenges ahead for introducing RL to real-world sustainability tasks.
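For the first related item above (the two-network reconstruction of ẋ(t) = f(x(t)) + g(x(t))u(t)), a minimal sketch of fitting parallel approximators for f and g from sampled (x, u, ẋ) data is given below. The layer sizes, activation, optimizer, and loss are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AffineDynamicsModel(nn.Module):
    """Two parallel networks approximating f(x) and g(x) in
    x_dot = f(x) + g(x) u.  Sizes and activations are illustrative."""
    def __init__(self, x_dim, u_dim, hidden=64):
        super().__init__()
        self.x_dim, self.u_dim = x_dim, u_dim
        self.f_net = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, x_dim))
        self.g_net = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, x_dim * u_dim))

    def forward(self, x, u):
        f = self.f_net(x)                                     # (B, x_dim)
        g = self.g_net(x).view(-1, self.x_dim, self.u_dim)    # (B, x_dim, u_dim)
        return f + torch.bmm(g, u.unsqueeze(-1)).squeeze(-1)  # f(x) + g(x) u

def fit_dynamics(model, x, u, x_dot, epochs=200, lr=1e-3):
    """Least-squares fit of the reconstructed dynamics to measured x_dot."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x, u), x_dot)
        loss.backward()
        opt.step()
    return model
```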
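For the LOGO item above, a heavily simplified sketch of the improve-plus-guide idea is shown below: a policy-gradient improvement term on online data combined with a divergence penalty that pulls the policy toward a behavior policy cloned from the offline demonstrations, with the penalty weight meant to decay over training so the learner can eventually surpass the sub-optimal demonstrator. The loss form, penalty schedule, and behavior-cloning step are assumptions; this is not the LOGO update itself.

```python
import torch
import torch.nn.functional as F

def guided_policy_loss(policy, behavior_policy, online_batch, demo_states,
                       guidance_weight):
    """Simplified 'improve + guide' objective (illustrative, not LOGO itself).

    policy, behavior_policy : modules mapping states to action logits
    online_batch            : (states, actions, advantages) from online rollouts
    demo_states             : states drawn from the offline demonstrations
    guidance_weight         : penalty weight, intended to be decayed over time
    """
    states, actions, advantages = online_batch

    # Policy-improvement term: REINFORCE-style surrogate weighted by advantages.
    log_probs = torch.distributions.Categorical(
        logits=policy(states)).log_prob(actions)
    improve = -(log_probs * advantages).mean()

    # Guidance term: KL(pi_behavior || pi) evaluated on demonstration states.
    with torch.no_grad():
        behavior_probs = F.softmax(behavior_policy(demo_states), dim=-1)
    guide = F.kl_div(F.log_softmax(policy(demo_states), dim=-1),
                     behavior_probs, reduction="batchmean")

    return improve + guidance_weight * guide
```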
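For the prescribed-performance item above, a commonly used (symmetric, exponentially decaying) performance envelope and error transformation are shown for concreteness; the abstract does not give the fixed-time performance function actually used in that paper, so the form below is only a representative example.

```latex
% A representative prescribed-performance envelope and error transformation
% (not necessarily the ones used in the cited paper).  The tracking error
% e(t) is confined to a shrinking funnel and mapped to an unconstrained
% variable eps(t) that the backstepping design can regulate directly.
\begin{align}
  \rho(t) &= (\rho_0 - \rho_\infty)\,e^{-\kappa t} + \rho_\infty,
  \qquad \rho_0 > \rho_\infty > 0,\ \kappa > 0, \\
  -\rho(t) &< e(t) < \rho(t), \qquad
  \varepsilon(t) = \tfrac{1}{2}\,\ln\!\frac{\rho(t) + e(t)}{\rho(t) - e(t)} .
\end{align}
```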