TOPS: Transition-Based Volatility-Reduced Policy Search

Liangliang, X.; Daoming, L.; Yangchen, P.


Existing risk-averse reinforcement learning approaches still face several challenges, including the lack of global optimality guarantees and the need to learn from long, consecutive trajectories. Long consecutive trajectories are likely to visit hazardous states, which is a major concern in the risk-averse setting. This paper proposes Transition-based vOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from individual transitions. We prove that, in the over-parameterized neural network regime, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient. This convergence rate is comparable to that of state-of-the-art risk-neutral policy-search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both the theoretical analysis and the experimental results show that TOPS performs at a state-of-the-art level among existing risk-averse policy search methods.
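The abstract does not state the objective explicitly. As a minimal sketch, assuming a per-step reward-volatility objective of the kind used in transition-based mean-variance RL (mean per-step reward minus λ times per-step reward variance, rewritten with a Fenchel dual variable y), the code below illustrates why such an objective can be estimated from individual transitions rather than long consecutive trajectories. The function names `mean_volatility` and `augmented_rewards` and the value of λ are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def mean_volatility(rewards, lam):
    """Assumed objective: E[r] - lam * Var(r) over per-step rewards.

    Each reward comes from a single transition (s, a, r, s'); the estimate
    ignores ordering, so shuffled or non-consecutive transitions give the
    same value.
    """
    r = np.asarray(rewards, dtype=float)
    return r.mean() - lam * r.var()  # population variance (ddof=0)


def augmented_rewards(rewards, lam, y=None):
    """Fenchel-dual form (assumption, for lam > 0):
    E[r] - lam*Var(r) = max_y E[r - lam*r**2 + 2*lam*y*r] - lam*y**2.

    The maximizing dual variable is y = E[r]. With y fixed, the objective is
    an ordinary expected reward of r_hat, so a risk-neutral policy optimizer
    could in principle be run on r_hat. Returns (r_hat, lam * y**2).
    """
    r = np.asarray(rewards, dtype=float)
    if y is None:
        y = r.mean()  # closed-form optimal dual variable
    r_hat = r - lam * r**2 + 2.0 * lam * y * r
    return r_hat, lam * y**2


# Usage: synthetic per-step rewards from individual transitions, arbitrary order.
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=0.5, size=1000)
lam = 0.5

j_direct = mean_volatility(rewards, lam)
r_hat, penalty = augmented_rewards(rewards, lam)
j_dual = r_hat.mean() - penalty
print(np.isclose(j_direct, j_dual))  # True: both forms agree
print(np.isclose(j_direct, mean_volatility(rng.permutation(rewards), lam)))  # True: order-invariant
```

Using the closed-form y = E[r] keeps the sketch short; a practical method would more likely alternate between updating y from recent transitions and improving the policy on the augmented reward. Note also that per-step reward variance is a surrogate for, not the same quantity as, the variance of the episode return used in the mean-variance evaluation metric.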

Award ID(s):: 1910794

PAR ID:: 10471336

Author(s) / Creator(s):: Liangliang, X.; Daoming, L.; Yangchen, P.

Editor(s):: Melo, S. F.; Fang, F.

Publisher / Repository:: Springer

Date Published:: 2022-11-06

Journal Name:: Lecture Notes in Computer Science

ISSN:: 1611-3349

ISBN:: 978-3-031-20179-0

Page Range / eLocation ID:: 3-47

Subject(s) / Keyword(s):: reinforcement learning; risk control; volatility control

Format(s):: Medium: X

Location:: Virtual Event

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript (Conference Paper)
The DOI is not currently available.
