Search for: All records

Award ID contains: 2045783

Total Resources: 16

Resource Type
Conference Paper: 13
Conference Proceeding: 1
Dataset: 0
Journal Article: 2
Workshop Report: 0

Availability
Full Text / Resource Available: 15
Citation Only: 1

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

Ruida Zhou, Tao Liu ( December 2023 , Advances in neural information processing systems)

Distributionally Robust Behavioral Cloning for Robust Imitation Learning

https://doi.org/10.1109/CDC49753.2023.10383976

Panaganti, Kishan ; Xu, Zaiyan ; Kalathil, Dileep ; Ghavamzadeh, Mohammad ( December 2023 , IEEE)

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

Ting-Jui Chang, Sapana Chaudhary ( October 2023 , Transactions on machine learning research)

Free, publicly-accessible full text available October 5, 2024

Meta-Learning Online Control for Linear Dynamical Systems

https://doi.org/10.1109/CDC51059.2022.9993222

Muthirayan, Deepan ; Kalathil, Dileep ; Khargonekar, Pramod P. ( December 2022 , IEEE 61st Conference on Decision and Control (CDC))

Full Text Available

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

Zhou, Ruida ; Liu, Tao ; Kalathil, Dileep ; Kumar, P. R. ; Tian, Chao ( December 2022 , Advances in neural information processing systems)

Full Text Available

Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments

Rengarajan, Desik ; Chaudhary, Sapana ; Kim, Jaewon ; Kalathil, Dileep ; Shakkottai, Srinivas ( December 2022 , Advances in neural information processing systems)

Full Text Available

Robust Reinforcement Learning using Offline Data

Panaganti, Kishan ; Xu, Zaiyan ; Kalathil, Dileep ; Ghavamzadeh, Mohammad ( December 2022 , Advances in neural information processing systems)

Full Text Available

DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning

Bura, Archana ; HasanzadeZonuzy, Aria ; Kalathil, Dileep ; Shakkottai, Srinivas ; Chamberland, Jean-Francois ( December 2022 , Advances in neural information processing systems)

Full Text Available

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Rengarajan, D. ; Vaidya, G. ; Sarvesh, A. ; Kalathil, D. ; Shakkottai, S ( April 2022 , International Conference on Learning Representations (ICLR))

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key idea is that by obtaining guidance from - not imitating - the offline data, LOGO orients its policy in the manner of the sub-optimal policy, while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, and provide a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
Full Text Available

Sample Complexity of Robust Reinforcement Learning with a Generative Model

Panaganti, Kishan ; Kalathil, Dileep ( March 2022 , International Conference on Artificial Intelligence and Statistics (AISTATS))

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
Full Text Available
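
The max-min formulation in the abstract above can be illustrated with a small robust value iteration sketch for the total variation case. This is not the paper's algorithm. It is a minimal illustration that assumes the nominal model is already available as an array (in the paper's setting it would have to be estimated from samples), that the uncertainty set is defined independently for each state-action pair, and that total variation is half the L1 distance. The names `worst_case_expectation_tv`, `robust_value_iteration`, `P_nominal`, and `radius` are hypothetical.

```python
import numpy as np


def worst_case_expectation_tv(p_nominal, values, radius):
    """Minimize p @ values over distributions p with TV(p, p_nominal) <= radius.

    Assumes TV(p, q) = 0.5 * sum(|p - q|). The minimizer shifts up to
    `radius` probability mass from the highest-value states onto the
    lowest-value state.
    """
    p = np.array(p_nominal, dtype=float)
    values = np.asarray(values, dtype=float)
    worst = int(np.argmin(values))
    budget = float(radius)
    for s in np.argsort(values)[::-1]:  # visit highest-value states first
        if s == worst or budget <= 0.0:
            break
        moved = min(p[s], budget)
        p[s] -= moved
        p[worst] += moved
        budget -= moved
    return float(p @ values)


def robust_value_iteration(P_nominal, rewards, gamma, radius, iters=500):
    """Illustrative robust value iteration (max over actions, min over models).

    P_nominal: array of shape (S, A, S); rewards: array of shape (S, A).
    Returns the robust value estimate and a greedy policy.
    """
    n_states, n_actions = rewards.shape
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                # Inner minimization over the TV ball around the nominal row.
                q[s, a] = rewards[s, a] + gamma * worst_case_expectation_tv(
                    P_nominal[s, a], v, radius
                )
        # Outer maximization over actions.
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

With `radius=0.0` the inner minimization returns the nominal expectation, so the routine reduces to standard value iteration. Larger radii can only lower the resulting values.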