NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Sample Complexity of Robust Reinforcement Learning with a Generative Model

Panaganti, Kishan; Kalathil, Dileep (March 2022, International Conference on Artificial Intelligence and Statistics (AISTATS))

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mis- matches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
more » « less
Full Text Available
Safe Online Convex Optimization with Unknown Linear Safety Constraints

Chaudhary Sapana; Kalathil, Dileep (February 2022, AAAI Conference on Artificial Intelligence (AAAI))

We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of ac- tions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has access to only the noisy observations of constraints for the chosen actions. We pro- pose an algorithm, called the Safe Online Projected Gradient Descent(SO-PGD) algorithm to address this problem. We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret O(T^2/3). While there are many algorithms for online convex optimization (OCO) problems with safety constraints avail- able in the literature, they allow constraint violations during learning/optimization, and the focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work that provides an algorithm with provable guarantees on the regret, without violating the linear safety constraints (with high probability) at any time step.
more » « less
Full Text Available
Distributed Learning-based Stability Assessment for Large Scale Networks of Dissipative Systems

https://doi.org/10.1109/CDC45484.2021.9683774

Jena, Amit; Huang, Tong; Sivaranjani, S.; Kalathil, Dileep; Xie, Le (December 2021, IEEE Conference on Decision and Control (CDC))

We propose a new distributed learning-based framework for stability assessment of a class of networked nonlinear systems, where each subsystem is dissipative. The aim is to learn, in a distributed manner, a Lyapunov function and associated region of attraction for the networked system. We begin by using a neural network function approximation to learn a storage function for each subsystem such that the subsystem satisfies a local dissipativity property. We next use a satisfiability modulo theories (SMT) solver based falsifier that verifies the local dissipativity of each subsystem by deter- mining an absence of counterexamples that violate the local dissipativity property, as established by the neural network approximation. Finally, we verify network-level stability by using an alternating direction method of multipliers (ADMM) approach to update the storage function of each subsystem in a distributed manner until a global stability condition for the network of dissipative subsystems is satisfied. This step also leads to a network-level Lyapunov function that we then use to estimate a region of attraction. We illustrate the proposed algorithm and its advantages on a microgrid interconnection with power electronics interfaces.
more » « less
Full Text Available
Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

Liu, T.; Zhou, R.; Kalathil, D.; Kumar, P.; Tian, C. (December 2021, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available
Decoupled Data-Based Approach for Learning to Control Nonlinear Dynamical Systems

https://doi.org/10.1109/TAC.2021.3108552

Wang, Ran; Parunandi, Karthikeya Sharma; Yu, Dan; Kalathil, Dileep; Chakravorty, Suman (August 2021, IEEE Transactions on Automatic Control)

This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynam- ical. This problem is subject to the ‘curse of dimension- ality’ associated with the dynamic programming method. This paper proposes a novel decoupled data-based con- trol (D2C) algorithm that addresses this problem using a decoupled, ‘open-loop - closed-loop’, approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, closed-loop control is developed around this open-loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state of the art algorithms.
more » « less
Full Text Available
Bounded Regret for Finitely Parameterized Multi-Armed Bandits

https://doi.org/10.1109/LCSYS.2020.3008798

Panaganti, Kishan; Kalathil, Dileep (July 2021, IEEE Control Systems Letters)
null (Ed.)
Full Text Available
Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs.

Aria HasanzadeZonuzy, Archana Bura (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
null (Ed.)
Full Text Available
Reinforcement Learning for Mean Field Games with Strategic Complementarities

Kiyeob Lee, Desik Rengarajan (January 2021, International Conference on Artificial Intelligence and Statistics (AISTATS))
Banerjee, Arindam and (Ed.)
Full Text Available
Fully Decentralized Reinforcement Learning-Based Control of Photovoltaics in Distribution Grids for Joint Provision of Real and Reactive Power

https://doi.org/10.1109/OAJPE.2021.3077218

El Helou, Rayan; Kalathil, Dileep; Xie, Le (January 2021, IEEE Open Access Journal of Power and Energy)
Reinforcement Learning for Multi-Hop Scheduling and Routing of Real-Time Flows

Aria HasanzadeZonuzy, Dileep Kalathil (January 2020, 18th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks)
null (Ed.)
Full Text Available

Search for: All records