NSF PAR Search | NSF Public Access Repository

On the Convergence of the Iterative Linear Exponential Quadratic Gaussian Algorithm to Stationary Points

https://doi.org/10.23919/ACC45564.2020.9147694

Roulet, Vincent; Fazel, Maryam; Srinivasa, Siddhartha; Harchaoui, Zaid (July 2020, 2020 American Control Conference)

A classical method for risk-sensitive nonlinear control is the iterative linear exponential quadratic Gaussian algorithm. We present its convergence analysis from a first-order optimization viewpoint. We identify the objective that the algorithm actually minimizes and we show how the addition of a proximal term guarantees convergence to a stationary point.
Full Text Available
Iterative Linearized Control: Stable Algorithms and Complexity Guarantees

Roulet, Vincent; Srinivasa, Siddhartha; Drusvyatskiy, Dmitriy; Harchaoui, Zaid (June 2019, Proceedings of Machine Learning Research)

We examine popular gradient-based algorithms for nonlinear control in the light of the modern complexity analysis of first-order optimization algorithms. The examination reveals that the complexity bounds can be clearly stated in terms of calls to a computational oracle related to dynamic programming and implementable by gradient back-propagation using machine learning software libraries such as PyTorch or TensorFlow. Finally, we propose a regularized Gauss-Newton algorithm enjoying worst-case complexity bounds and improved convergence behavior in practice. The software library based on PyTorch is publicly available.
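The regularized Gauss-Newton idea mentioned in the abstract can be illustrated on a generic nonlinear least-squares problem. The sketch below is not the authors' control algorithm or their PyTorch library: it assumes a hypothetical two-dimensional residual function, an analytic Jacobian, and a fixed regularization weight `reg`, and it uses NumPy instead of automatic differentiation. Each step minimizes the linearized residual plus a quadratic penalty on the step size.

```python
import numpy as np


def residual(x):
    # Hypothetical toy residuals (an assumption for illustration only);
    # both entries vanish at x = (sqrt(2), sqrt(2)).
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])


def jacobian(x):
    # Analytic Jacobian of the toy residuals above.
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])


def regularized_gauss_newton(x0, reg=0.1, n_iters=50):
    """Minimize 0.5 * ||residual(x)||^2 with damped Gauss-Newton steps.

    Each step solves (J^T J + reg * I) step = J^T r, i.e. it minimizes the
    linearized least-squares model plus (reg / 2) * ||step||^2.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        r = residual(x)
        J = jacobian(x)
        step = np.linalg.solve(J.T @ J + reg * np.eye(x.size), J.T @ r)
        x = x - step
    return x


x_star = regularized_gauss_newton([1.0, 2.0])
print(x_star, np.linalg.norm(residual(x_star)))
```

The fixed `reg` is a simplification chosen here; a larger value gives shorter, more conservative steps, while `reg = 0` recovers the plain Gauss-Newton step.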
Full Text Available
Provably Efficient Maximum Entropy Exploration

Hazan, Elad; Kakade, Sham; Singh, Karan; Van Soest, Abby (January 2019, Proceedings of the 36th International Conference on Machine Learning)

Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal. What might we hope that the agent can efficiently learn to do? This work studies a broad class of objectives that are defined solely as functions of the state-visitation frequencies induced by the agent's behavior. For example, one natural, intrinsically defined objective is for the agent to learn a policy that induces a distribution over the state space that is as uniform as possible, as measured in an entropic sense. We provide an efficient algorithm to optimize such intrinsically defined objectives when given access to a black-box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting, where we have sample-based access to the MDP, our proposed algorithm is provably efficient in terms of both its sample and computational complexities. Key to our algorithmic methodology is the conditional gradient method (a.k.a. the Frank-Wolfe algorithm), which uses an approximate MDP solver.
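As a rough illustration of the conditional gradient (Frank-Wolfe) idea in the abstract, the sketch below maximizes a smoothed entropy over mixtures of state-visitation distributions. It is not the authors' algorithm: it assumes a small, hypothetical, explicitly listed set of candidate distributions (`vertices`), and the planning oracle is replaced by a simple argmax over that set. The smoothing constant `sigma` and the step size `2 / (t + 2)` are likewise choices made here for illustration.

```python
import numpy as np


def smoothed_entropy(d, sigma=1e-3):
    # Entropy-like objective; sigma keeps the logarithm finite at d = 0.
    return -np.sum(d * np.log(d + sigma))


def smoothed_entropy_grad(d, sigma=1e-3):
    # Gradient of smoothed_entropy with respect to d.
    return -(np.log(d + sigma) + d / (d + sigma))


def frank_wolfe_max_entropy(vertices, n_iters=200, sigma=1e-3):
    """Frank-Wolfe over the convex hull of candidate state distributions.

    `vertices` is a hypothetical array whose rows are the state-visitation
    distributions of individual policies. Returns the mixture weights over
    the rows and the resulting mixed state distribution.
    """
    vertices = np.asarray(vertices, dtype=float)
    weights = np.zeros(len(vertices))
    weights[0] = 1.0
    d = vertices[0].copy()
    for t in range(n_iters):
        grad = smoothed_entropy_grad(d, sigma)
        # Linear maximization step: stand-in for the planning oracle, which
        # would return a policy maximizing the "reward" vector grad.
        best = int(np.argmax(vertices @ grad))
        step = 2.0 / (t + 2.0)
        weights *= 1.0 - step
        weights[best] += step
        d = (1.0 - step) * d + step * vertices[best]
    return weights, d


# Toy case: three policies, each visiting a single distinct state.
weights, d = frank_wolfe_max_entropy(np.eye(3))
print(weights, d, smoothed_entropy(d))
```

In this toy case the entropy-maximizing mixture is close to uniform over the three states, so the mixture weights end up roughly equal.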
Full Text Available
