NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Model- free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Chen-Yu Wei, Mehdi Jafarnia (July 2021, Proceedings of Machine Learning Research)
null (Ed.)
Full Text Available
Finite Time Guarantees for Continuous State MDPs with Generative Model

https://doi.org/10.1109/CDC42340.2020.9303840

Sharma, Hiteshi; Jain, Rahul (December 2020, 2020 59th IEEE Conference on Decision and Control (CDC))
null (Ed.)
Full Text Available
Empirical Q-Value Iteration

https://doi.org/10.1287/stsy.2019.0062

Kalathil, Dileep; Borkar, Vivek S.; Jain, Rahul (October 2020, Stochastic Systems)
null (Ed.)
We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov decision process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as Q-learning and actor-critic algorithms, this algorithm does not depend on a stochastic approximation-based method. We show that our algorithm, which we call the empirical Q-value iteration algorithm, converges to the optimal Q-value function. We also give a rate of convergence or a nonasymptotic sample complexity bound and show that an asynchronous (or online) version of the algorithm will also work. Preliminary experimental results suggest a faster rate of convergence to a ballpark estimate for our algorithm compared with stochastic approximation-based algorithms.
more » « less
Full Text Available
Posterior Sampling-Based Reinforcement Learning for Control of Unknown Linear Systems

https://doi.org/10.1109/TAC.2019.2950156

Ouyang, Yi; Gagrani, Mukul; Jain, Rahul (August 2020, IEEE Transactions on Automatic Control)

Full Text Available
Non-indexability of the stochastic appointment scheduling problem

https://doi.org/10.1016/j.automatica.2020.109016

Jafarnia-Jahromi, Mehdi; Jain, Rahul (August 2020, Automatica)

Full Text Available
A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs

https://doi.org/10.1109/TAC.2019.2907414

Haskell, William B.; Jain, Rahul; Sharma, Hiteshi; Yu, Pengqian (January 2020, IEEE Transactions on Automatic Control)
null (Ed.)
Full Text Available
Empirical Algorithms for General Stochastic Systems with Continuous States and Actions

https://doi.org/10.1109/CDC40024.2019.9029308

Sharma, Hiteshi; Jain, Rahul; Haskell, William (December 2019, Proc. IEEE Control and Decision Conference)

Full Text Available
An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions

https://doi.org/10.1109/ALLERTON.2019.8919719

Sharma, Hiteshi; Jain, Rahul (September 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton))

It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average MDPs with continuous states and actions that combines empirical value iteration with n function-parametric approximation and approximation of transition probability distribution with kernel density estimation. We view each iteration as operation of random operator and argue convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
more » « less
Full Text Available
Approximate Relative Value Learning for Average-reward Continuous State MDPs

Sharma, Hiteshi; Jafarnia-Jahromi, Mehdi; Jain, Rahul (July 2019, Proceedings UAI)

In this paper, we propose an approximate rela- tive value learning (ARVL) algorithm for non- parametric MDPs with continuous state space and finite actions and average reward criterion. It is a sampling based algorithm combined with kernel density estimation and function approx- imation via nearest neighbors. The theoreti- cal analysis is done via a random contraction operator framework and stochastic dominance argument. This is the first such algorithm for continuous state space MDPs with average re- ward criteria with these provable properties which does not require any discretization of state space as far as we know. We then eval- uate the proposed algorithm on a benchmark problem numerically.
more » « less
Full Text Available
An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

https://doi.org/10.23919/ECC.2019.8795982

Sharma, Hiteshi; Jain, Rahul; Gupta, Abhishek (June 2019, 2019 18th European Control Conference (ECC))

We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space and finite actions and average reward criterion. The ERVL algorithm relies on function approximation via nearest neighbors, and minibatch samples for value function update. It is universal (will work for any MDP), computationally quite simple and yet provides arbitrarily good approximation with high probability in finite time. This is the first such algorithm for non-parametric (and continuous state space) MDPs with average reward criteria with these provable properties as far as we know. Numerical evaluation on a benchmark problem of optimal replacement suggests good performance.
more » « less
Full Text Available

Search for: All records