NSF PAR Search | NSF Public Access Repository


MDP Geometry, Normalization and Reward Balancing Solvers

Mustafin, Arsenii; Pakharev, Aleksei; Olshevsky, Alex; Paschalidis, Ioannis (March 2025, Proceedings of AISTATS (28th International Conference on Artificial Intelligence and Statistics))

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
Free, publicly-accessible full text available March 10, 2026
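
The abstract of this entry describes adjusting the value function at each state without changing any action's advantage under any policy. The following is a minimal sketch of one standard way such a shift can be realized, assuming a discounted MDP and a potential-style reward change r'(s, a) = r(s, a) - phi(s) + gamma * sum_t P(t | s, a) * phi(t). It is not taken from the paper and is not its normalization procedure or its Reward Balancing solvers; the function names, the toy MDP, and the shift vector phi are all hypothetical.

```python
import random

def evaluate(P, r, policy, gamma, iters=2000):
    """Iterative policy evaluation for a fixed deterministic policy."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

def advantages(P, r, policy, gamma):
    """A(s, a) = r(s, a) + gamma * E[V(s')] - V(s) under the given policy."""
    n = len(P)
    V = evaluate(P, r, policy, gamma)
    return [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n)) - V[s]
             for a in range(len(r[s]))] for s in range(n)]

def shift_rewards(P, r, phi, gamma):
    """Assumed shift: r'(s, a) = r(s, a) - phi(s) + gamma * sum_t P(t|s,a) * phi(t)."""
    n = len(P)
    return [[r[s][a] - phi[s] + gamma * sum(P[s][a][t] * phi[t] for t in range(n))
             for a in range(len(r[s]))] for s in range(n)]

# Hypothetical toy MDP: 3 states, 2 actions, random transitions and rewards.
random.seed(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = []
for s in range(n_states):
    rows = []
    for a in range(n_actions):
        w = [random.random() + 0.1 for _ in range(n_states)]
        total = sum(w)
        rows.append([x / total for x in w])  # each row is a probability vector
    P.append(rows)
r = [[random.uniform(-1, 1) for _ in range(n_actions)] for _ in range(n_states)]
phi = [random.uniform(-2, 2) for _ in range(n_states)]  # arbitrary per-state shift
policy = [0, 1, 0]  # an arbitrary deterministic policy

r_shifted = shift_rewards(P, r, phi, gamma)
V = evaluate(P, r, policy, gamma)
V_shifted = evaluate(P, r_shifted, policy, gamma)
A = advantages(P, r, policy, gamma)
A_shifted = advantages(P, r_shifted, policy, gamma)

# Values should move by exactly -phi, while advantages should not move at all.
max_val_gap = max(abs(V_shifted[s] - (V[s] - phi[s])) for s in range(n_states))
max_adv_gap = max(abs(A[s][a] - A_shifted[s][a])
                  for s in range(n_states) for a in range(n_actions))
print(max_val_gap, max_adv_gap)
```

Under these assumptions both printed gaps should sit at floating-point noise level: the shifted MDP has value function V - phi for the chosen policy, and every advantage is unchanged. The same check can be repeated with any other policy in place of `policy`.
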
Closing the gap between SVRG and TD-SVRG with Gradient Splitting

Mustafin, A; Olshevsky, A; Paschalidis, IC (August 2024, Transactions on Machine Learning Research)

Full Text Available
One-Shot Averaging for Distributed TD(λ) Under Markov Sampling

https://doi.org/10.1109/LCSYS.2024.3412648

Tian, Haoxing; Paschalidis, Ioannis Ch; Olshevsky, Alex (June 2024, IEEE Control Systems Letters)

Full Text Available
Convergence of Actor-Critic with Multi-Layer Neural Networks

Tian, H.; Olshevsky, A.; Paschalidis, I.C. (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Convergence of Actor-Critic with Multi-Layer Neural Networks

H. Tian, A. Olshevsky (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
A Small Gain Analysis of Single Timescale Actor Critic

https://doi.org/10.1137/22M1483335

Olshevsky, Alex; Gharesifard, Bahman (April 2023, SIAM Journal on Control and Optimization)

Full Text Available
Distributed TD(0) With Almost No Communication

https://doi.org/10.1109/LCSYS.2023.3287952

Liu, Rui; Olshevsky, Alex (January 2023, IEEE Control Systems Letters)

Full Text Available
