MDP Geometry, Normalization and Reward Balancing Solvers

Mustafin, Arsenii; Pakharev, Aleksei; Olshevsky, Alex; Paschalidis, Ioannis

This content will become publicly available on March 10, 2026

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
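
The abstract does not spell out the transformation itself. As a minimal sketch, one standard way to shift the value at each state while leaving every advantage unchanged is to replace each reward r(s, a) with r(s, a) - delta(s) + gamma * E[delta(s')]. This form is an assumption made here for illustration and may differ from the paper's exact construction. The toy MDP, the shift vector `delta`, and the helper names `policy_values`, `advantages`, and `shift_rewards` are all hypothetical.

```python
from itertools import product

GAMMA = 0.9  # discount factor (illustrative)

# Hypothetical 2-state, 2-action MDP; every number here is made up.
# P[s][a][t] is the probability of moving from state s to state t under action a.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a] is the reward for taking action a in state s.
R = [
    [1.0, 0.0],
    [0.5, 2.0],
]


def policy_values(P, R, policy, gamma, sweeps=2000):
    """Evaluate a deterministic policy by repeated Bellman backups."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


def advantages(P, R, policy, gamma):
    """Return A[s][a] = Q(s, a) - V(s) for the given policy."""
    n = len(R)
    v = policy_values(P, R, policy, gamma)
    return [
        [
            R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)) - v[s]
            for a in range(len(R[s]))
        ]
        for s in range(n)
    ]


def shift_rewards(P, R, delta, gamma):
    """Assumed transformation: r'(s,a) = r(s,a) - delta[s] + gamma * E[delta[next]]."""
    n = len(R)
    return [
        [
            R[s][a] - delta[s] + gamma * sum(P[s][a][t] * delta[t] for t in range(n))
            for a in range(len(R[s]))
        ]
        for s in range(n)
    ]


delta = [3.0, -1.5]  # arbitrary per-state shifts
R_shifted = shift_rewards(P, R, delta, GAMMA)

# Check all four deterministic policies of the toy MDP.
for policy in product(range(2), repeat=2):
    before = advantages(P, R, policy, GAMMA)
    after = advantages(P, R_shifted, policy, GAMMA)
    v_before = policy_values(P, R, policy, GAMMA)
    v_after = policy_values(P, R_shifted, policy, GAMMA)
    for s in range(2):
        # Each value moves by exactly delta[s] ...
        assert abs(v_after[s] - (v_before[s] - delta[s])) < 1e-9
        # ... while every advantage is unchanged.
        for a in range(2):
            assert abs(after[s][a] - before[s][a]) < 1e-9
```

Under this assumed form, choosing `delta` equal to a policy's value function drives that policy's values to zero, which is in the spirit of the normalization the abstract describes.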

Award ID(s): 2240848; 2317079; 2245059

PAR ID: 10578796

Author(s) / Creator(s): Mustafin, Arsenii; Pakharev, Aleksei; Olshevsky, Alex; Paschalidis, Ioannis

Publisher / Repository: Proceedings of AISTATS (28th International Conference on Artificial Intelligence and Statistics)

Date Published: 2025-03-10

ISSN: 2640-3498

Format(s): Medium: X

Sponsoring Org: National Science Foundation

The DOI is not currently available.