Average-Reward Soft Actor-Critic

Adamczyk, Jacob; Makarenko, Volodymyr; Tiomkin, Stas; Kulkarni, Rahul V

This content will become publicly available on May 9, 2026

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years for its ability to solve temporally extended problems without relying on discounting. Meanwhile, in the discounted setting, algorithms with entropy regularization have been developed, leading to improvements over deterministic methods. Despite the distinct benefits of these approaches, deep RL algorithms for the entropy-regularized average-reward objective have not been developed. While policy-gradient-based approaches have recently been presented in the average-reward literature, the corresponding actor-critic framework remains less explored. In this paper, we introduce an average-reward soft actor-critic algorithm to address these gaps in the field. We validate our method by comparing it with existing average-reward algorithms on standard RL benchmarks, achieving superior performance for the average-reward criterion.

Award ID(s): 2019786

PAR ID: 10620823

Author(s) / Creator(s): Adamczyk, Jacob; Makarenko, Volodymyr; Tiomkin, Stas; Kulkarni, Rahul V

Publisher / Repository: Open Review

Date Published: 2025-05-09

Format(s): Medium: X

Location: https://openreview.net/forum?id=ywygpSXlHG#discussion

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
