Q-Functionals For Value-Based Continuous Control

Lobel, S; Rammohan, S; He, B; Yu, S; Konidaris, GD

Citation Details

We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
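The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional action space on [-1, 1], a polynomial basis, and a hand-written weight vector standing in for the output of the Q-functional network. The names `polynomial_basis`, `q_values`, `select_action`, and `mean_q` are hypothetical.

```python
import numpy as np

def polynomial_basis(actions, n_basis):
    """Features phi_k(a) = a**k for k = 0..n_basis-1.

    actions: shape (n_actions,) -> returns shape (n_actions, n_basis).
    """
    return np.vander(actions, N=n_basis, increasing=True)

def q_values(weights, actions):
    """Evaluate Q(s, a) = sum_k w_k(s) * phi_k(a) for all actions in one matrix product."""
    return polynomial_basis(actions, weights.shape[0]) @ weights

def select_action(weights, n_samples=1000, rng=None):
    """Sample candidate actions uniformly and return the highest-valued one with its value."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=n_samples)
    q = q_values(weights, actions)
    best = int(np.argmax(q))
    return actions[best], q[best]

def mean_q(weights, n_samples=1000, rng=None):
    """Monte Carlo estimate of the average of Q(s, a) over uniformly sampled actions."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=n_samples)
    return float(np.mean(q_values(weights, actions)))

# Assumption: in a real agent these weights would be produced by a network
# conditioned on the state. Here they are fixed so that Q(s, a) = 1 - a**2.
weights = np.array([1.0, 0.0, -1.0])
action, value = select_action(weights, rng=np.random.default_rng(0))
```

Because the state enters only through `weights`, evaluating many candidate actions costs one basis expansion and one matrix-vector product, which is what makes sampling-based action selection and Monte Carlo estimates inexpensive in this sketch.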

Award ID(s): 1844960, 1717569, 1955361

NSF-PAR ID: 10404719

Author(s) / Creator(s): Lobel, S; Rammohan, S; He, B; Yu, S; Konidaris, GD

Date Published: 2023-02-01

Journal Name: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence

Sponsoring Org: National Science Foundation
