Monte Carlo Beam Search for Actor-Critic Reinforcement Learning in Continuous Control

Alzorgan, Hazim; Razi, Abolfazl

Citation Details

This content will become publicly available on June 27, 2026

Monte Carlo Beam Search for Actor-Critic Reinforcement Learning in Continuous Control

Actor-critic methods, like Twin Delayed Deep Deterministic Policy Gradient (TD3), depend on basic noise-based exploration, which can result in less than optimal policy convergence. In this study, we introduce Monte Carlo Beam Search (MCBS), a new hybrid method that combines beam search and Monte Carlo rollouts with TD3 to improve exploration and action selection. MCBS produces several candidate actions around the policy's output and assesses them through short-horizon rollouts, enabling the agent to make better-informed choices. We test MCBS across various continuous-control benchmarks, including HalfCheetah-v4, Walker2d-v5, and Swimmer-v5, showing enhanced sample efficiency and performance compared to standard TD3 and other baseline methods like SAC, PPO, and A2C. Our findings emphasize MCBS's capability to enhance policy learning through structured look-ahead search while ensuring computational efficiency. Additionally, we offer a detailed analysis of crucial hyperparameters, such as beam width and rollout depth, and explore adaptive strategies to optimize MCBS for complex control tasks. Our method shows a higher convergence rate across different environments compared to TD3, SAC, PPO, and A2C. For instance, we achieved 90% of the maximum achievable reward within around 200 thousand timesteps compared to 400 thousand timesteps for the second-best method. more »

Award ID(s):: 2204721 2120485 1755984

PAR ID:: 10654018

Author(s) / Creator(s):: Alzorgan, Hazim; Razi, Abolfazl

Publisher / Repository:: ACM Ubiquitous Robotics

Date Published:: 2025-06-27

Format(s):: Medium: X

Location:: College Station, Texas, US

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on June 27, 2026
Conference Paper:
The DOI is not currently available.

More Like this