This content will become publicly available on June 10, 2026

Title: FRONT: Foresighted Online Policy Optimization with Interference
Contextual bandits, which leverage the baseline features of sequentially arriving individuals to optimize cumulative rewards while balancing exploration and exploitation, are critical for online decision-making. Existing approaches typically assume no interference, i.e., that each individual's action affects only their own reward. Yet this assumption is violated in many practical scenarios, and overlooking interference can produce short-sighted policies that focus solely on maximizing each individual's immediate outcome, leading to suboptimal decisions and potentially increased regret over time. To address this gap, we introduce the foresighted online policy with interference (FRONT), which explicitly accounts for the long-term impact of the current decision on subsequent decisions and rewards.
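
The full paper is embargoed until June 10, 2026, so only the abstract above is available. Purely as an illustration of the no-interference baseline it contrasts against, the following is a minimal Python sketch of a standard myopic linear-UCB contextual bandit; the class, loop, and synthetic reward are hypothetical and do not implement FRONT, which would additionally model how each action influences subsequent individuals through interference.

import numpy as np

class LinUCB:
    """Myopic linear-UCB contextual bandit (illustrative only, not FRONT)."""

    def __init__(self, n_actions, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_actions)]    # per-action Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # per-action reward vectors

    def choose(self, x):
        # Score each action by its estimated reward plus an exploration bonus.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, action, x, reward):
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

# Online loop: each arriving individual is treated in isolation. A foresighted,
# interference-aware policy would also account for how the chosen action shifts
# the contexts and rewards of later arrivals.
rng = np.random.default_rng(0)
bandit = LinUCB(n_actions=3, dim=5)
for t in range(1000):
    x = rng.normal(size=5)
    a = bandit.choose(x)
    r = float(x[a] + rng.normal(scale=0.1))  # synthetic reward for illustration
    bandit.update(a, x, r)
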
Award ID(s):
2401271
PAR ID:
10613416
Author(s) / Creator(s):
Publisher / Repository:
Reinforcement Learning Journal
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Identifying critical decisions is one of the most challenging decision-making problems in real-world applications. In this work, we propose a novel Reinforcement Learning (RL) based Long-Short Term Rewards (LSTR) framework for critical decision identification. RL is a machine learning area concerned with inducing effective decision-making policies, following which yields the maximum cumulative "reward." Many RL algorithms find the optimal policy by estimating the optimal Q-values, which specify the maximum cumulative reward the agent can receive. In our LSTR framework, the "long term" rewards are defined as the Q-values and the "short term" rewards are determined by the reward function. Experiments on a synthetic GridWorld game and real-world Intelligent Tutoring System datasets show that the proposed LSTR framework indeed identifies the critical decisions in the sequences. Furthermore, our results show that carrying out the critical decisions alone is as effective as a fully executed policy.
  2. This paper develops a decision framework to automate the playbook for UAS traffic management (UTM) under uncertain environmental conditions based on spatiotemporal scenario data. Motivated by traditional air traffic management (ATM), which uses a playbook to guide traffic along pre-validated routes under convective weather, the proposed UTM playbook leverages a database of optimal UAS routes tagged with spatiotemporal wind scenarios to automate UAS trajectory management. Our perspective is that UASs, like many other modern systems, operate in spatiotemporally evolving environments, and similar spatiotemporal scenarios call for similar management decisions. Building on this observation, the automated playbook integrates offline operations, online operations, and a database to enable real-time UAS trajectory management decisions. The solution uses similarity between spatiotemporal scenarios to retrieve offline decisions as the initial solution for online fine-tuning, which significantly shortens the online decision time; a sketch of this similarity-based retrieval step appears after this list. A fast query algorithm that exploits the correlation of spatiotemporal scenarios quickly retrieves the best offline decisions. The online fine-tuning adapts to trajectory deviations while respecting collision avoidance among UASs. The solution is demonstrated through simulation studies and can be applied in other settings where quick decisions are desired and spatiotemporal environments play a crucial role in the decision process.
  3. Probabilistic learning to rank (LTR) has been the dominant approach for optimizing ranking metrics, but it cannot maximize long-term rewards. Reinforcement learning models have been proposed to maximize users' long-term rewards by formulating recommendation as a sequential decision-making problem, but they achieve inferior accuracy compared to LTR counterparts, primarily due to the lack of online interactions and the characteristics of ranking. In this paper, we propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize users' long-term rewards and optimize the ranking metric offline, improving sample efficiency within a unified Expectation-Maximization (EM) framework. We show theoretically and empirically that the EM process guides the learned policy to benefit from integrating the future reward with the ranking metric, and to learn without any online interactions. Extensive offline and online experiments demonstrate the effectiveness of our methods.
  4. We introduce a sequential Bayesian binary hypothesis testing problem under social learning, termed selfish learning, where agents work to maximize their individual rewards. Each agent receives a private signal and observes the decisions made by earlier-acting agents. Besides inferring the underlying hypothesis, each agent decides whether to stop and declare a decision or to pass the inference on to the next agent. The employer rewards only correct responses, and the reward per worker decreases with the number of employees used for decision making. We characterize the agents' decision regions in the infinite and finite horizons. In particular, we show that the infinite-horizon decision boundaries are the solutions to a Markov Decision Process with discounted costs and can be computed using value iteration; a generic value-iteration routine of this kind is sketched after this list. In the finite horizon, we show that with appropriate incentivization, team performance improves over sequential social learning.
  5. Using a laboratory experiment, we identify whether decision-makers consider it a mistake to violate canonical choice axioms. To do this, we incentivize subjects to report axioms they want their decisions to satisfy. Then, subjects make lottery choices which might conflict with their axiom preferences. In instances of conflict, we give subjects the opportunity to re-evaluate their decisions. We find that many individuals want to follow canonical axioms and revise their choices to be consistent with the axioms. In a shorter online experiment, we show correlations of mistakes with response times and measures of cognition. (JEL C91, D12, D44, D91) 
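
Item 2 above automates the UTM playbook by matching the observed spatiotemporal wind scenario against a database of routes computed offline. The sketch below illustrates only that retrieval step, using nearest-neighbor search over flattened scenario arrays; the data structures, distance metric, and toy data are assumptions for illustration, not the paper's implementation.

import numpy as np

class PlaybookDB:
    """Hypothetical store of offline routes keyed by spatiotemporal wind scenarios."""

    def __init__(self):
        self.scenarios = []   # flattened wind fields (time x grid), one per entry
        self.routes = []      # pre-validated UAS routes computed offline

    def add(self, scenario, route):
        self.scenarios.append(np.asarray(scenario, dtype=float).ravel())
        self.routes.append(route)

    def query(self, scenario):
        # Return the offline route whose stored scenario is closest
        # (smallest Euclidean distance) to the observed one.
        q = np.asarray(scenario, dtype=float).ravel()
        dists = [np.linalg.norm(q - s) for s in self.scenarios]
        return self.routes[int(np.argmin(dists))]

db = PlaybookDB()
db.add(np.zeros((4, 10, 10)), route="route_calm")
db.add(np.full((4, 10, 10), 5.0), route="route_windy")
observed = np.full((4, 10, 10), 4.2)
initial_route = db.query(observed)   # seed for online fine-tuning
print(initial_route)                 # -> route_windy

In the paper's setting, the retrieved route would then be fine-tuned online against trajectory deviations and collision-avoidance constraints; that step is omitted here.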
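
Item 4 above reports that the infinite-horizon decision boundaries solve a Markov Decision Process with discounted costs via value iteration. The routine below is a generic value-iteration solver of that kind; the two-state transition and cost arrays are placeholders rather than the paper's model.

import numpy as np

def value_iteration(P, C, gamma=0.95, tol=1e-8, max_iter=10_000):
    """P[a][s, s']: transition probabilities; C[a][s]: immediate costs (minimized)."""
    n_states = P[0].shape[0]
    n_actions = len(P)
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: immediate cost plus discounted expected cost-to-go.
        Q = np.stack([C[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Greedy (cost-minimizing) policy with respect to the converged values.
    Q = np.stack([C[a] + gamma * P[a] @ V for a in range(n_actions)])
    return V, Q.argmin(axis=0)

# Toy 2-state, 2-action example (e.g., "stop and declare" vs. "pass to the next agent").
P = [np.array([[1.0, 0.0], [0.0, 1.0]]),
     np.array([[0.3, 0.7], [0.6, 0.4]])]
C = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
V, policy = value_iteration(P, C)
print(V, policy)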