Primal-Dual Spectral Representation for Off-policy Evaluation

Hu, Yang; Chen, Tianyi; Li, Na; Wang, Kai; Dai, Bo


This content will become publicly available on May 3, 2026


Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using \emph{only} experiences from another behavior policy that is potentially unknown. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the \emph{curse of horizon}. However, the major bottleneck in applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a \emph{linear representation} of the value function and the stationary distribution correction ratio, \emph{i.e.}, the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient utilization of historical data. We highlight that our algorithm, \textbf{SpectralDICE}, is the first to leverage a linear representation of the primal-dual variables in a way that is both computationally efficient and sample efficient; its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
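As a rough illustration of the linear-representation idea in the abstract, the following NumPy sketch builds a small tabular MDP whose transition kernel is assumed to factor as P(s'|s,a) = phi(s,a)^T mu(s'), with a reward assumed linear in phi. Under those assumptions, the target policy's Q-function lies in the span of phi, and the state occupancy (the numerator of the correction ratio) is the initial distribution plus a term in the span of mu. All names (`phi`, `mu`, `theta_r`, `Pi`) and the random MDP are hypothetical; this is not the SpectralDICE algorithm, which learns the representation from off-policy data and whose exact construction may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 6, 3, 2, 0.9  # states, actions, rank, discount (all illustrative)

# Assumed low-rank MDP: P(s'|s,a) = phi(s,a)^T mu(s'); row index of phi is s*A + a.
phi = rng.dirichlet(np.ones(d), size=S * A)  # (S*A, d), each row on the simplex
mu = rng.dirichlet(np.ones(S), size=d)       # (d, S), each row a distribution over s'
P = phi @ mu                                 # (S*A, S), row-stochastic by construction
theta_r = rng.uniform(size=d)
r = phi @ theta_r                            # reward assumed linear in phi
rho0 = rng.dirichlet(np.ones(S))             # initial state distribution
pi = rng.dirichlet(np.ones(A), size=S)       # target policy pi(a|s), shape (S, A)

# Pi maps a Q-vector (length S*A) to V (length S): V(s) = sum_a pi(a|s) Q(s,a).
Pi = np.zeros((S, S * A))
for s in range(S):
    Pi[s, s * A:(s + 1) * A] = pi[s]

# Primal: solve the Bellman equation Q = r + gamma * P Pi Q directly ...
Q_direct = np.linalg.solve(np.eye(S * A) - gamma * P @ Pi, r)
# ... and in the d-dimensional feature space: Q = phi w, w = theta_r + gamma * mu Pi phi w.
w = np.linalg.solve(np.eye(d) - gamma * mu @ Pi @ phi, theta_r)
Q_linear = phi @ w

# Dual: normalized discounted occupancy of the target policy,
# occ = (1 - gamma) * Pi^T rho0 + gamma * Pi^T P^T occ.
occ = np.linalg.solve(np.eye(S * A) - gamma * (P @ Pi).T, (1 - gamma) * Pi.T @ rho0)
occ_state = occ.reshape(S, A).sum(axis=1)
# The state occupancy is the initial term plus a vector in the span of mu.
occ_state_linear = (1 - gamma) * rho0 + gamma * mu.T @ (phi.T @ occ)

# Both views give the same normalized policy value.
J_primal = (1 - gamma) * rho0 @ (Pi @ Q_direct)
J_dual = occ @ r

print(np.allclose(Q_direct, Q_linear))
print(np.allclose(occ_state, occ_state_linear))
print(np.isclose(J_primal, J_dual))
```

In this sketch the primal solve shrinks from an (S*A)-dimensional linear system to a d-dimensional one, which is the kind of saving a linear representation makes possible; here the model is known exactly, whereas an OPE method has to estimate everything from behavior-policy samples.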

Award ID(s): 2401391; 2403240

PAR ID: 10600587

Author(s) / Creator(s): Hu, Yang; Chen, Tianyi; Li, Na; Wang, Kai; Dai, Bo

Editor(s): Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz

Publisher / Repository: PMLR

Date Published: 2025-05-03

Volume: 258

Page Range / eLocation ID: 3808-3816

Subject(s) / Keyword(s): Spectral Representation; Primal-Dual; Off-policy Evaluation; Reinforcement Learning

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
