Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is useful in settings where collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and may therefore be interdependent. Without assuming independent samples, we provide a high-probability, polynomial sample-complexity bound for vanilla model-based off-policy evaluation under partial or uniform coverage. We extend this result to off-policy optimization under uniform coverage. As a point of comparison with the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for the nearly deterministic dynamics that are prevalent in reinforcement learning.
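The vanilla model-based pipeline described above can be sketched in the tabular setting: build the empirical transition model from logged (s, a, s') samples, then evaluate a target policy on that estimated model. This is a minimal illustration only; the function names and the uniform fallback for unvisited state–action pairs are our own assumptions, not details from the paper.

```python
import numpy as np

def empirical_model(transitions, S, A):
    """Vanilla empirical transition model P_hat[s, a, s'] built from
    logged (s, a, s') tuples; samples may be dependent (one trajectory)."""
    counts = np.zeros((S, A, S))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    n_sa = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform distribution --
    # an illustrative modeling choice, not taken from the paper.
    return np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / S)

def evaluate_policy(P_hat, R, pi, gamma=0.9):
    """Off-policy evaluation on the estimated model: solve the linear
    system v = R_pi + gamma * P_pi v for the target policy pi[s, a]."""
    S = R.shape[0]
    P_pi = np.einsum('sa,sat->st', pi, P_hat)  # policy-induced chain
    R_pi = np.einsum('sa,sa->s', pi, R)        # policy-averaged reward
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
```

With the exact empirical frequencies in hand, the evaluation step is a single linear solve; the sample-complexity question above is how close this plug-in value is to the true value of the policy.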
This content will become publicly available on February 21, 2026
Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity
New Insights into Off-line Estimation for Controlled Markov Chains Unveiled

A team of researchers from Purdue University and Northwestern University has unveiled new findings in off-line estimation for controlled Markov chains, addressing challenges in analyzing complex data generated under arbitrary dynamics. The study introduces a nonparametric estimator for transition probabilities and shows that it remains robust even in nonstationary, non-Markovian environments. The team developed precise sample-complexity bounds, revealing a delicate interplay between the mixing properties of the logging policy and the size of the data set. Their analysis highlights how achieving optimal statistical risk depends on this trade-off, broadening the scope of off-line estimation under diverse conditions. Examples include ergodic and weakly ergodic chains as well as controlled chains with episodic or greedy controls. Significantly, this research confirms that the widely used estimator, which computes state–action transition ratios, is minimax optimal, ensuring its reliability in general scenarios. This advance paves the way for improved evaluation of stationary Markov control policies, marking a breakthrough in understanding complex off-line systems.
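The "widely used estimator" the summary refers to is the count-ratio estimator P̂(s' | s, a) = N(s, a, s') / N(s, a). A minimal sketch over a single logged trajectory, whose consecutive samples are dependent exactly as in the setting above (the function name and trajectory encoding are illustrative assumptions):

```python
from collections import defaultdict

def transition_ratio_estimator(trajectory):
    """State-action transition-ratio estimator from one logged trajectory,
    given as a list of (state, action) pairs: for each observed triple,
    P_hat(s' | s, a) = N(s, a, s') / N(s, a). Counts are taken along the
    chain induced by the logging policy, so samples may be dependent."""
    counts = defaultdict(int)   # N(s, a, s')
    totals = defaultdict(int)   # N(s, a)
    for t in range(len(trajectory) - 1):
        (s, a), (s_next, _) = trajectory[t], trajectory[t + 1]
        counts[(s, a, s_next)] += 1
        totals[(s, a)] += 1
    return {key: c / totals[key[:2]] for key, c in counts.items()}
```

Only observed triples appear in the returned dictionary; how this simple ratio behaves under weak mixing of the logging policy is exactly the sample-complexity question the study addresses.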
- Award ID(s):
- 2143752
- PAR ID:
- 10613259
- Publisher / Repository:
- INFORMS
- Date Published:
- Journal Name:
- Operations Research
- ISSN:
- 0030-364X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider a family of Markov chains whose transition dynamics are affected by model parameters. Understanding the parametric dependence of (complex) performance measures of such Markov chains is often of significant interest. The derivatives of the performance measures with respect to the parameters, and their continuity, play important roles, for example, in numerical optimization of the performance measures and in quantifying the uncertainty that parameter estimation errors induce in the performance measures. In this paper, we establish conditions that guarantee the smoothness of various types of intractable performance measures, such as the stationary and random-horizon discounted performance measures, of general state space Markov chains, and we provide probabilistic representations for the derivatives. Funding: C.-H. Rhee is supported by the National Science Foundation [Grant CMMI-2146530].
-
We propose a nonconvex estimator for the covariate-adjusted precision matrix estimation problem in the high-dimensional regime, under sparsity constraints. To compute this estimator, we propose an alternating gradient descent algorithm with hard thresholding. Compared with existing methods along this line of research, which lack theoretical guarantees on the optimization error and/or the statistical error, the proposed algorithm is not only computationally much more efficient, with a linear rate of convergence, but also attains the optimal statistical rate up to a logarithmic factor. Thorough experiments on both synthetic and real data support our theory.
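The core mechanism, gradient descent combined with hard thresholding, can be illustrated on the plain (unadjusted) Gaussian precision-matrix objective. This is a simplified sketch under our own assumptions, not the paper's alternating, covariate-adjusted algorithm; the step size and sparsity level are illustrative.

```python
import numpy as np

def hard_threshold(M, s):
    """Keep the s largest-magnitude entries of M; zero out the rest.
    (Ties at the threshold may keep a few extra entries.)"""
    flat = np.abs(M).ravel()
    if s >= flat.size:
        return M.copy()
    tau = np.partition(flat, -s)[-s]   # s-th largest magnitude
    return np.where(np.abs(M) >= tau, M, 0.0)

def sparse_precision_sketch(S_cov, s, step=0.05, iters=200):
    """Gradient descent with hard thresholding for the objective
    tr(S_cov @ Theta) - logdet(Theta), keeping at most ~s nonzeros.
    A toy sketch only: no covariate adjustment, no alternating blocks."""
    p = S_cov.shape[0]
    Theta = np.eye(p)
    for _ in range(iters):
        grad = S_cov - np.linalg.inv(Theta)        # gradient of the objective
        Theta = hard_threshold(Theta - step * grad, s)
        Theta = (Theta + Theta.T) / 2.0            # keep the iterate symmetric
    return Theta
```

Each iteration costs one gradient step plus a projection onto the sparsity constraint, which is where the linear convergence rate claimed above comes from in this style of analysis.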
-
Key features of biological activity can often be captured by transitions between a finite number of semi-stable states that correspond to behaviors or decisions. We present here a broad class of dynamical systems that are ideal for modeling such activity. The models we propose are chaotic heteroclinic networks with nontrivial intersections of stable and unstable manifolds. Due to the sensitive dependence on initial conditions, transitions between states are seemingly random. Dwell times, exit distributions, and other transition statistics can be built into the model through geometric design and can be controlled by tunable parameters. To test our model's ability to simulate realistic biological phenomena, we turned to one of the most studied organisms, C. elegans, well known for its limited behavioral states. We reconstructed experimental data from two laboratories, demonstrating the model's ability to quantitatively reproduce dwell times and transition statistics under a variety of conditions. Stochastic switching between dominant states in complex dynamical systems has been extensively studied and is often modeled with Markov chains. As an alternative, we propose here a new paradigm: chaotic heteroclinic networks generated by deterministic rules, without the necessity for noise. Chaotic heteroclinic networks can be used to model systems with arbitrary architecture and size without a commensurate increase in phase-space dimension. They are highly flexible and able to capture a wide range of transition characteristics that can be adjusted through control parameters.
-
To tackle massive data, subsampling is a practical approach to select the more informative data points. However, when responses are expensive to measure, developing efficient subsampling schemes is challenging, and an optimal sampling approach under measurement constraints was developed to meet this challenge. This method uses the inverses of optimal sampling probabilities to reweight the objective function, which assigns smaller weights to the more important data points. Thus, the estimation efficiency of the resulting estimator can be improved. In this paper, we propose an unweighted estimating procedure based on optimal subsamples to obtain a more efficient estimator. We obtain the unconditional asymptotic distribution of the estimator via martingale techniques without conditioning on the pilot estimate, which has been less investigated in the existing subsampling literature. Both asymptotic results and numerical results show that the unweighted estimator is more efficient in parameter estimation.
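The contrast between the two estimators can be sketched for linear least squares: the classical approach reweights the subsample by inverse sampling probabilities, while the proposed approach fits the subsample unweighted. This toy version omits the pilot estimate and any bias correction from the actual procedure; all names are illustrative.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: argmin_b sum_i w_i * (y_i - x_i' b)^2,
    solved via the weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def weighted_subsample_fit(X, y, idx, probs):
    """Classical estimator: reweight each subsampled point by the
    inverse of its sampling probability."""
    return wls(X[idx], y[idx], 1.0 / probs[idx])

def unweighted_subsample_fit(X, y, idx):
    """Unweighted fit on the same subsample -- the style of estimator
    the paper argues is more efficient (here without its corrections)."""
    return wls(X[idx], y[idx], np.ones(len(idx)))
```

On noiseless data both fits recover the true coefficients; the efficiency gap the paper quantifies appears once noise and non-uniform, response-dependent sampling probabilities enter.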