Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning Using Passive Langevin Dynamics

Snow, Luke; Krishnamurthy, Vikram

doi:10.1109/TIT.2025.3555479

This content will become publicly available on June 1, 2026

This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing its estimates in real time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (the forward learner), over which it has no control. The PSGLD algorithm acts as a randomized sampler that achieves adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance of the algorithm: we provide explicit bounds on the 2-Wasserstein distance between the sample measure of the PSGLD algorithm and the stationary measure encoding the cost function, and we provide guarantees for a kernel density estimation scheme that reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance: they provide finite-time guarantees for an adaptive IRL mechanism, and they substantially generalize the analytical framework of a line of research on passive stochastic gradient algorithms.
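The reconstruction idea in the abstract (sample from a Langevin stationary measure, then recover the cost from a kernel density estimate) can be illustrated with a minimal sketch. This is not the paper's PSGLD algorithm: it is a standard unadjusted Langevin sampler that evaluates the noisy gradient at its own iterate, which the passive setting does not allow. The quadratic cost, the function names, and every parameter value are assumptions chosen only for illustration. The sketch relies on the fact that the diffusion with drift -∇C and noise scale sqrt(2/β) has stationary density proportional to exp(-βC), so that C is recovered up to an additive constant as -log(density)/β.

```python
import numpy as np


def cost(theta):
    # Hypothetical quadratic cost, assumed purely for illustration.
    return 0.5 * theta**2


def grad_cost(theta):
    # Exact gradient of the hypothetical cost above.
    return theta


def langevin_samples(n_steps, step, beta, rng, noise_std=0.1):
    """Unadjusted Langevin iteration driven by noisy gradients.

    Simplification: the gradient is evaluated at the sampler's own iterate,
    unlike the passive setting described in the abstract.
    """
    noise = rng.standard_normal((n_steps, 2))
    theta = 0.0
    out = np.empty(n_steps)
    for k in range(n_steps):
        noisy_grad = grad_cost(theta) + noise_std * noise[k, 0]
        theta = theta - step * noisy_grad + np.sqrt(2.0 * step / beta) * noise[k, 1]
        out[k] = theta
    return out


def gaussian_kde(samples, grid, bandwidth):
    # Plain Gaussian kernel density estimate evaluated on a 1-D grid.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def reconstruct_cost(samples, grid, bandwidth, beta):
    # Invert density ∝ exp(-beta * C); shift so the minimum is zero,
    # since the cost is only identified up to an additive constant.
    density = gaussian_kde(samples, grid, bandwidth)
    estimate = -np.log(density) / beta
    return estimate - estimate.min()


rng = np.random.default_rng(0)
samples = langevin_samples(n_steps=100_000, step=0.05, beta=1.0, rng=rng)
grid = np.linspace(-1.0, 1.0, 21)
estimated = reconstruct_cost(samples, grid, bandwidth=0.3, beta=1.0)
```

Comparing `estimated` with `cost(grid)` shows the kind of gap that finite-sample bounds are meant to control: discretization error from the step size, smoothing bias from the kernel bandwidth, and sampling error from the finite number of iterates all contribute.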

Award ID(s): 2312198

PAR ID: 10607960

Author(s) / Creator(s): Snow, Luke; Krishnamurthy, Vikram

Publisher / Repository: IEEE

Date Published: 2025-06-01

Journal Name: IEEE Transactions on Information Theory

Volume: 71

Issue: 6

ISSN: 0018-9448

Page Range / eLocation ID: 4637 to 4670

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1109/TIT.2025.3555479