The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning

Borkar, Vivek; Chen, Shuhang; Devraj, Adithya; Kontoyiannis, Ioannis; Meyn, Sean

doi:10.1214/24-AAP2132

This content will become publicly available on April 1, 2026

The paper concerns the stochastic approximation recursion, \[ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}), \quad n \ge 0, \] where the estimates $\{\theta_n\}$ evolve on $\mathbb{R}^d$, and $\boldsymbol{\Phi} := \{\Phi_n\}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated mean flow $\frac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable, with stationary point denoted $\theta^*$. The main results are established under additional conditions on the mean flow and an extension of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) A Lyapunov function is constructed for the joint process $\{\theta_n, \Phi_n\}$ that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\mathsf{E}[z_n z_n^{\top}]$ to the asymptotic covariance $\Sigma_\theta$ in the CLT, where $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized averaged parameters $z^{\mathrm{PR}}_n := \sqrt{n}\,(\theta^{\mathrm{PR}}_n - \theta^*)$, with $\theta^{\mathrm{PR}}_n := n^{-1} \sum_{k=1}^n \theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance of $z^{\mathrm{PR}}_n$ converges to $\Sigma^{\mathrm{PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and $\boldsymbol{\Phi}$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
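
The recursion and the Polyak-Ruppert average described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper, and every concrete choice in it is an assumption made for illustration: a scalar parameter, the linear function f(theta, phi) = phi - theta, a two-state Markov chain on {0, 1} with hypothetical transition probabilities `p` and `q`, and the step-size alpha_n = n^(-rho) with rho = 0.7. Under these choices the mean flow has the stationary point theta* = p / (p + q), the stationary mean of the chain.

```python
import random


def run_sa(n_steps, p=0.3, q=0.2, rho=0.7, seed=0):
    """Scalar stochastic approximation with Markovian noise (toy sketch).

    Recursion: theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, Phi_n),
    with the illustrative choices f(theta, phi) = phi - theta and
    alpha_n = n ** (-rho).

    Returns (final theta, Polyak-Ruppert average, theta_star).
    """
    rng = random.Random(seed)
    phi = 0          # current state of the two-state Markov chain on {0, 1}
    theta = 0.0      # initial estimate theta_0
    theta_pr = 0.0   # running average of theta_1, ..., theta_n

    for n in range(1, n_steps + 1):
        # Advance the chain: 0 -> 1 with probability p, 1 -> 0 with probability q.
        u = rng.random()
        if phi == 0:
            phi = 1 if u < p else 0
        else:
            phi = 0 if u < q else 1

        # Vanishing step-size (assumed form) and one step of the recursion.
        alpha = n ** (-rho)
        theta += alpha * (phi - theta)

        # Incremental update of the average n^{-1} * sum_{k=1}^n theta_k.
        theta_pr += (theta - theta_pr) / n

    # Stationary mean of the chain, i.e. the root of the mean flow here.
    theta_star = p / (p + q)
    return theta, theta_pr, theta_star


if __name__ == "__main__":
    theta, theta_pr, theta_star = run_sa(200_000)
    print("last iterate:", theta)
    print("averaged estimate:", theta_pr)
    print("theta_star:", theta_star)
```

Running the script prints the last iterate, the averaged estimate, and theta* for comparison. In this toy setting the averaged estimate would typically sit closer to theta* than the last iterate, which is the qualitative behavior the Polyak-Ruppert result in item (iii) describes. A single run is only an illustration and does not verify any of the limit theorems.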

Award ID(s): 2306023

PAR ID: 10611361

Author(s) / Creator(s): Borkar, Vivek; Chen, Shuhang; Devraj, Adithya; Kontoyiannis, Ioannis; Meyn, Sean

Corporate Creator(s): Adobe

Editor(s): Ramanan, Kavita

Publisher / Repository: Ann. Appl. Probab.

Date Published: 2025-04-01

Journal Name: The Annals of Applied Probability

Edition / Version: 1

Volume: 35

Issue: 2

ISSN: 1050-5164

Page Range / eLocation ID: 1-47

Subject(s) / Keyword(s): applications of Markov chains, reinforcement learning and adaptive control, stochastic approximation

Format(s): Medium: X Size: 700kb Other: pdf

Size(s): 700kb

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1214/24-AAP2132
