NSF PAR Search | NSF Public Access Repository

The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning

https://doi.org/10.1214/24-AAP2132

Borkar, Vivek; Chen, Shuhang; Devraj, Adithya; Kontoyiannis, Ioannis; Meyn, Sean (April 2025, The Annals of Applied Probability)
Ramanan, Kavita (Ed.)
The paper concerns the stochastic approximation recursion, \[ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}), \quad n \ge 0, \] where the estimates $\{\theta_n\}$ evolve on $\mathbb{R}^d$, and $\boldsymbol{\Phi} := \{\Phi_n\}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated mean flow $\frac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable, with stationary point denoted $\theta^*$. The main results are established under additional conditions on the mean flow and an extension of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) A Lyapunov function is constructed for the joint process $\{\theta_n, \Phi_n\}$ that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\mathsf{E}[z_n z_n^{\top}]$ to the asymptotic covariance $\Sigma_\theta$ in the CLT, where $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized averaged parameters $z^{\mathrm{PR}}_n := \sqrt{n}\,(\theta^{\mathrm{PR}}_n - \theta^*)$, with $\theta^{\mathrm{PR}}_n := n^{-1} \sum_{k=1}^n \theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance of $z^{\mathrm{PR}}_n$ converges to $\Sigma^{\mathrm{PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and $\boldsymbol{\Phi}$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
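The recursion and the Polyak-Ruppert average described in this abstract can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the scalar function f(theta, phi) = phi - theta, the two-state Markov chain for the noise, its transition probabilities p01 and p10, and the step-size exponent rho. For this choice the mean flow has stationary point theta* = p01 / (p01 + p10).

```python
import random

def run_sa(n_steps, rho=0.7, p01=0.3, p10=0.2, seed=0):
    """Scalar stochastic approximation with Markovian noise (illustrative only).

    Hypothetical setup: Phi is a two-state Markov chain on {0, 1} and
    f(theta, phi) = phi - theta, so the mean flow is d/dt v = pi(1) - v.
    """
    rng = random.Random(seed)
    theta = 0.0      # theta_0
    phi = 0          # Phi_0
    theta_pr = 0.0   # running Polyak-Ruppert average of theta_1..theta_n
    for n in range(1, n_steps + 1):
        # Advance the Markov chain to obtain Phi_n.
        u = rng.random()
        if phi == 0:
            phi = 1 if u < p01 else 0
        else:
            phi = 0 if u < p10 else 1
        # theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, Phi_n)
        alpha = n ** (-rho)
        theta = theta + alpha * (phi - theta)
        # Incremental form of n^{-1} * sum_{k=1}^n theta_k.
        theta_pr += (theta - theta_pr) / n
    return theta, theta_pr

theta_star = 0.3 / (0.3 + 0.2)   # stationary mean of the chain
theta_n, theta_pr_n = run_sa(200_000)
print(theta_star, theta_n, theta_pr_n)
```

The step-size alpha_n = n^{-rho} with rho between 1/2 and 1 is a common choice when averaging is used; the precise step-size assumptions of the paper are not reproduced here.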
Free, publicly-accessible full text available April 1, 2026
Markovian Foundations for Quasi-Stochastic Approximation

https://doi.org/10.1137/23M1588172

Lauand, Caio Kalil; Meyn, Sean (February 2025, SIAM Journal on Control and Optimization)

Free, publicly-accessible full text available February 28, 2026
Markovian Foundations for Quasi-Stochastic Approximation in Two Timescales

https://doi.org/10.1109/CDC56724.2024.10886821

Lauand, Caio Kalil; Meyn, Sean (December 2024, Proceedings of the IEEE Conference on Decision and Control)

Free, publicly-accessible full text available December 16, 2025
Reinforcement Learning Design for Quickest Change Detection

https://doi.org/10.1109/CDC56724.2024.10886003

Cooper, Austin; Meyn, Sean (December 2024, IEEE)

Free, publicly-accessible full text available December 16, 2025
Quickest Change Detection Using Mismatched CUSUM Extended Abstract

https://doi.org/10.1109/Allerton63246.2024.10735265

Cooper, Austin; Meyn, Sean (September 2024, IEEE)

Full Text Available
Revisiting Step-Size Assumptions in Stochastic Approximation

Lauand, Caio; Meyn, Sean (June 2024, arXiv)

Submitted for publication, and arXiv preprint arXiv:2405.17834
Full Text Available
Reinforcement Learning Design for Quickest Change Detection

Cooper, Austin; Meyn, Sean (March 2024, arXiv)

Submitted for publication, and arXiv preprint arXiv:2403.14109
Full Text Available
The Projected Bellman Equation in Reinforcement Learning

https://doi.org/10.1109/TAC.2024.3409647

Meyn, Sean (January 2024, IEEE Transactions on Automatic Control)
Astolfi, Alessandro (Ed.)
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute exactly a solution to the discounted-cost optimality equation, and thereby obtain the optimal policy for a Markov Decision Process. The goal today is more modest: obtain an approximate solution within a prescribed function class. The standard algorithms are based on the same architecture as formulated in the 1980s, with the goal of finding a value function approximation that solves the so-called projected Bellman equation. While reinforcement learning has been an active research area for over four decades, there is little theory providing conditions for convergence of these Q-learning algorithms, or even existence of a solution to this equation. The purpose of this paper is to show that a solution to the projected Bellman equation does exist, provided the function class is linear and the input used for training is a form of epsilon-greedy policy with sufficiently small epsilon. Moreover, under these conditions it is shown that the Q-learning algorithm is stable, in terms of bounded parameter estimates. Convergence remains one of many open topics for research.
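The Q-learning architecture referred to in this abstract (a linear function class trained with an epsilon-greedy input) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm or example: the two-state, two-action model, its costs and transition probabilities, the discount factor, and the step-size are all hypothetical, and the basis is one-hot, which reduces the linear class to the tabular special case.

```python
import random

GAMMA = 0.9                    # discount factor (hypothetical value)
N_STATES, N_ACTIONS = 2, 2     # toy model size (hypothetical)

def cost(x, u):
    # Hypothetical one-step cost: state 1 is expensive, action 1 costs extra.
    return float(x) + 0.3 * u

def step(x, u, rng):
    # Hypothetical dynamics: action 0 tends to stay, action 1 tends to switch.
    flip_prob = 0.1 if u == 0 else 0.9
    return 1 - x if rng.random() < flip_prob else x

def psi(x, u):
    # One-hot basis: the tabular special case of a linear function class.
    vec = [0.0] * (N_STATES * N_ACTIONS)
    vec[x * N_ACTIONS + u] = 1.0
    return vec

def q_value(theta, x, u):
    # Linear approximation Q^theta(x, u) = theta^T psi(x, u).
    return sum(t * p for t, p in zip(theta, psi(x, u)))

def greedy_action(theta, x):
    # Cost-minimizing action for the current approximation.
    return min(range(N_ACTIONS), key=lambda u: q_value(theta, x, u))

def q_learning(n_steps, epsilon=0.1, rho=0.7, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (N_STATES * N_ACTIONS)
    x = 0
    for n in range(1, n_steps + 1):
        # Epsilon-greedy training input.
        if rng.random() < epsilon:
            u = rng.randrange(N_ACTIONS)
        else:
            u = greedy_action(theta, x)
        x_next = step(x, u, rng)
        # Temporal difference for the discounted-cost optimality equation.
        best_next = min(q_value(theta, x_next, v) for v in range(N_ACTIONS))
        td = cost(x, u) + GAMMA * best_next - q_value(theta, x, u)
        alpha = n ** (-rho)
        theta = [t + alpha * td * p for t, p in zip(theta, psi(x, u))]
        x = x_next
    return theta

theta = q_learning(50_000)
print(theta)
```

With the one-hot basis and step-sizes no larger than one, each update is a convex combination of the current entry and the target, so the parameters stay between 0 and max cost / (1 - GAMMA). That elementary bound is specific to this tabular sketch; boundedness for a general linear function class is the subject of the paper.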
Full Text Available
