skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on April 1, 2026

Title: The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning
The paper concerns the stochastic approximation recursion, \[ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, \] where the {\em estimates} $$\{ \theta_n\} $$ evolve on $$\Re^d$$, and $$\bfPhi \eqdef \{ \Phi_n \}$$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $$ \ddt \odestate_t = \barf(\odestate_t)$$ is globally asymptotically stable, with stationary point denoted $$\theta^*$$. The main results are established under additional conditions on the mean flow and an extension of the Donsker-Varadhan Lyapunov drift condition known as~(DV3): (i) A Lyapunov function is constructed for the joint process $$\{\theta_n,\Phi_n\}$$ that implies convergence of the estimates in $$L_4$$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $$\Expect [ z_n z_n^\transpose ]$$ to the asymptotic covariance $$\SigmaTheta$$ in the CLT, where $$z_n\eqdef (\theta_n-\theta^*)/\sqrt{\alpha_n}$$. (iii) The CLT holds for the normalized averaged parameters $$\zPR_n\eqdef \sqrt{n} (\thetaPR_n -\theta^*)$$, with $$\thetaPR_n \eqdef n^{-1} \sum_{k=1}^n\theta_k$$, subject to standard assumptions on the step-size. Moreover, the covariance of $$\zPR_n$$ converges to $$\SigmaPR$$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $$f$$ and $$\barf$$ are linear in $$\theta$$, and $$\bfPhi$$ is a geometrically ergodic Markov chain but does not satisfy~(DV3). While the algorithm is convergent, the second moment of $$\theta_n$$ is unbounded and in fact diverges.  more » « less
Award ID(s):
2306023
PAR ID:
10611361
Author(s) / Creator(s):
; ; ; ;
Corporate Creator(s):
Editor(s):
Ramanan, Kavita
Publisher / Repository:
Ann. Appl. Probab.
Date Published:
Journal Name:
The Annals of Applied Probability
Edition / Version:
1
Volume:
35
Issue:
2
ISSN:
1050-5164
Page Range / eLocation ID:
1-47
Subject(s) / Keyword(s):
applications of Markov chains , reinforcement learning and adaptive control , stochastic approximation
Format(s):
Medium: X Size: 700kb Other: pdf
Size(s):
700kb
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $$\bar{A} \theta = \bar{b}$$. When the matrix $$\bar{A}$$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asymptotic covariance matrix, which is the sum of the classical Polyak-Ruppert covariance and a correction term that scales with the step size. Under assumptions on the tail of the noise distribution, we prove a non-asymptotic concentration inequality whose main term matches the covariance in CLT in any direction, up to universal constants. When the matrix $$\bar{A}$$ is not Hurwitz but only has non-negative real parts in its eigenvalues, we prove that the averaged LSA procedure actually achieves an $O(1/T)$ rate in mean-squared error. Our results provide a more refined understanding of linear stochastic approximation in both the asymptotic and non-asymptotic settings. We also show various applications of the main results, including the study of momentum-based stochastic gradient methods as well as temporal difference algorithms in reinforcement learning. 
    more » « less
  2. Abernethy, Jacob; Agarwal, Agarwal (Ed.)
    Motivated by problems in controlled experiments, we study the discrepancy of random matrices with continuous entries where the number of columns $$n$$ is much larger than the number of rows $$m$$. Our first result shows that if $$\omega(1) = m = o(n)$$, a matrix with i.i.d. standard Gaussian entries has discrepancy $$\Theta(\sqrt{n} \, 2^{-n/m})$$ with high probability. This provides sharp guarantees for Gaussian discrepancy in a regime that had not been considered before in the existing literature. Our results also apply to a more general family of random matrices with continuous i.i.d. entries, assuming that $$m = O(n/\log{n})$$. The proof is non-constructive and is an application of the second moment method. Our second result is algorithmic and applies to random matrices whose entries are i.i.d. and have a Lipschitz density. We present a randomized polynomial-time algorithm that achieves discrepancy $$e^{-\Omega(\log^2(n)/m)}$$ with high probability, provided that $$m = O(\sqrt{\log{n}})$$. In the one-dimensional case, this matches the best known algorithmic guarantees due to Karmarkar–Karp. For higher dimensions $$2 \leq m = O(\sqrt{\log{n}})$$, this establishes the first efficient algorithm achieving discrepancy smaller than $$O( \sqrt{m} )$$. 
    more » « less
  3. Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective functionfthat is locally strongly convex with a sub--quadratic tail. This setting covers many widely used online statistical methods. We introduce a novel piecewise Lyapunov function that enables us to handle functionsfwith only first-order differentiability, which includes a wide range of popular loss functions such as Huber loss. Leveraging our proposed Lyapunov function, we derive finite-time moment bounds under general diminishing stepsizes, as well as constant stepsizes. We further establish the weak convergence, central limit theorem and bias characterization under constant stepsize, providing the first geometrical convergence result for sub--quadratic SGD. Our results have wide applications, especially in online statistical methods. In particular, we discuss two applications of our results. 1) Online robust regression: We consider a corrupted linear model with sub--exponential covariates and heavy--tailed noise. Our analysis provides convergence rates comparable to those for corrupted models with Gaussian covariates and noise. 2) Online quantile regression: Importantly, our results relax the common assumption in prior work that the conditional density is continuous and provide a more fine-grained analysis for the moment bounds. 
    more » « less
  4. null (Ed.)
    Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak's momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov's acceleration method, or a simplification of PolSA. The rates of convergence of SA algorithms is well understood. Under special conditions, the mean square error of the parameter estimates is bounded by σ 2 /n+o(1/n), where σ 2 ≥ 0 is an identifiable constant. If these conditions fail, the rate is typically sub-linear. There are two well known SA algorithms that ensure a linear rate, with minimal value of variance, σ 2 : the Ruppert-Polyak averaging technique, and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that under mild technical assumptions, the PolSA algorithm also achieves this optimality criteria. This result is established via novel coupling arguments: It is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal variance (but computationally more expensive) SNR algorithm, at a rate O(1/n 2 ). The newly proposed algorithms are extended to a reinforcement learning setting to obtain new Q-learning algorithms, and numerical results confirm the coupling of PolSA and SNR. 
    more » « less
  5. In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous works for convex optimization, either the convergence is only in expectation or the bound depends on the diameter of the domain. Instead, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations. The method can be applied to the non-convex case. We demonstrate an $$O((1+\sigma^{2}\log(1/\delta))/T+\sigma/\sqrt{T})$$ convergence rate when the number of iterations $$T$$ is known and an $$O((1+\sigma^{2}\log(T/\delta))/\sqrt{T})$$ convergence rate when $$T$$ is unknown for SGD, where $$1-\delta$$ is the desired success probability. These bounds improve over existing bounds in the literature. We also revisit AdaGrad-Norm (Ward et al., 2019) and show a new analysis to obtain a high probability bound that does not require the bounded gradient assumption made in previous works. The full version of our paper contains results for the standard per-coordinate AdaGrad. 
    more » « less