skip to main content


Title: Softmax policy gradient methods can take exponential time to converge
Abstract

The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For$$\gamma $$γ-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space$${\mathcal {S}}$$Sand the effective horizon$$\frac{1}{1-\gamma }$$11-γ, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize$$\eta $$ηcan take$$\begin{aligned} \frac{1}{\eta } |{\mathcal {S}}|^{2^{\Omega \big (\frac{1}{1-\gamma }\big )}} ~\text {iterations} \end{aligned}$$1η|S|2Ω(11-γ)iterationsto converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.

 
more » « less
Award ID(s):
2143215 2106778 2106739 1907661 2218773
NSF-PAR ID:
10392524
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Mathematical Programming
Volume:
201
Issue:
1-2
ISSN:
0025-5610
Format(s):
Medium: X Size: p. 707-802
Size(s):
["p. 707-802"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The free multiplicative Brownian motion$$b_{t}$$btis the large-Nlimit of the Brownian motion on$$\mathsf {GL}(N;\mathbb {C}),$$GL(N;C),in the sense of$$*$$-distributions. The natural candidate for the large-Nlimit of the empirical distribution of eigenvalues is thus the Brown measure of$$b_{t}$$bt. In previous work, the second and third authors showed that this Brown measure is supported in the closure of a region$$\Sigma _{t}$$Σtthat appeared in the work of Biane. In the present paper, we compute the Brown measure completely. It has a continuous density$$W_{t}$$Wton$$\overline{\Sigma }_{t},$$Σ¯t,which is strictly positive and real analytic on$$\Sigma _{t}$$Σt. This density has a simple form in polar coordinates:$$\begin{aligned} W_{t}(r,\theta )=\frac{1}{r^{2}}w_{t}(\theta ), \end{aligned}$$Wt(r,θ)=1r2wt(θ),where$$w_{t}$$wtis an analytic function determined by the geometry of the region$$\Sigma _{t}$$Σt. We show also that the spectral measure of free unitary Brownian motion$$u_{t}$$utis a “shadow” of the Brown measure of$$b_{t}$$bt, precisely mirroring the relationship between the circular and semicircular laws. We develop several new methods, based on stochastic differential equations and PDE, to prove these results.

     
    more » « less
  2. Abstract

    It has been recently established in David and Mayboroda (Approximation of green functions and domains with uniformly rectifiable boundaries of all dimensions.arXiv:2010.09793) that on uniformly rectifiable sets the Green function is almost affine in the weak sense, and moreover, in some scenarios such Green function estimates are equivalent to the uniform rectifiability of a set. The present paper tackles a strong analogue of these results, starting with the “flagship degenerate operators on sets with lower dimensional boundaries. We consider the elliptic operators$$L_{\beta ,\gamma } =- {\text {div}}D^{d+1+\gamma -n} \nabla $$Lβ,γ=-divDd+1+γ-nassociated to a domain$$\Omega \subset {\mathbb {R}}^n$$ΩRnwith a uniformly rectifiable boundary$$\Gamma $$Γof dimension$$d < n-1$$d<n-1, the now usual distance to the boundary$$D = D_\beta $$D=Dβgiven by$$D_\beta (X)^{-\beta } = \int _{\Gamma } |X-y|^{-d-\beta } d\sigma (y)$$Dβ(X)-β=Γ|X-y|-d-βdσ(y)for$$X \in \Omega $$XΩ, where$$\beta >0$$β>0and$$\gamma \in (-1,1)$$γ(-1,1). In this paper we show that the Green functionGfor$$L_{\beta ,\gamma }$$Lβ,γ, with pole at infinity, is well approximated by multiples of$$D^{1-\gamma }$$D1-γ, in the sense that the function$$\big | D\nabla \big (\ln \big ( \frac{G}{D^{1-\gamma }} \big )\big )\big |^2$$|D(ln(GD1-γ))|2satisfies a Carleson measure estimate on$$\Omega $$Ω. We underline that the strong and the weak results are different in nature and, of course, at the level of the proofs: the latter extensively used compactness arguments, while the present paper relies on some intricate integration by parts and the properties of the “magical distance function from David et al. (Duke Math J, to appear).

     
    more » « less
  3. Abstract

    Based on the recent development of the framework of Volterra rough paths (Harang and Tindel in Stoch Process Appl 142:34–78, 2021), we consider here the probabilistic construction of the Volterra rough path associated to the fractional Brownian motion with$$H>\frac{1}{2}$$H>12and for the standard Brownian motion. The Volterra kernelk(ts) is allowed to be singular, and behaving similar to$$|t-s|^{-\gamma }$$|t-s|-γfor some$$\gamma \ge 0$$γ0. The construction is done in both the Stratonovich and Itô senses. It is based on a modified Garsia–Rodemich–Romsey lemma which is of interest in its own right, as well as tools from Malliavin calculus. A discussion of challenges and potential extensions is provided.

     
    more » « less
  4. Abstract

    We consider integral area-minimizing 2-dimensional currents$T$Tin$U\subset \mathbf {R}^{2+n}$UR2+nwith$\partial T = Q\left [\!\![{\Gamma }\right ]\!\!]$T=QΓ, where$Q\in \mathbf {N} \setminus \{0\}$QN{0}and$\Gamma $Γis sufficiently smooth. We prove that, if$q\in \Gamma $qΓis a point where the density of$T$Tis strictly below$\frac{Q+1}{2}$Q+12, then the current is regular at$q$q. The regularity is understood in the following sense: there is a neighborhood of$q$qin which$T$Tconsists of a finite number of regular minimal submanifolds meeting transversally at$\Gamma $Γ(and counted with the appropriate integer multiplicity). In view of well-known examples, our result is optimal, and it is the first nontrivial generalization of a classical theorem of Allard for$Q=1$Q=1. As a corollary, if$\Omega \subset \mathbf {R}^{2+n}$ΩR2+nis a bounded uniformly convex set and$\Gamma \subset \partial \Omega $ΓΩa smooth 1-dimensional closed submanifold, then any area-minimizing current$T$Twith$\partial T = Q \left [\!\![{\Gamma }\right ]\!\!]$T=QΓis regular in a neighborhood of $\Gamma $Γ.

     
    more » « less
  5. Abstract

    Finite volume, weighted essentially non-oscillatory (WENO) schemes require the computation of a smoothness indicator. This can be expensive, especially in multiple space dimensions. We consider the use of the simple smoothness indicator$$\sigma ^{\textrm{S}}= \frac{1}{N_{\textrm{S}}-1}\sum _{j} ({\bar{u}}_{j} - {\bar{u}}_{m})^2$$σS=1NS-1j(u¯j-u¯m)2, where$$N_{\textrm{S}}$$NSis the number of mesh elements in the stencil,$${\bar{u}}_j$$u¯jis the local function average over mesh elementj, and indexmgives the target element. Reconstructions utilizing standard WENO weighting fail with this smoothness indicator. We develop a modification of WENO-Z weighting that gives a reliable and accurate reconstruction of adaptive order, which we denote as SWENOZ-AO. We prove that it attains the order of accuracy of the large stencil polynomial approximation when the solution is smooth, and drops to the order of the small stencil polynomial approximations when there is a jump discontinuity in the solution. Numerical examples in one and two space dimensions on general meshes verify the approximation properties of the reconstruction. They also show it to be about 10 times faster in two space dimensions than reconstructions using the classic smoothness indicator. The new reconstruction is applied to define finite volume schemes to approximate the solution of hyperbolic conservation laws. Numerical tests show results of the same quality as standard WENO schemes using the classic smoothness indicator, but with an overall speedup in the computation time of about 3.5–5 times in 2D tests. Moreover, the computational efficiency (CPU time versus error) is noticeably improved.

     
    more » « less