skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Softmax policy gradient methods can take exponential time to converge
Abstract The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For$$\gamma $$ γ -discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space$${\mathcal {S}}$$ S and the effective horizon$$\frac{1}{1-\gamma }$$ 1 1 - γ , both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize$$\eta $$ η can take$$\begin{aligned} \frac{1}{\eta } |{\mathcal {S}}|^{2^{\Omega \big (\frac{1}{1-\gamma }\big )}} ~\text {iterations} \end{aligned}$$ 1 η | S | 2 Ω ( 1 1 - γ ) iterations to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.  more » « less
Award ID(s):
2143215 2106778 2106739 1907661 2218773 2221009 2218713 1900140 2100158 2014279
PAR ID:
10392524
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Mathematical Programming
Volume:
201
Issue:
1-2
ISSN:
0025-5610
Format(s):
Medium: X Size: p. 707-802
Size(s):
p. 707-802
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We consider the Cauchy problem for the logarithmically singular surface quasi-geostrophic (SQG) equation, introduced by Ohkitani,$$\begin{aligned} \begin{aligned} \partial _t \theta - \nabla ^\perp \log (10+(-\Delta )^{\frac{1}{2}})\theta \cdot \nabla \theta = 0, \end{aligned} \end{aligned}$$ t θ - log ( 10 + ( - Δ ) 1 2 ) θ · θ = 0 , and establish local existence and uniqueness of smooth solutions in the scale of Sobolev spaces with exponent decreasing with time. Such a decrease of the Sobolev exponent is necessary, as we have shown in the companion paper (Chae et al. in Illposedness via degenerate dispersion for generalized surface quasi-geostrophic equations with singular velocities,arXiv:2308.02120) that the problem is strongly ill-posed in any fixed Sobolev spaces. The time dependence of the Sobolev exponent can be removed when there is a dissipation term strictly stronger than log. These results improve wellposedness statements by Chae et al. (Comm Pure Appl Math 65(8):1037–1066, 2012). This well-posedness result can be applied to describe the long-time dynamics of the$$\delta $$ δ -SQG equations, defined by$$\begin{aligned} \begin{aligned} \partial _t \theta + \nabla ^\perp (10+(-\Delta )^{\frac{1}{2}})^{-\delta }\theta \cdot \nabla \theta = 0, \end{aligned} \end{aligned}$$ t θ + ( 10 + ( - Δ ) 1 2 ) - δ θ · θ = 0 , for all sufficiently small$$\delta >0$$ δ > 0 depending on the size of the initial data. For the same range of$$\delta $$ δ , we establish global well-posedness of smooth solutions to the dissipative SQG equations. 
    more » « less
  2. Abstract This article revisits the problem of global well-posedness for the generalized parabolic Anderson model on$$\mathbb {R}^+\times \mathbb {T}^2$$ R + × T 2 within the framework of paracontrolled calculus (Gubinelli et al. in Forum Math, 2015). The model is given by the equation:$$\begin{aligned} (\partial _t-\Delta ) u=F(u)\eta \end{aligned}$$ ( t - Δ ) u = F ( u ) η where$$\eta \in C^{-1-\kappa }$$ η C - 1 - κ with$$1/6>\kappa >0$$ 1 / 6 > κ > 0 , and$$F\in C_b^2(\mathbb {R})$$ F C b 2 ( R ) . Assume that$$\eta \in C^{-1-\kappa }$$ η C - 1 - κ and can be lifted to enhanced noise, we derive new a priori bounds. The key idea follows from the recent work by Chandra et al. (A priori bounds for 2-d generalised Parabolic Anderson Model,,2024), to represent the leading error term as a transport type term, and our techniques encompass the paracontrolled calculus, the maximum principle, and the localization approach (i.e. high-low frequency argument). 
    more » « less
  3. A<sc>bstract</sc> Using data samples of 983.0 fb−1and 427.9 fb−1accumulated with the Belle and Belle II detectors operating at the KEKB and SuperKEKB asymmetric-energye+ecolliders, singly Cabibbo-suppressed decays$$ {\Xi}_c^{+}\to p{K}_S^0 $$ Ξ c + p K S 0 ,$$ {\Xi}_c^{+}\to \Lambda {\pi}^{+} $$ Ξ c + Λ π + , and$$ {\Xi}_c^{+}\to {\Sigma}^0{\pi}^{+} $$ Ξ c + Σ 0 π + are observed for the first time. The ratios of branching fractions of$$ {\Xi}_c^{+}\to p{K}_S^0 $$ Ξ c + p K S 0 ,$$ {\Xi}_c^{+}\to \Lambda {\pi}^{+} $$ Ξ c + Λ π + , and$$ {\Xi}_c^{+}\to {\Sigma}^0{\pi}^{+} $$ Ξ c + Σ 0 π + relative to that of$$ {\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+} $$ Ξ c + Ξ π + π + are measured to be$$ {\displaystyle \begin{array}{c}\frac{\mathcal{B}\left({\Xi}_c^{+}\to p{K}_S^0\right)}{\mathcal{B}\left({\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+}\right)}=\left(2.47\pm 0.16\pm 0.07\right)\%,\\ {}\frac{\mathcal{B}\left({\Xi}_c^{+}\to \Lambda {\pi}^{+}\right)}{\mathcal{B}\left({\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+}\right)}=\left(1.56\pm 0.14\pm 0.09\right)\%,\\ {}\frac{\mathcal{B}\left({\Xi}_c^{+}\to {\Sigma}^0{\pi}^{+}\right)}{\mathcal{B}\left({\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+}\right)}=\left(4.13\pm 0.26\pm 0.22\right)\%.\end{array}} $$ B Ξ c + p K S 0 B Ξ c + Ξ π + π + = 2.47 ± 0.16 ± 0.07 % , B Ξ c + Λ π + B Ξ c + Ξ π + π + = 1.56 ± 0.14 ± 0.09 % , B Ξ c + Σ 0 π + B Ξ c + Ξ π + π + = 4.13 ± 0.26 ± 0.22 % . Multiplying these values by the branching fraction of the normalization channel,$$ \mathcal{B}\left({\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+}\right)=\left(2.9\pm 1.3\right)\% $$ B Ξ c + Ξ π + π + = 2.9 ± 1.3 % , the absolute branching fractions are determined to be$$ {\displaystyle \begin{array}{c}\mathcal{B}\left({\Xi}_c^{+}\to p{K}_S^0\right)=\left(7.16\pm 0.46\pm 0.20\pm 3.21\right)\times {10}^{-4},\\ {}\mathcal{B}\left({\Xi}_c^{+}\to \Lambda {\pi}^{+}\right)=\left(4.52\pm 0.41\pm 0.26\pm 2.03\right)\times {10}^{-4},\\ {}\mathcal{B}\left({\Xi}_c^{+}\to {\Sigma}^0{\pi}^{+}\right)=\left(1.20\pm 0.08\pm 0.07\pm 0.54\right)\times {10}^{-3}.\end{array}} $$ B Ξ c + p K S 0 = 7.16 ± 0.46 ± 0.20 ± 3.21 × 10 4 , B Ξ c + Λ π + = 4.52 ± 0.41 ± 0.26 ± 2.03 × 10 4 , B Ξ c + Σ 0 π + = 1.20 ± 0.08 ± 0.07 ± 0.54 × 10 3 . The first and second uncertainties above are statistical and systematic, respectively, while the third ones arise from the uncertainty in$$ \mathcal{B}\left({\Xi}_c^{+}\to {\Xi}^{-}{\pi}^{+}{\pi}^{+}\right) $$ B Ξ c + Ξ π + π +
    more » « less
  4. A<sc>bstract</sc> A search for the fully reconstructed$$ {B}_s^0 $$ B s 0 → μ+μγdecay is performed at the LHCb experiment using proton-proton collisions at$$ \sqrt{s} $$ s = 13 TeV corresponding to an integrated luminosity of 5.4 fb−1. No significant signal is found and upper limits on the branching fraction in intervals of the dimuon mass are set$$ {\displaystyle \begin{array}{cc}\mathcal{B}\left({B}_s^0\to {\mu}^{+}{\mu}^{-}\gamma \right)<4.2\times {10}^{-8},& m\left({\mu}^{+}{\mu}^{-}\right)\in \left[2{m}_{\mu },1.70\right]\textrm{GeV}/{c}^2,\\ {}\mathcal{B}\left({B}_s^0\to {\mu}^{+}{\mu}^{-}\gamma \right)<7.7\times {10}^{-8},&\ m\left({\mu}^{+}{\mu}^{-}\right)\in \left[\textrm{1.70,2.88}\right]\textrm{GeV}/{c}^2,\\ {}\mathcal{B}\left({B}_s^0\to {\mu}^{+}{\mu}^{-}\gamma \right)<4.2\times {10}^{-8},& m\left({\mu}^{+}{\mu}^{-}\right)\in \left[3.92,{m}_{B_s^0}\right]\textrm{GeV}/{c}^2,\end{array}} $$ B B s 0 μ + μ γ < 4.2 × 10 8 , m μ + μ 2 m μ 1.70 GeV / c 2 , B B s 0 μ + μ γ < 7.7 × 10 8 , m μ + μ 1.70, 2.88 GeV / c 2 , B B s 0 μ + μ γ < 4.2 × 10 8 , m μ + μ 3.92 m B s 0 GeV / c 2 , at 95% confidence level. Additionally, upper limits are set on the branching fraction in the [2mμ,1.70] GeV/c2dimuon mass region excluding the contribution from the intermediateϕ(1020) meson, and in the region combining all dimuon-mass intervals. 
    more » « less
  5. A<sc>bstract</sc> We present a study of$$ {\Xi}_c^0\to {\Xi}^0{\pi}^0 $$ Ξ c 0 Ξ 0 π 0 ,$$ {\Xi}_c^0\to {\Xi}^0\eta $$ Ξ c 0 Ξ 0 η , and$$ {\Xi}_c^0\to {\Xi}^0{\eta}^{\prime } $$ Ξ c 0 Ξ 0 η decays using the Belle and Belle II data samples, which have integrated luminosities of 980 fb−1and 426 fb−1, respectively. We measure the following relative branching fractions$$ {\displaystyle \begin{array}{c}\mathcal{B}\left({\Xi}_c^0\to {\Xi}^0{\pi}^0\right)/\mathcal{B}\left({\Xi}_c^0\to {\Xi}^{-}{\pi}^{+}\right)=0.48\pm 0.02\left(\textrm{stat}\right)\pm 0.03\left(\textrm{syst}\right),\\ {}\mathcal{B}\left({\Xi}_c^0\to {\Xi}^0\eta \right)/\mathcal{B}\left({\Xi}_c^0\to {\Xi}^{-}{\pi}^{+}\right)=0.11\pm 0.01\left(\textrm{stat}\right)\pm 0.01\left(\textrm{syst}\right),\\ {}\mathcal{B}\left({\Xi}_c^0\to {\Xi}^0{\eta}^{\prime}\right)/\mathcal{B}\left({\Xi}_c^0\to {\Xi}^{-}{\pi}^{+}\right)=0.08\pm 0.02\left(\textrm{stat}\right)\pm 0.01\left(\textrm{syst}\right)\end{array}} $$ B Ξ c 0 Ξ 0 π 0 / B Ξ c 0 Ξ π + = 0.48 ± 0.02 stat ± 0.03 syst , B Ξ c 0 Ξ 0 η / B Ξ c 0 Ξ π + = 0.11 ± 0.01 stat ± 0.01 syst , B Ξ c 0 Ξ 0 η / B Ξ c 0 Ξ π + = 0.08 ± 0.02 stat ± 0.01 syst for the first time, where the uncertainties are statistical (stat) and systematic (syst). By multiplying by the branching fraction of the normalization mode,$$ \mathcal{B}\left({\Xi}_c^0\to {\Xi}^{-}{\pi}^{+}\right) $$ B Ξ c 0 Ξ π + , we obtain the following absolute branching fraction results$$ {\displaystyle \begin{array}{c}\mathcal{B}\left({\Xi}_c^0\to {\Xi}^0{\pi}^0\right)=\left(6.9\pm 0.3\left(\textrm{stat}\right)\pm 0.5\left(\textrm{syst}\right)\pm 1.3\left(\operatorname{norm}\right)\right)\times {10}^{-3},\\ {}\mathcal{B}\left({\Xi}_c^0\to {\Xi}^0\eta \right)=\left(1.6\pm 0.2\left(\textrm{stat}\right)\pm 0.2\left(\textrm{syst}\right)\pm 0.3\left(\operatorname{norm}\right)\right)\times {10}^{-3},\\ {}\mathcal{B}\left({\varXi}_c^0\to {\Xi}^0{\eta}^{\prime}\right)=\left(1.2\pm 0.3\left(\textrm{stat}\right)\pm 0.1\left(\textrm{syst}\right)\pm 0.2\left(\operatorname{norm}\right)\right)\times {10}^{-3},\end{array}} $$ B Ξ c 0 Ξ 0 π 0 = 6.9 ± 0.3 stat ± 0.5 syst ± 1.3 norm × 10 3 , B Ξ c 0 Ξ 0 η = 1.6 ± 0.2 stat ± 0.2 syst ± 0.3 norm × 10 3 , B Ξ c 0 Ξ 0 η = 1.2 ± 0.3 stat ± 0.1 syst ± 0.2 norm × 10 3 , where the third uncertainties are from$$ \mathcal{B}\left({\Xi}_c^0\to {\Xi}^{-}{\pi}^{+}\right) $$ B Ξ c 0 Ξ π + . The asymmetry parameter for$$ {\Xi}_c^0\to {\Xi}^0{\pi}^0 $$ Ξ c 0 Ξ 0 π 0 is measured to be$$ \alpha \left({\Xi}_c^0\to {\Xi}^0{\pi}^0\right)=-0.90\pm 0.15\left(\textrm{stat}\right)\pm 0.23\left(\textrm{syst}\right) $$ α Ξ c 0 Ξ 0 π 0 = 0.90 ± 0.15 stat ± 0.23 syst
    more » « less