The free multiplicative Brownian motion
- Award ID(s): 2055340
- NSF-PAR ID: 10372851
- Journal Name: Probability Theory and Related Fields
- Volume: 184
- Issue: 1-2
- Page Range or eLocation-ID: p. 209-273
- ISSN: 0178-8051
- Publisher: Springer Science + Business Media
- Sponsoring Org: National Science Foundation

More like this

Abstract The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $|\mathcal{S}|$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take
$$\frac{1}{\eta}\, |\mathcal{S}|^{2^{\Omega\left(\frac{1}{1-\gamma}\right)}} ~\text{iterations}$$
to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in …
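To make the update rule concrete, here is a minimal sketch of softmax PG with exact gradients on a tabular MDP. It uses the standard gradient expression $\partial V^{\pi_\theta}(\rho)/\partial\theta_{s,a} = \frac{1}{1-\gamma} d_\rho^{\pi_\theta}(s)\,\pi_\theta(a\mid s)\,A^{\pi_\theta}(s,a)$. All function names, the 2-state MDP, and the stepsize $\eta = (1-\gamma)^3/8$ are assumptions chosen for illustration; this is not the paper's three-action lower-bound construction, and it does not reproduce the exponential slowdown.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def evaluate(P, R, pi, gamma, tol=1e-12):
    """Policy evaluation by iterating the Bellman equation to tolerance tol.

    P[s][a][t]: probability of moving s -> t under action a; R[s][a] in [0, 1].
    Returns (V, Q) for the policy pi[s][a]."""
    S = len(P)

    def q_values(V):
        return [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                 for a in range(len(P[s]))] for s in range(S)]

    V = [0.0] * S
    while True:
        Q = q_values(V)
        new_V = [sum(pi[s][a] * Q[s][a] for a in range(len(Q[s]))) for s in range(S)]
        done = max(abs(x - y) for x, y in zip(new_V, V)) < tol
        V = new_V
        if done:
            return V, q_values(V)


def visitation(P, pi, rho, gamma, tol=1e-12):
    """Discounted state visitation d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s)."""
    S = len(P)
    d = list(rho)
    while True:
        new_d = [(1 - gamma) * rho[t] + gamma * sum(
                    d[s] * pi[s][a] * P[s][a][t]
                    for s in range(S) for a in range(len(P[s])))
                 for t in range(S)]
        done = max(abs(x - y) for x, y in zip(new_d, d)) < tol
        d = new_d
        if done:
            return d


def softmax_pg(P, R, rho, gamma, eta, iters):
    """Exact-gradient ascent on V^pi(rho) starting from uniform logits."""
    theta = [[0.0] * len(P[s]) for s in range(len(P))]
    for _ in range(iters):
        pi = [softmax(row) for row in theta]
        V, Q = evaluate(P, R, pi, gamma)
        d = visitation(P, pi, rho, gamma)
        for s in range(len(theta)):
            for a in range(len(theta[s])):
                grad = d[s] * pi[s][a] * (Q[s][a] - V[s]) / (1 - gamma)
                theta[s][a] += eta * grad
    return theta


def value(P, R, rho, gamma, theta):
    """V^pi(rho) for the softmax policy with logits theta."""
    V, _ = evaluate(P, R, [softmax(row) for row in theta], gamma)
    return sum(r * v for r, v in zip(rho, V))


# Hypothetical 2-state MDP: action 0 stays, action 1 switches state;
# reward 1 only for taking action 0 in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 0.0]]
rho = [0.5, 0.5]
gamma = 0.5
eta = (1 - gamma) ** 3 / 8
theta = softmax_pg(P, R, rho, gamma, eta, iters=200)
```

With exact gradients each step moves probability toward the action with the larger $Q$-value in every visited state, so the value increases. How *many* iterations a construction like the paper's requires is exactly what the lower bound addresses.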
Abstract The numerical analysis of stochastic parabolic partial differential equations of the form
$$du + A(u)\, dt = f \,dt + g \, dW$$
is surveyed, where $A$ is a nonlinear partial operator and $W$ a Brownian motion. This manuscript unifies much of the theory developed over the last decade into a cohesive framework which integrates techniques for the approximation of deterministic partial differential equations with methods for the approximation of stochastic ordinary differential equations. The manuscript is intended to be accessible to audiences versed in either of these disciplines, and examples are presented to illustrate the applicability of the theory.
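As one concrete instance of combining a PDE discretization with an SDE integrator, the sketch below applies a semi-implicit Euler–Maruyama step, $(I + \Delta t\,A_h)\,u^{n+1} = u^n + \Delta t\, f + g\,\Delta W_n$, to a finite-difference grid. It assumes the *linear* choice $A(u) = -u_{xx}$ on $(0,1)$ with zero Dirichlet data and a single scalar Brownian motion $W$. These choices, and the function names, are illustrative assumptions rather than schemes taken from the survey.

```python
import math
import random


def thomas(a, b, c, d):
    """Solve a tridiagonal system: sub-diagonal a, diagonal b, super-diagonal c.

    a[0] and c[-1] are ignored."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def semi_implicit_em(u0, f, g, T, steps, seed=0):
    """Approximate du - u_xx dt = f dt + g dW on (0, 1) with u = 0 at x = 0, 1.

    u0, f, g hold values at the interior nodes x_i = (i + 1) * h, h = 1 / (n + 1).
    The drift is treated implicitly, the noise explicitly (Euler-Maruyama)."""
    rng = random.Random(seed)
    n = len(u0)
    h = 1.0 / (n + 1)
    dt = T / steps
    r = dt / h ** 2
    # I + dt * A_h, where A_h = tridiag(-1, 2, -1) / h^2 discretizes -d^2/dx^2.
    a, b, c = [-r] * n, [1 + 2 * r] * n, [-r] * n
    u = list(u0)
    for _ in range(steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        rhs = [u[i] + dt * f[i] + g[i] * dW for i in range(n)]
        u = thomas(a, b, c, rhs)
    return u
```

With $g = 0$ and $f = 0$, the grid function $\sin(\pi x_i)$ is an eigenvector of $A_h$ with eigenvalue $\lambda_h = \frac{4}{h^2}\sin^2(\pi h/2)$, so each step should scale it by exactly $1/(1 + \Delta t\,\lambda_h)$. That makes a convenient deterministic check before switching the noise on.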
Abstract Motivated by the Rudnick–Sarnak theorem, we study the limiting distribution of smoothed local correlations of the form
$$\sum _{j_1, j_2, \ldots , j_n} f\big(N(\theta _{j_2}-\theta _{j_1}), N(\theta _{j_3}-\theta _{j_1}), \ldots , N(\theta _{j_n}-\theta _{j_1})\big)$$
for the Circular Unitary Ensemble of random matrices and sufficiently smooth test functions.
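The statistic above can be evaluated directly for a given set of eigenangles. In the minimal sketch below, the summation runs over ordered tuples of *distinct* indices (as in the Rudnick–Sarnak setting), and the differences are used as plain real numbers, not reduced modulo $2\pi$. Both are assumptions of the sketch. The angles are supplied by the caller (for example, eigenangles of a sampled CUE matrix, which is not generated here), and the function name is hypothetical.

```python
import itertools


def smoothed_correlation(thetas, f, n):
    """Sum of f(N(th[j2]-th[j1]), ..., N(th[jn]-th[j1])) over ordered n-tuples
    of distinct indices (j1, ..., jn), where N = len(thetas)."""
    N = len(thetas)
    total = 0.0
    for idx in itertools.permutations(range(N), n):
        j1 = idx[0]
        args = [N * (thetas[j] - thetas[j1]) for j in idx[1:]]
        total += f(*args)
    return total
```

The cost grows like $N^n$, which is fine for small matrices and low correlation order. For $N = 5$ equispaced angles $2\pi j/5$, the rescaled gaps are $2\pi(j_2 - j_1)$, so an indicator of $|x| < 7$ counts ordered nearest-neighbour pairs and returns 8.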
Abstract It has been recently established in David and Mayboroda (Approximation of Green functions and domains with uniformly rectifiable boundaries of all dimensions. arXiv:2010.09793) that on uniformly rectifiable sets the Green function is almost affine in the weak sense, and moreover, in some scenarios such Green function estimates are equivalent to the uniform rectifiability of a set. The present paper tackles a strong analogue of these results, starting with the “flagship degenerate operators” on sets with lower dimensional boundaries. We consider the elliptic operators $L_{\beta ,\gamma } =- {\text {div}}\, D^{d+1+\gamma -n} \nabla$ associated to a domain $\Omega \subset {\mathbb {R}}^n$ with a uniformly rectifiable boundary $\Gamma$ of dimension $d < n-1$, the now usual distance to the boundary $D = D_\beta$ given by $D_\beta (X)^{-\beta } = \int _{\Gamma } |X-y|^{-d-\beta }\, d\sigma (y)$ for $X \in \Omega$, where $\beta >0$ and $\gamma \in (-1,1)$. In this paper we show that the Green function $G$ for $L_{\beta ,\gamma }$, with pole at infinity, is well approximated by multiples of $D^{1-\gamma }$, in the sense that the function $\big | D\nabla \big (\ln \big ( \frac{G}{D^{1-\gamma }} \big )\big )\big |^2$ satisfies a Carleson measure estimate on $\Omega$. We underline that the strong and the weak results are different in nature and, of course, at the level …
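The regularized distance $D_\beta$ can be checked numerically for a flat boundary. The sketch below approximates the defining integral with a caller-supplied quadrature rule for $\sigma$. It then tests the hypothetical case $\Gamma = $ the $x$-axis in $\mathbb{R}^3$ ($d = 1 < n - 1 = 2$), truncated to $[-1000, 1000]$, with a midpoint rule. For $\beta = 1$ the exact value on the untruncated line is $\int_{\mathbb{R}} (t^2 + r^2)^{-1}\,dt = \pi/r$, so $D_1(X) = r/\pi$, where $r$ is the Euclidean distance to the axis. The function name, the truncation, and the quadrature are all assumptions made for illustration.

```python
import math


def d_beta(X, nodes, weights, beta, d):
    """Quadrature approximation of D_beta(X), where
    D_beta(X)^(-beta) = integral over Gamma of |X - y|^(-d - beta) dsigma(y).

    nodes/weights: a quadrature rule for the surface measure sigma on Gamma."""
    integral = 0.0
    for y, w in zip(nodes, weights):
        integral += w * math.dist(X, y) ** (-d - beta)
    return integral ** (-1.0 / beta)


# Hypothetical flat boundary: the x-axis in R^3, truncated to [-L, L],
# discretized with the composite midpoint rule (step hstep).
L, hstep = 1000.0, 0.05
count = int(round(2 * L / hstep))
nodes = [(-L + (k + 0.5) * hstep, 0.0, 0.0) for k in range(count)]
weights = [hstep] * count
```

The point of the check is that $D_\beta$ behaves like the Euclidean distance to $\Gamma$ up to a constant, which is why it can serve as a smooth substitute for $\operatorname{dist}(X, \Gamma)$ in $L_{\beta,\gamma}$. Truncating the line perturbs the result only by a relative error of order $r/L$.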
Abstract Finite volume, weighted essentially non-oscillatory (WENO) schemes require the computation of a smoothness indicator. This can be expensive, especially in multiple space dimensions. We consider the use of the simple smoothness indicator $\sigma ^{\textrm{S}}= \frac{1}{N_{\textrm{S}}-1}\sum _{j} ({\bar{u}}_{j} - {\bar{u}}_{m})^2$, where $N_{\textrm{S}}$ is the number of mesh elements in the stencil, ${\bar{u}}_j$ is the local function average over mesh element $j$, and index $m$ gives the target element. Reconstructions utilizing standard WENO weighting fail with this smoothness indicator. We develop a modification of WENO-Z weighting that gives a reliable and accurate reconstruction of adaptive order, which we denote as SWENOZ-AO. We prove that it attains the order of accuracy of the large stencil polynomial approximation when the solution is smooth, and drops to the order of the small stencil polynomial approximations when there is a jump discontinuity in the solution. Numerical examples in one and two space dimensions on general meshes verify the approximation properties of the reconstruction. They also show it to be about 10 times faster in two space dimensions than reconstructions using the classic smoothness indicator. The new reconstruction is applied to define finite volume schemes to approximate the solution of hyperbolic conservation laws. Numerical tests show results of the same quality as standard WENO …
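The simple indicator $\sigma^{\textrm{S}}$ is cheap because it needs only cell averages, not integrals of derivatives of a reconstruction polynomial. A minimal Python sketch follows. It assumes the cell averages are stored in a list (or any mapping) indexed by element id, with `stencil` listing the element ids and `m` the target element. The function name is hypothetical. The sketch computes only the indicator and does not attempt the SWENOZ-AO weighting, whose details are not given here.

```python
def simple_smoothness_indicator(ubar, stencil, m):
    """sigma^S = 1 / (N_S - 1) * sum over j in stencil of (ubar[j] - ubar[m])^2.

    ubar: cell averages indexed by element id; stencil: element ids (N_S of them);
    m: the target element. The j = m term contributes zero."""
    n_s = len(stencil)
    if n_s < 2:
        raise ValueError("stencil needs at least two elements")
    return sum((ubar[j] - ubar[m]) ** 2 for j in stencil) / (n_s - 1)
```

For example, with averages `[1.0, 2.0, 3.0, 10.0]`, the smooth stencil `[0, 1, 2]` centred on element 1 gives 1.0. The stencil `[1, 2, 3]` centred on element 2, which straddles the jump to 10.0, gives 25.0. Because the indicator scales with the local variation rather than with derivatives, the source notes that the standard WENO weighting fails with it.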