- NSF-PAR ID:
- 10284150
- Date Published:
- Journal Name:
- Journal of Inverse and Ill-posed Problems
- Volume:
- 0
- Issue:
- 0
- ISSN:
- 0928-0219
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision process (MDP), which is more appropriate for applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of model-free RL algorithms is relatively inadequate for the average-reward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity compared with existing results. In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound $\widetilde{O}(S^5A^2sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of state-action space, and $sp(h^*)$ the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence in $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results either are suboptimal in $T$ or require strong assumptions of ergodicity or uniformly mixing of MDPs. In the simulator setting, we adapt the idea of UCB-AVG to develop a model-free algorithm that finds an $\epsilon$-optimal policy with sample complexity $\widetilde{O}(SAsp^2(h^*)\epsilon^{-2} + S^2Asp(h^*)\epsilon^{-1}).$ This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SAsp(^*)\epsilon^{-2})$. Existing work mainly focuses on ergodic MDPs and the results typically depend on $t_{mix},$ the worst-case mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and $t_{mix}$ can be arbitrarily large for certain MDPs. On the technical side, our approach integrates two key ideas: learning an $\gamma$-discounted MDP as an approximation, and leveraging reference-advantage decomposition for variance in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of value-difference between state pairs to provide a sharper bound on the variance of reference advantage. We also crucially use a careful choice of the discounted factor $\gamma$ to balance approximation error due to discounting and the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.more » « less
-
Abstract It has been recently established in David and Mayboroda (Approximation of green functions and domains with uniformly rectifiable boundaries of all dimensions.
arXiv:2010.09793 ) that on uniformly rectifiable sets the Green function is almost affine in the weak sense, and moreover, in some scenarios such Green function estimates are equivalent to the uniform rectifiability of a set. The present paper tackles a strong analogue of these results, starting with the “flagship degenerate operators on sets with lower dimensional boundaries. We consider the elliptic operators associated to a domain$$L_{\beta ,\gamma } =- {\text {div}}D^{d+1+\gamma -n} \nabla $$ with a uniformly rectifiable boundary$$\Omega \subset {\mathbb {R}}^n$$ of dimension$$\Gamma $$ , the now usual distance to the boundary$$d < n-1$$ given by$$D = D_\beta $$ for$$D_\beta (X)^{-\beta } = \int _{\Gamma } |X-y|^{-d-\beta } d\sigma (y)$$ , where$$X \in \Omega $$ and$$\beta >0$$ . In this paper we show that the Green function$$\gamma \in (-1,1)$$ G for , with pole at infinity, is well approximated by multiples of$$L_{\beta ,\gamma }$$ , in the sense that the function$$D^{1-\gamma }$$ satisfies a Carleson measure estimate on$$\big | D\nabla \big (\ln \big ( \frac{G}{D^{1-\gamma }} \big )\big )\big |^2$$ . We underline that the strong and the weak results are different in nature and, of course, at the level of the proofs: the latter extensively used compactness arguments, while the present paper relies on some intricate integration by parts and the properties of the “magical distance function from David et al. (Duke Math J, to appear).$$\Omega $$ -
Abstract We investigate the rigidity of global minimizers u ≥ 0 u\ge 0 of the Alt-Phillips functional involving negative power potentials ∫ Ω ( ∣ ∇ u ∣ 2 + u − γ χ { u > 0 } ) d x , γ ∈ ( 0 , 2 ) , \mathop{\int }\limits_{\Omega }(| \nabla u{| }^{2}+{u}^{-\gamma }{\chi }_{\left\{u\gt 0\right\}}){\rm{d}}x,\hspace{1.0em}\gamma \in \left(0,2), when the exponent γ \gamma is close to the extremes of the admissible values. In particular, we show that global minimizers in R n {{\mathbb{R}}}^{n} are one-dimensional if γ \gamma is close to 2 and n ≤ 7 n\le 7 , or if γ \gamma is close to 0 and n ≤ 4 n\le 4 .more » « less
-
Abstract Using extensive numerical simulation of the Navier–Stokes equations, we study the transition from the Darcy’s law for slow flow of fluids through a disordered porous medium to the nonlinear flow regime in which the effect of inertia cannot be neglected. The porous medium is represented by two-dimensional slices of a three-dimensional image of a sandstone. We study the problem over wide ranges of porosity and the Reynolds number, as well as two types of boundary conditions, and compute essential features of fluid flow, namely, the strength of the vorticity, the effective permeability of the pore space, the frictional drag, and the relationship between the macroscopic pressure gradient
and the fluid velocity$${\varvec{\nabla }}P$$ v . The results indicate that when the Reynolds number Re is low enough that the Darcy’s law holds, the magnitude of the vorticity is nearly zero. As Re increases, however, so also does$$\omega _z$$ , and its rise from nearly zero begins at the same Re at which the Darcy’s law breaks down. We also show that a nonlinear relation between the macroscopic pressure gradient and the fluid velocity$$\omega _z$$ v , given by, , provides accurate representation of the numerical data, where$$-{\varvec{\nabla }}P=(\mu /K_e)\textbf{v}+\beta _n\rho |\textbf{v}|^2\textbf{v}$$ and$$\mu$$ are the fluid’s viscosity and density,$$\rho$$ is the effective Darcy permeability in the linear regime, and$$K_e$$ is a generalized nonlinear resistance. Theoretical justification for the relation is presented, and its predictions are also compared with those of the Forchheimer’s equation.$$\beta _n$$ -
null (Ed.)Most of existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data. To bridge this gap, we propose to exploit the low-dimensional structures of the real world datasets and establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical recovery for binary classification problem. Specifically, given the data lying on a 𝑑-dimensional manifold isometrically embedded in ℝ^𝐷, we prove that if the network architecture is properly chosen, ConvResNets can (1) approximate Besov functions on manifolds with arbitrary accuracy, and (2) learn a classifier by minimizing the empirical logistic risk, which gives an excess risk in the order of 𝑛−2s/(2s+d), where 𝑠 is a smoothness parameter. This implies that the sample complexity depends on the intrinsic dimension 𝑑, instead of the data dimension 𝐷. Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.more » « less