Title: Coarse-Grained Smoothness for Reinforcement Learning in Metric Spaces
Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning. We provide a theoretical analysis of our new smoothness definition, and discuss its implications and impact on control and exploration in continuous domains.
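For context, here is a minimal sketch of the classical Lipschitz bound that this abstract generalizes; the function and variable names are ours, not the paper's. If $Q$ is $L$-Lipschitz in a metric $d$ on state-action pairs, each observed sample $(z_i, Q(z_i))$ yields the upper bound $Q(z) \le Q(z_i) + L\,d(z, z_i)$ everywhere, and the per-sample bounds can be intersected:

```python
import math

def lipschitz_q_upper_bound(query, samples, L):
    """Classical bound: Q(z) <= min_i [ Q(z_i) + L * d(z, z_i) ].

    query   -- state-action pair to bound, as a tuple of floats
    samples -- list of (state_action_tuple, observed_q_value) pairs
    L       -- assumed global Lipschitz constant of Q in the metric d
    """
    def d(a, b):  # Euclidean metric on the joint state-action space
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    return min(q + L * d(query, z) for z, q in samples)

# Tiny illustration: three samples, bound Q at a new state-action pair.
samples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 0.5), ((0.0, 1.0), 0.8)]
print(lipschitz_q_upper_bound((0.5, 0.5), samples, L=2.0))  # -> 1.914...
```

The abstract's point is that a single global $L$ is often unavailable, or vacuously large, in typical domains; the coarse-grained notion relaxes this assumption so that tighter bounds of the same intersected form remain valid.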
Award ID(s):
1844960 1717569 1955361
NSF-PAR ID:
10404720
Journal Name:
Proceedings of the 26th International Conference on Artificial Intelligence and Statistics
Sponsoring Org:
National Science Foundation
More Like this
  1. Image registration is an essential task in medical image analysis. We propose two novel unsupervised diffeomorphic image registration networks, which use deep Residual Networks (ResNets) as numerical approximations of the underlying continuous diffeomorphic setting governed by ordinary differential equations (ODEs), viewed as an Eulerian discretization scheme. Within this ODE-based parameterization of diffeomorphisms, we consider both stationary and non-stationary (time-varying) velocity fields as the driving velocities for the ODEs, which gives rise to our two proposed architectures for diffeomorphic registration. We also enforce Lipschitz continuity on the Residual Networks in both architectures, defining the admissible Hilbert space of velocity fields as a Reproducing Kernel Hilbert Space (RKHS) and regularizing the smoothness of the velocity fields. We apply both registration networks to align and segment the OASIS brain MRI dataset. Experimental results demonstrate that our models are computationally efficient and achieve comparable registration results with a smoother deformation field.
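A minimal sketch of the underlying numerical idea, under our own naming and with a toy coordinate MLP in place of the image-grid networks the abstract describes: a ResNet whose blocks all share one velocity field is exactly a forward-Euler integrator for $\dot{x} = v(x)$ with a stationary field; the non-stationary variant would give each step its own (or a time-conditioned) field.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Toy stationary velocity field v(x); a CNN over the image grid
    would play this role in an actual registration network."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

class EulerDiffeo(nn.Module):
    """Stack of residual blocks = forward-Euler scheme for dx/dt = v(x)."""
    def __init__(self, vfield, steps=8):
        super().__init__()
        self.vfield = vfield
        self.steps = steps

    def forward(self, x):
        h = 1.0 / self.steps            # integration step size
        for _ in range(self.steps):
            x = x + h * self.vfield(x)  # one residual block = one Euler step
        return x

phi = EulerDiffeo(VelocityField())
warped = phi(torch.rand(100, 2))        # deform 100 sample coordinates
```

Keeping $h\,\lVert v\rVert_{\mathrm{Lip}}$ below 1 is what makes each Euler step an invertible perturbation of the identity, which is one reason to constrain the Lipschitz constant of the residual blocks (e.g. via spectral normalization such as torch.nn.utils.spectral_norm, or via the RKHS regularization described in the abstract above).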
  2. Abstract: Let $\Omega\subset\mathbb{R}^{n+1}$, $n\geq 2$, be a 1-sided non-tangentially accessible domain (aka uniform domain); that is, $\Omega$ satisfies the interior Corkscrew and Harnack chain conditions, which are respectively scale-invariant/quantitative versions of openness and path-connectedness. Assume also that $\Omega$ satisfies the so-called capacity density condition, a quantitative version of the fact that all boundary points are Wiener regular. Consider $L_0 u = -\mathrm{div}(A_0\nabla u)$ and $Lu = -\mathrm{div}(A\nabla u)$, two real (not necessarily symmetric) uniformly elliptic operators in $\Omega$, and write $\omega_{L_0}$, $\omega_L$ for the respective associated elliptic measures. The goal of this program is to find sufficient conditions guaranteeing that $\omega_L$ satisfies an $A_\infty$-condition or an $\mathrm{RH}_q$-condition with respect to $\omega_{L_0}$. In this paper we establish that if the discrepancy of the two matrices satisfies a natural Carleson measure condition with respect to $\omega_{L_0}$, then $\omega_L \in A_\infty(\omega_{L_0})$. Additionally, we prove that $\omega_L \in \mathrm{RH}_q(\omega_{L_0})$ for some specific $q\in(1,\infty)$ by assuming that such a Carleson condition holds with a sufficiently small constant. This "small constant" case extends previous work of Fefferman–Kenig–Pipher and of Milakis–Pipher together with the last author of the present paper, who considered symmetric operators in Lipschitz and bounded chord-arc domains, respectively. Here we go beyond those settings: our domains satisfy a capacity density condition, which is much weaker than the existence of exterior Corkscrew balls. Moreover, their boundaries need not be Ahlfors regular, and the restriction of the $n$-dimensional Hausdorff measure to the boundary could even be locally infinite. The "large constant" case, that is, the one in which we assume only that the discrepancy of the two matrices satisfies a Carleson measure condition, is new even in the case of nice domains (such as the unit ball, the upper half-space, or non-tangentially accessible domains) and in the case of symmetric operators. We emphasize that our results hold in the absence of a nice surface measure: all the analysis is done with the underlying measure $\omega_{L_0}$, which behaves well in the scenarios we are considering. When particularized to the setting of Lipschitz, chord-arc, or 1-sided chord-arc domains, our methods allow us to immediately recover a number of existing perturbation results as well as extend some of them.
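For orientation, and as our gloss rather than anything stated in this record: in the classical Lipschitz-domain setting of Fefferman–Kenig–Pipher, the discrepancy condition is a Carleson measure bound taken with respect to surface measure $\sigma$,

$$\sup_{\Delta\subseteq\partial\Omega}\ \frac{1}{\sigma(\Delta)}\iint_{T(\Delta)}\frac{a(X)^{2}}{\delta(X)}\,dX \ \le\ \varepsilon_0, \qquad a(X)=\sup_{Y\in B(X,\,\delta(X)/2)}\lvert A(Y)-A_{0}(Y)\rvert,$$

where $\delta(X)=\operatorname{dist}(X,\partial\Omega)$ and $T(\Delta)$ is the Carleson region over the surface ball $\Delta$. The condition in the paper is of this type but formulated with respect to $\omega_{L_0}$ in place of $\sigma$, which is what allows the analysis to proceed in the absence of a nice surface measure.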
  3. Abstract

    More than three decades ago, Boyd and Balakrishnan established a regularity result for the two-norm of a transfer function at maximizers. Their result extends easily to the statement that the maximum eigenvalue of a univariate real analytic Hermitian matrix family is twice continuously differentiable, with Lipschitz second derivative, at all local maximizers, a property that is useful in several applications that we describe. We also investigate whether this smoothness property extends to max functions more generally. We show that the pointwise maximum of a finite set of $q$-times continuously differentiable univariate functions must have zero derivative at a maximizer for $q=1$, but arbitrarily close to the maximizer, the derivative may not be defined, even when $q=3$ and the maximizer is isolated.

     
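The $q=1$ claim admits a short argument (our sketch, not necessarily the paper's proof). Let $F=\max_{1\le i\le m} f_i$ with each $f_i\in C^1$, let $t^*$ be a local maximizer of $F$, and pick $j$ with $f_j(t^*)=F(t^*)$. Since $f_j\le F$ near $t^*$, the point $t^*$ also locally maximizes $f_j$, so $f_j'(t^*)=0$. Then for $t>t^*$ close to $t^*$,

$$\frac{f_j(t)-f_j(t^*)}{t-t^*}\ \le\ \frac{F(t)-F(t^*)}{t-t^*}\ \le\ 0,$$

and the left-hand side tends to $f_j'(t^*)=0$; the case $t<t^*$ is symmetric with the inequalities reversed. Hence $F$ is differentiable at $t^*$ with $F'(t^*)=0$.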
  4. In the Hidden-Parameter MDP (HiP-MDP) framework, a family of reinforcement learning tasks is generated by varying hidden parameters that specify the dynamics and reward function for each individual task. The HiP-MDP is a natural model for families of tasks in which meta- and lifelong reinforcement learning approaches can succeed. Given a learned context encoder that infers the hidden parameters from previous experience, most existing algorithms fall into two categories, model transfer and policy transfer, depending on which function the hidden parameters are used to parameterize. We characterize the robustness of model and policy transfer algorithms with respect to hidden-parameter estimation error. We first show that the value function of HiP-MDPs is Lipschitz continuous under certain conditions. We then derive regret bounds for both settings through the lens of Lipschitz continuity. Finally, we empirically corroborate our theoretical analysis by varying the hyperparameters governing the Lipschitz constants of two continuous control problems; the resulting performance is consistent with our theoretical results.
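To illustrate the mechanism in one line (a hedged paraphrase; the paper's precise assumptions, norms, and constants are more refined): if the optimal value function is $L_\theta$-Lipschitz in the hidden parameter, then a context encoder with estimation error $\lVert\theta-\hat\theta\rVert\le\epsilon$ costs at most

$$\big\lvert V^*_{\theta}(s)-V^*_{\hat\theta}(s)\big\rvert \ \le\ L_\theta\,\lVert\theta-\hat\theta\rVert \ \le\ L_\theta\,\epsilon \qquad \text{for every state } s,$$

which is the kind of bound that lets estimation error propagate linearly into the regret analysis for both model and policy transfer.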
  5. Summary

    We study the properties of points in $[0,1]^d$ generated by applying Hilbert's space-filling curve to uniformly distributed points in $[0,1]$. For deterministic sampling we obtain a discrepancy of $O(n^{-1/d})$ for $d\geq 2$. For random stratified sampling, and scrambled van der Corput points, we derive a mean-squared error of $O(n^{-1-2/d})$ for integration of Lipschitz continuous integrands, when $d\geq 3$. These rates are the same as those obtained by sampling on $d$-dimensional grids, and they show a deterioration with increasing $d$. The rate for Lipschitz functions is, however, the best possible at that level of smoothness and is better than plain independent and identically distributed sampling. Unlike grids, space-filling curve sampling provides points at any desired sample size, and the van der Corput version is extensible in $n$. We also introduce a class of piecewise Lipschitz functions whose discontinuities are in rectifiable sets described via Minkowski content. Although these functions may have infinite variation in the sense of Hardy and Krause, they can be integrated with a mean-squared error of $O(n^{-1-1/d})$. It was previously known only that the rate was $o(n^{-1})$. Other space-filling curves, such as those due to Sierpinski and Peano, also attain these rates, whereas upper bounds for the Lebesgue curve are somewhat worse, as if the dimension were $\log_2 3$ times as high.

     
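A compact 2-D sketch of the stratified construction (our own code, not from the paper, and resolution-limited: the curve position is resolved to a $2^{\text{order}}\times 2^{\text{order}}$ cell and the point is drawn uniformly inside it; the paper works in general $d$):

```python
import random

def d2xy(side, d):
    """Map index d in [0, side**2) to the d-th cell (x, y) of the Hilbert
    curve on a side x side grid (side a power of two); this is the standard
    iterative index-to-cell conversion."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_stratified(n, order=8, seed=0):
    """One point per stratum [i/n, (i+1)/n): map a stratified draw u in [0,1]
    through the curve at finite resolution and jitter within the cell."""
    side = 1 << order
    cells = side * side
    rng = random.Random(seed)
    pts = []
    for i in range(n):
        u = (i + rng.random()) / n
        d = min(int(u * cells), cells - 1)
        x, y = d2xy(side, d)
        pts.append(((x + rng.random()) / side, (y + rng.random()) / side))
    return pts

# Lipschitz integrand with exact integral 1/3 over the unit square.
f = lambda p: abs(p[0] - p[1])
pts = hilbert_stratified(4096)
print(sum(f(p) for p in pts) / len(pts))   # close to 0.3333
```

Because consecutive strata map to contiguous, roughly square regions of $[0,1]^2$, each stratum behaves like a small grid cell, consistent with the abstract's observation that the resulting rates match those of grid sampling while remaining available at any sample size $n$.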