-
We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo (RHMC) for sampling from distributions of the form e^{−f(x)} on a convex body M ⊂ R^n. We show that for distributions of the form e^{−a^⊤x} on a polytope with m constraints, the convergence rate of a family of commonly used integrators is independent of ∥a∥_2 and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time on the order of mn^3 to achieve ϵ total variation distance to the target distribution. These guarantees follow from a general bound on the convergence rate for densities of the form e^{−f(x)} in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements our earlier empirical results, which show that RHMC with IMM can efficiently sample ill-conditioned, non-smooth, and constrained distributions in very high dimension in practice.
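As a rough illustration of the kind of integrator discussed above, the following is a minimal sketch of a single implicit midpoint step for Hamiltonian dynamics, solved by fixed-point iteration. It is a generic sketch, not the authors' implementation: the function names, the gradient callbacks `grad_x_H`/`grad_p_H`, and the fixed-point tolerance are illustrative assumptions, and the manifold-specific Hamiltonian used in RHMC is not spelled out here.

```python
import numpy as np

def implicit_midpoint_step(x, p, grad_x_H, grad_p_H, eps,
                           n_fixed_point=20, tol=1e-10):
    """One implicit midpoint step for Hamiltonian dynamics
        dx/dt =  dH/dp,   dp/dt = -dH/dx,
    advancing (x, p) by step size eps.

    grad_x_H(x, p) and grad_p_H(x, p) are assumed callbacks returning the
    partial gradients of the Hamiltonian H.  The implicit update
        z_new = z + eps * F((z + z_new) / 2)
    is resolved by simple fixed-point iteration.
    """
    x_new, p_new = x.copy(), p.copy()
    for _ in range(n_fixed_point):
        # Midpoint between the starting state and the current iterate.
        x_mid, p_mid = 0.5 * (x + x_new), 0.5 * (p + p_new)
        x_next = x + eps * grad_p_H(x_mid, p_mid)
        p_next = p - eps * grad_x_H(x_mid, p_mid)
        gap = max(np.max(np.abs(x_next - x_new)), np.max(np.abs(p_next - p_new)))
        x_new, p_new = x_next, p_next
        if gap < tol:
            break
    return x_new, p_new

# Example usage (hypothetical): standard Gaussian target f(x) = ||x||^2 / 2
# with Euclidean kinetic energy, so grad_x_H = x and grad_p_H = p.
if __name__ == "__main__":
    x0, p0 = np.ones(3), np.zeros(3)
    x1, p1 = implicit_midpoint_step(x0, p0,
                                    grad_x_H=lambda x, p: x,
                                    grad_p_H=lambda x, p: p,
                                    eps=0.1)
```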
-
We study algorithms that use randomized value functions for exploration in reinforcement learning. Algorithms of this type enjoy appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case Õ(H√(SAT)) regret bound for episodic time-inhomogeneous Markov decision processes, where S is the size of the state space, A is the size of the action space, H is the planning horizon, and T is the number of interactions. This bound polynomially improves all existing bounds for algorithms based on randomized value functions and, for the first time, matches the Ω(H√(SAT)) lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To achieve the desired result, we develop 1) a new clipping operation that ensures both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors, used to analyze the regret.
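To make the exploration scheme concrete, here is a minimal tabular sketch of one episode of randomized-value-function exploration: a single noise draw per (h, s, a) at the start of the episode, backward induction on the perturbed Q-values, and a simple clip to the trivial value range. It is a simplified proxy under stated assumptions, not the paper's algorithm: the inputs `P_hat`, `R_hat`, `counts`, and `sigma_scale` are hypothetical, the count-based Gaussian noise stands in for the Bernstein-type magnitude, and the clip to [0, H − h] only gestures at the paper's more careful clipping operation.

```python
import numpy as np

def randomized_vf_episode(P_hat, R_hat, counts, H, S, A,
                          sigma_scale=1.0, rng=None):
    """One episode of exploration with randomized value functions (sketch).

    P_hat:  (H, S, A, S) empirical transition estimates
    R_hat:  (H, S, A)    empirical mean rewards (assumed in [0, 1])
    counts: (H, S, A)    visit counts; noise shrinks as counts grow
    Returns the greedy policy for this episode and the noisy Q-values.
    """
    rng = np.random.default_rng() if rng is None else rng
    # A single random seed for the whole episode: one noise realization
    # per (h, s, a), with a count-dependent magnitude (simplified proxy
    # for a Bernstein-type noise level).
    noise = (rng.standard_normal((H, S, A)) * sigma_scale * H
             / np.sqrt(np.maximum(counts, 1)))

    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        # Backward induction on the perturbed Bellman update.
        Q[h] = R_hat[h] + noise[h] + P_hat[h] @ V[h + 1]
        # Clip to the trivial range [0, H - h]: a crude stand-in for the
        # paper's clipping operation that keeps the estimate from being
        # either overly optimistic or overly pessimistic.
        Q[h] = np.clip(Q[h], 0.0, H - h)
        V[h] = Q[h].max(axis=1)
    policy = Q[:H].argmax(axis=2)  # greedy actions, shape (H, S)
    return policy, Q
```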