Search for: All records

Creators/Authors contains: "Shen, Ruoqi"


  1. We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo (RHMC) for sampling from distributions of the form e^{−f(x)} on a convex body M ⊂ R^n. We show that for distributions of the form e^{−a^⊤x} on a polytope with m constraints, the convergence rate of a family of commonly used integrators is independent of ∥a∥_2 and of the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of mn^3 to achieve ϵ total variation distance to the target distribution. These guarantees follow from a general bound on the convergence rate for densities of the form e^{−f(x)} in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of our earlier work, which show that RHMC with IMM can efficiently sample ill-conditioned, non-smooth, constrained distributions in very high dimension in practice. A minimal illustrative sketch of one generalized Leapfrog step is included below this record.
    Free, publicly-accessible full text available June 12, 2024
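    The record above refers to the implicit midpoint and generalized Leapfrog integrators for Riemannian HMC. As a purely illustrative aid (not code from the paper), the Python sketch below shows one generalized Leapfrog step for a Hamiltonian H(x, p), with the two implicit updates solved by plain fixed-point iteration; the gradient callables grad_H_x and grad_H_p, the step size h, and the iteration count fp_iters are assumptions, and the paper's RHMC setting would additionally involve the log-barrier metric of the polytope.

        import numpy as np

        def generalized_leapfrog_step(x, p, grad_H_x, grad_H_p, h, fp_iters=10):
            # Implicit half-step in momentum: p_half = p - (h/2) * dH/dx(x, p_half),
            # solved here by simple fixed-point iteration.
            p_half = np.asarray(p, dtype=float)
            for _ in range(fp_iters):
                p_half = p - 0.5 * h * grad_H_x(x, p_half)
            # Implicit full step in position:
            # x_new = x + (h/2) * (dH/dp(x, p_half) + dH/dp(x_new, p_half)).
            x_new = np.asarray(x, dtype=float)
            for _ in range(fp_iters):
                x_new = x + 0.5 * h * (grad_H_p(x, p_half) + grad_H_p(x_new, p_half))
            # Explicit final half-step in momentum.
            p_new = p_half - 0.5 * h * grad_H_x(x_new, p_half)
            return x_new, p_new

        # Toy usage with a separable Hamiltonian H(x, p) = 0.5*x.x + 0.5*p.p (plain HMC);
        # in RHMC the gradients would instead come from the position-dependent metric.
        x1, p1 = generalized_leapfrog_step(
            np.array([1.0, -2.0]), np.array([0.5, 0.3]),
            grad_H_x=lambda x, p: x,   # gradient of f(x) = 0.5*x.x
            grad_H_p=lambda x, p: p,   # gradient of the kinetic term
            h=0.1,
        )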
  2. We study algorithms that use randomized value functions for exploration in reinforcement learning. This class of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case Õ(H√SAT) regret bound for episodic time-inhomogeneous Markov Decision Processes, where S is the size of the state space, A is the size of the action space, H is the planning horizon, and T is the number of interactions. This bound polynomially improves all existing bounds for algorithms based on randomized value functions and, for the first time, matches the Ω(H√SAT) lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To obtain the desired result, we develop 1) a new clipping operation that ensures both the probability of being optimistic and the probability of being pessimistic are bounded below by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret. A small illustrative sketch of these two exploration ingredients is included below this record.
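    The record above hinges on two exploration ingredients: one random seed shared across an episode, and a Bernstein-type noise magnitude. The Python sketch below is illustrative only, not the paper's algorithm: the constants c1, c2, delta, the variance and visit-count arrays, and the simple range clip are assumptions, with the range clip standing in for the paper's more careful clipping operation.

        import numpy as np

        def perturbed_q_values(q_hat, var_hat, visit_count, H, h, xi,
                               c1=1.0, c2=1.0, delta=0.01):
            # Bernstein-type noise magnitude: a variance-dependent leading term plus
            # a lower-order 1/n term (c1, c2, delta are placeholder constants).
            n = np.maximum(visit_count, 1)
            log_term = np.log(1.0 / delta)
            sigma = c1 * np.sqrt(var_hat * log_term / n) + c2 * H * log_term / n
            # A single Gaussian seed xi, drawn once per episode, perturbs all (s, a).
            q_tilde = q_hat + xi * sigma
            # Simple range clip to [0, H - h + 1]; the paper's clipping operation is
            # more delicate and is what guarantees constant probabilities of being
            # optimistic and pessimistic.
            return np.clip(q_tilde, 0.0, H - h + 1)

        # Usage sketch: draw one seed per episode, then act greedily at every step
        # with respect to the perturbed Q-values.
        rng = np.random.default_rng(0)
        S, A, H = 5, 3, 10
        q_hat, var_hat, counts = np.zeros((S, A)), np.ones((S, A)), np.ones((S, A))
        xi_k = rng.standard_normal()                      # single seed for episode k
        q_tilde = perturbed_q_values(q_hat, var_hat, counts, H=H, h=1, xi=xi_k)
        action_at_s0 = int(np.argmax(q_tilde[0]))         # greedy action in state 0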