NSF-PAR ID: 10346735
Date Published:
Journal Name: Biometrika
Volume: 109
Issue: 1
ISSN: 0006-3444
Page Range / eLocation ID: 181 to 194
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

We propose a fast stochastic Hamiltonian Monte Carlo (HMC) method for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by recent advances in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm has a gradient complexity (i.e., number of component gradient evaluations) of $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}\big)$, which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm to sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.
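The abstract does not say which variance reduction technique is used. As a hedged illustration only, the sketch below shows one common device from stochastic optimization, an SVRG-style control variate, applied to a sum of one-dimensional quadratics. The component functions, the names `make_grads`, `full_gradient` and `svrg_gradient`, and the data are all hypothetical and are not taken from the paper.

```python
def make_grads(a, b):
    """Gradients of the hypothetical components f_i(x) = 0.5 * a_i * (x - b_i)**2."""
    return [lambda x, ai=ai, bi=bi: ai * (x - bi) for ai, bi in zip(a, b)]


def full_gradient(grads, x):
    """Average of all component gradients at x (costs len(grads) evaluations)."""
    return sum(g(x) for g in grads) / len(grads)


def svrg_gradient(grads, x, snapshot, snapshot_full_grad, i):
    """SVRG-style estimate of the full gradient at x using component i.

    The difference grads[i](x) - grads[i](snapshot) is corrected by the
    full gradient stored at the snapshot point, so the estimate is
    unbiased when i is drawn uniformly at random.
    """
    return grads[i](x) - grads[i](snapshot) + snapshot_full_grad


# Illustrative data (assumed, not from the source).
grads = make_grads([1.0, 2.0, 3.0], [0.0, 1.0, -1.0])
snapshot = 0.5
snapshot_full_grad = full_gradient(grads, snapshot)
```

Averaging `svrg_gradient` over all indices `i` recovers `full_gradient(grads, x)`, and at `x == snapshot` every single-component estimate already equals the full gradient, which is the sense in which the variance is reduced near the snapshot.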

We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo (RHMC) for sampling from distributions of the form e^{−f(x)} on a convex body M ⊂ R^n. We show that for distributions of the form e^{−a^⊤x} on a polytope with m constraints, the convergence rate of a family of commonly used integrators is independent of ∥a∥_2 and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of mn^3 to achieve ϵ total variation distance to the target distribution. These guarantees are based on a general bound on the convergence rate for densities of the form e^{−f(x)} in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of our earlier work, which shows that RHMC with IMM can efficiently sample ill-conditioned, non-smooth and constrained distributions in very high dimension in practice.

This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This motivates the focus on mean square error bounds for parameter estimates. It is shown that the mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
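To make the role of the step-size sequence concrete, here is a minimal sketch of the simplest stochastic approximation recursion, mean estimation with step size 1/n. It is an illustration under simplifying assumptions and not the paper's setting: the function name `sa_mean_estimate` is hypothetical, and the observations are a fixed list, not the output of a Markov chain.

```python
def sa_mean_estimate(observations):
    """Stochastic approximation recursion theta_n = theta_{n-1} + a_n * (x_n - theta_{n-1}).

    With the step size a_n = 1/n the iterate is exactly the running
    sample mean of the observations seen so far.
    """
    theta = 0.0
    for n, x in enumerate(observations, start=1):
        step = 1.0 / n  # step-size sequence a_n = 1/n
        theta += step * (x - theta)
    return theta
```

For independent observations with finite variance, the running mean has mean square error of order 1/n, which matches the rate quoted above; the Markovian case treated in the paper needs the additional conditions the abstract refers to.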

We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < epsilon, delta < 1, we wish to distinguish, with probability at least 1 - delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV}(p, U_n) >= epsilon, we simply threshold d_{TV}(\hat{p}, U_n), where \hat{p} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions, including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well.
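The plug-in statistic described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and `threshold` is an assumed parameter, since the abstract does not state how the cutoff depends on n, the sample size, epsilon and delta.

```python
from collections import Counter


def empirical_tv_to_uniform(samples, n):
    """d_TV between the empirical distribution of `samples` and U_n.

    `samples` is a sequence of integers in range(n).
    """
    m = len(samples)
    counts = Counter(samples)
    # Observed elements contribute |count/m - 1/n| each.
    seen = sum(abs(c / m - 1.0 / n) for c in counts.values())
    # Each unobserved element contributes |0 - 1/n|.
    unseen = (n - len(counts)) / n
    return 0.5 * (seen + unseen)


def plugin_uniformity_test(samples, n, threshold):
    """Return True (accept uniformity) when the statistic is at most `threshold`.

    `threshold` is an assumption of this sketch; it must be calibrated
    separately, for example against the statistic's expected value
    under the uniform distribution.
    """
    return empirical_tv_to_uniform(samples, n) <= threshold
```

Note that when the number of samples is much smaller than n, the statistic is large even for truly uniform data, so a useful threshold sits just above its expected value under U_n and is not close to zero.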

Summary: We study the properties of points in [0,1]^d generated by applying Hilbert's space-filling curve to uniformly distributed points in [0,1]. For deterministic sampling we obtain a discrepancy of O(n^{−1/d}) for d⩾2. For random stratified sampling, and scrambled van der Corput points, we derive a mean-squared error of O(n^{−1−2/d}) for integration of Lipschitz continuous integrands, when d⩾3. These rates are the same as those obtained by sampling on d-dimensional grids and they show a deterioration with increasing d. The rate for Lipschitz functions is, however, the best possible at that level of smoothness and is better than plain independent and identically distributed sampling. Unlike grids, space-filling curve sampling provides points at any desired sample size, and the van der Corput version is extensible in n. We also introduce a class of piecewise Lipschitz functions whose discontinuities are in rectifiable sets described via Minkowski content. Although these functions may have infinite variation in the sense of Hardy and Krause, they can be integrated with a mean-squared error of O(n^{−1−1/d}). It was previously known only that the rate was o(n^{−1}). Other space-filling curves, such as those due to Sierpinski and Peano, also attain these rates, whereas upper bounds for the Lebesgue curve are somewhat worse, as if the dimension were log_2(3) times as high.
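The construction can be illustrated for d = 2 with a finite-order approximation of the Hilbert curve. The sketch below is an assumption-laden illustration, not the paper's procedure: it uses unscrambled base-2 van der Corput points, maps each to the centre of a grid cell instead of the exact curve point, and the names `van_der_corput`, `hilbert_d2xy` and `hilbert_point` are hypothetical.

```python
def van_der_corput(i, base=2):
    """Radical inverse of the non-negative integer i in the given base."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q


def hilbert_d2xy(side, d):
    """Cell (x, y) visited at step d of the Hilbert curve on a side x side grid.

    `side` must be a power of two and d must lie in range(side * side).
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the quadrant built so far
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_point(u, order):
    """Map u in [0, 1] to the centre of its order-`order` Hilbert cell in [0, 1]^2."""
    side = 2 ** order
    cells = side * side
    d = min(int(u * cells), cells - 1)  # clamp so that u == 1.0 is handled
    x, y = hilbert_d2xy(side, d)
    return ((x + 0.5) / side, (y + 0.5) / side)


# First 16 points; extending the range adds points without moving earlier ones.
points = [hilbert_point(van_der_corput(i), order=4) for i in range(16)]
```

Because the van der Corput sequence is extensible, increasing the range adds new points while keeping the earlier ones, which mirrors the extensibility in n mentioned in the summary. A larger `order` gives a finer approximation to the limiting curve.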