skip to main content

Title: Stratification and optimal resampling for sequential Monte Carlo
Summary Sequential Monte Carlo algorithms are widely accepted as powerful computational tools for making inference with dynamical systems. A key step in sequential Monte Carlo is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling and optimal transport resampling. In one-dimensional cases, we show that optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both strategies minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. For general $d$-dimensional cases, we show that if the particles are first sorted using the Hilbert curve, the variance of stratified resampling is $O(m^{-(1+2/d)})$, an improvement over the best previously known rate of $O(m^{-(1+1/d)})$, where $m$ is the number of resampled particles. We show that this improved rate is optimal for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost-sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that for dimension $d>1$ the mean square error of sequential quasi-Monte Carlo with $n$ particles can be $O(n^{-1-4/\{d(d+4)\}})$ if Hilbert curve resampling is used and a specific low-discrepancy set is chosen. To our knowledge, this is the first known convergence rate lower than $o(n^{-1})$.  more » « less
Award ID(s):
1903139 2015411
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Page Range / eLocation ID:
181 to 194
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm achieves $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}%\wedge\frac{\kappa^2L^{-2}d\sigma^2}{\epsilon^2} \big)$ gradient complexity (i.e., number of component gradient evaluations), which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm for sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm. 
    more » « less
  2. We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo on sampling from distributions in the form of e^{−f(x)} on a convex body M ⊂ R^n. We show that for distributions in the form of e−^{a x} on a polytope with m constraints, the convergence rate of a family of commonly-used integrators is independent of ∥a∥_2 and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of mn^3 to achieve ϵ total variation distance to the target distribution. These guarantees are based on a general bound on the convergence rate for densities of the form e^{−f(x)} in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of our old result, which shows that RHMC with IMM can sample ill-conditioned, non-smooth and constrained distributions in very high dimension efficiently in practice. 
    more » « less
  3. This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design 
    more » « less
  4. We study the problem of testing identity against a given distribution with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0< epsilon, delta < 1, we wish to distinguish, with probability at least 1-delta, whether the distributions are identical versus epsilon-far in total variation distance. Most prior work focused on the case that delta = Omega(1), for which the sample complexity of identity testing is known to be Theta(sqrt{n}/epsilon^2). Given such an algorithm, one can achieve arbitrarily small values of delta via black-box amplification, which multiplies the required number of samples by Theta(log(1/delta)). We show that black-box amplification is suboptimal for any delta = o(1), and give a new identity tester that achieves the optimal sample complexity. Our new upper and lower bounds show that the optimal sample complexity of identity testing is Theta((1/epsilon^2) (sqrt{n log(1/delta)} + log(1/delta))) for any n, epsilon, and delta. For the special case of uniformity testing, where the given distribution is the uniform distribution U_n over the domain, our new tester is surprisingly simple: to test whether p = U_n versus d_{TV} (p, U_n) >= epsilon, we simply threshold d_{TV}({p^}, U_n), where {p^} is the empirical probability distribution. The fact that this simple "plug-in" estimator is sample-optimal is surprising, even in the constant delta case. Indeed, it was believed that such a tester would not attain sublinear sample complexity even for constant values of epsilon and delta. An important contribution of this work lies in the analysis techniques that we introduce in this context. First, we exploit an underlying strong convexity property to bound from below the expectation gap in the completeness and soundness cases. Second, we give a new, fast method for obtaining provably correct empirical estimates of the true worst-case failure probability for a broad class of uniformity testing statistics over all possible input distributions - including all previously studied statistics for this problem. We believe that our novel analysis techniques will be useful for other distribution testing problems as well. 
    more » « less
  5. Summary

    We study the properties of points in [0,1]d generated by applying Hilbert's space filling curve to uniformly distributed points in [0, 1]. For deterministic sampling we obtain a discrepancy of O(n−1/d) for d⩾2. For random stratified sampling, and scrambled van der Corput points, we derive a mean-squared error of O(n−1−2/d) for integration of Lipschitz continuous integrands, when d⩾3. These rates are the same as those obtained by sampling on d-dimensional grids and they show a deterioration with increasing d. The rate for Lipschitz functions is, however, the best possible at that level of smoothness and is better than plain independent and identically distributed sampling. Unlike grids, space filling curve sampling provides points at any desired sample size, and the van der Corput version is extensible in n. We also introduce a class of piecewise Lipschitz functions whose discontinuities are in rectifiable sets described via Minkowski content. Although these functions may have infinite variation in the sense of Hardy and Krause, they can be integrated with a mean-squared error of O(n−1−1/d). It was previously known only that the rate was o(n−1). Other space filling curves, such as those due to Sierpinski and Peano, also attain these rates, whereas upper bounds for the Lebesgue curve are somewhat worse, as if the dimension were log2(3) times as high.

    more » « less