Title: On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method
The randomized midpoint method, proposed by Shen and Lee (2019), has emerged as an optimal discretization procedure for simulating the continuous-time underdamped Langevin diffusion. In this paper, we analyze several probabilistic properties of the randomized midpoint discretization method, considering both overdamped and underdamped Langevin dynamics. We first characterize the stationary distribution of the discrete chain obtained with constant step-size discretization and show that it is biased away from the target distribution. Notably, the step size needs to go to zero to obtain asymptotic unbiasedness. Next, we establish the asymptotic normality of numerical integration using the randomized midpoint method and highlight the relative advantages and disadvantages over other discretizations. Our results collectively provide several insights into the behavior of the randomized midpoint discretization method, including obtaining confidence intervals for numerical integrations.
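As a concrete illustration of the scheme analyzed above, here is a minimal sketch of one randomized midpoint step for overdamped Langevin dynamics with a standard Gaussian target. The potential f(x) = x^2/2, the step size, and all variable names are choices for this example, not the paper's notation. The key detail is that the Brownian increment at the random midpoint and the increment over the full step must be sampled jointly:

```python
import math
import random

def grad_f(x):
    # Gradient of the toy potential f(x) = x^2 / 2, so the target
    # density is the standard Gaussian, proportional to exp(-f(x)).
    return x

def randomized_midpoint_step(x, h, rng):
    """One step of the randomized midpoint discretization of the
    overdamped Langevin diffusion dX_t = -grad f(X_t) dt + sqrt(2) dW_t."""
    a = rng.random()  # random midpoint time a*h, with a ~ Uniform(0, 1)
    # Correlated Brownian increments: W(a*h), then W(h) = W(a*h) + independent part
    w_mid = rng.gauss(0.0, math.sqrt(a * h))
    w_end = w_mid + rng.gauss(0.0, math.sqrt((1.0 - a) * h))
    # Euler predictor to the random midpoint
    y = x - a * h * grad_f(x) + math.sqrt(2.0) * w_mid
    # Full step driven by the gradient evaluated at the random midpoint
    return x - h * grad_f(y) + math.sqrt(2.0) * w_end

rng = random.Random(0)
x, samples = 0.0, []
for k in range(50_000):
    x = randomized_midpoint_step(x, 0.1, rng)
    if k >= 1_000:  # discard burn-in
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With a constant step size the chain equilibrates to a distribution close to, but (per the result above) not exactly equal to, the target; for this quadratic example the empirical mean and variance land near 0 and 1.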
Award ID(s):
1934568
NSF-PAR ID:
10281912
Journal Name:
Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Sponsoring Org:
National Science Foundation
More Like this
  1. Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning, where the problem is non-convex and the gradient noise might exhibit heavy-tailed behavior, as empirically observed in recent studies. In this study, we consider a continuous-time variant of SGDm, known as the underdamped Langevin dynamics (ULD), and investigate its asymptotic properties under heavy-tailed perturbations. Supported by recent studies from statistical physics, we argue both theoretically and empirically that the heavy tails of such perturbations can result in a bias even when the step size is small, in the sense that the optima of the stationary distribution of the dynamics might not match the optima of the cost function to be optimized. As a remedy, we develop a novel framework, which we call fractional ULD (FULD), and prove that FULD targets the so-called Gibbs distribution, whose optima exactly match the optima of the original cost. We observe that the Euler discretization of FULD has noteworthy algorithmic similarities with natural gradient methods and gradient clipping, bringing a new perspective on understanding their role in deep learning. We support our theory with experiments conducted on a synthetic model and neural networks.
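The heavy-tailed perturbations discussed in this abstract can be simulated directly. The sketch below draws symmetric alpha-stable noise via the standard Chambers-Mallows-Stuck transform and injects it into a plain Euler discretization of ULD on a toy quadratic cost. It illustrates the heavy-tailed driving noise only, not the fractional FULD scheme itself; the parameter values are arbitrary choices for this example:

```python
import math
import random

def sym_alpha_stable(alpha, rng):
    # Chambers-Mallows-Stuck sampler for a standard symmetric
    # alpha-stable random variable (valid for alpha != 1).
    u = (rng.random() - 0.5) * math.pi           # Uniform(-pi/2, pi/2)
    e = -math.log(1.0 - rng.random())            # Exponential(1)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha))

def uld_heavy_tail_step(x, v, h, gamma, alpha, rng):
    # Euler step of underdamped Langevin dynamics on the toy cost
    # U(x) = x^2 / 2, driven by alpha-stable momentum noise instead of
    # Gaussian noise. NOTE: this is an illustration of heavy-tailed
    # perturbations, not the FULD scheme from the abstract.
    noise = h ** (1.0 / alpha) * sym_alpha_stable(alpha, rng)
    v = v - h * gamma * v - h * x + noise
    return x + h * v, v

rng = random.Random(1)
draws = sorted(sym_alpha_stable(1.7, rng) for _ in range(20001))
median = draws[10000]                            # symmetric law => median ~ 0

x, v = 0.0, 0.0
for _ in range(2000):
    x, v = uld_heavy_tail_step(x, v, 0.05, 1.0, 1.7, rng)
```

For alpha < 2 the increments have infinite variance, which is why the sample median (rather than the mean) is the robust location summary here.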
  2. The thermal radiative transfer (TRT) equations form an integro-differential system that describes the propagation and collisional interactions of photons. Computing accurate and efficient numerical solutions to the TRT equations is challenging for several reasons, the first of which is that TRT is defined on a high-dimensional phase space that includes the independent variables of time, space, and velocity. In order to reduce the dimensionality of the phase space, classical approaches such as the P$_N$ (spherical harmonics) or the S$_N$ (discrete ordinates) ansatz are often used in the literature. In this work, we introduce a novel approach: the hybrid discrete (H$^T_N$) approximation to the radiative thermal transfer equations. This approach acquires desirable properties of both P$_N$ and S$_N$, and indeed reduces to each of these approximations in various limits: H$^1_N$ $\equiv$ P$_N$ and H$^T_0$ $\equiv$ S$_T$. We prove that H$^T_N$ results in a system of hyperbolic partial differential equations for all $T\ge 1$ and $N\ge 0$. Another challenge in solving the TRT system is the inherent stiffness due to the large timescale separation between propagation and collisions, especially in the diffusive (i.e., highly collisional) regime. This stiffness can be partially overcome via implicit time integration, although fully implicit methods may become computationally expensive due to the strong nonlinearity and system size. On the other hand, explicit time-stepping schemes that are not also asymptotic-preserving in the highly collisional limit require resolving the mean free path between collisions, making such schemes prohibitively expensive. In this work we develop a numerical method that is based on a nodal discontinuous Galerkin discretization in space, coupled with a semi-implicit discretization in time. In particular, we make use of a second-order explicit Runge-Kutta scheme for the streaming term and an implicit Euler scheme for the material coupling term.
Furthermore, in order to solve the material energy equation implicitly after each predictor and corrector step, we linearize the temperature term using a Taylor expansion; this avoids the need for an iterative procedure, and therefore improves efficiency. In order to reduce unphysical oscillation, we apply a slope limiter after each time step. Finally, we conduct several numerical experiments to verify the accuracy, efficiency, and robustness of the H$^T_N$ ansatz and the numerical discretizations. 
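The semi-implicit treatment described in this abstract can be illustrated on a scalar toy relaxation equation. The model, variable names, and parameter values below are invented for this sketch, and it uses first-order IMEX Euler rather than the paper's RK2 predictor-corrector; the point is that treating the stiff term implicitly lets the step size far exceed the relaxation scale eps, while a linear (Taylor-linearized) stiff term makes the implicit solve a single division rather than an iteration:

```python
# Toy relaxation model with a stiff collision-like term:
#   y'(t) = -a*y + (y_eq - y) / eps,  with eps << 1,
# mimicking the split into a non-stiff streaming part (explicit)
# and a stiff material-coupling part (implicit).

def imex_euler(y, h, a, y_eq, eps):
    # Explicit Euler on the non-stiff term, implicit Euler on the stiff term:
    #   y_new = y + h*(-a*y) + h*(y_eq - y_new)/eps
    # The stiff term is linear in y_new, so the "solve" is one division.
    return (y + h * (-a * y) + h * y_eq / eps) / (1.0 + h / eps)

def explicit_euler(y, h, a, y_eq, eps):
    # Fully explicit step for comparison; unstable unless h = O(eps).
    return y + h * (-a * y + (y_eq - y) / eps)

a, y_eq, eps, h = 1.0, 1.0, 1e-6, 0.1   # note h is 10^5 times eps

y_imex = 0.0
for _ in range(100):
    y_imex = imex_euler(y_imex, h, a, y_eq, eps)

y_exp = 0.0
for _ in range(5):                       # blows up within a few steps
    y_exp = explicit_euler(y_exp, h, a, y_eq, eps)
```

The IMEX iterate converges to the stiff equilibrium y_eq/(1 + a*eps) with h five orders of magnitude larger than eps, while the explicit iterate diverges immediately.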
  3. Semiflexible slender filaments are ubiquitous in nature and cell biology, including in the cytoskeleton, where reorganization of actin filaments allows the cell to move and divide. Most methods for simulating semiflexible inextensible fibers/polymers are based on discrete (bead-link or blob-link) models, which become prohibitively expensive in the slender limit when hydrodynamics is accounted for. In this paper, we develop a novel coarse-grained approach for simulating fluctuating slender filaments with hydrodynamic interactions. Our approach is tailored to relatively stiff fibers whose persistence length is comparable to or larger than their length and is based on three major contributions. First, we discretize the filament centerline using a coarse non-uniform Chebyshev grid, on which we formulate a discrete constrained Gibbs–Boltzmann (GB) equilibrium distribution and overdamped Langevin equation for the evolution of unit-length tangent vectors. Second, we define the hydrodynamic mobility at each point on the filament as an integral of the Rotne–Prager–Yamakawa kernel along the centerline and apply a spectrally accurate “slender-body” quadrature to accurately resolve the hydrodynamics. Third, we propose a novel midpoint temporal integrator, which can correctly capture the Ito drift terms that arise in the overdamped Langevin equation. For two separate examples, we verify that the equilibrium distribution for the Chebyshev grid is a good approximation of the blob-link one and that our temporal integrator for overdamped Langevin dynamics samples the equilibrium GB distribution for sufficiently small time step sizes. We also study the dynamics of relaxation of an initially straight filament and find that as few as 12 Chebyshev nodes provide a good approximation to the dynamics while allowing a time step size two orders of magnitude larger than a resolved blob-link simulation. 
We conclude by applying our approach to a suspension of cross-linked semiflexible fibers (neglecting hydrodynamic interactions between fibers), where we study how semiflexible fluctuations affect bundling dynamics. We find that semiflexible filaments bundle faster than rigid filaments even when the persistence length is large, but show that semiflexible bending fluctuations only further accelerate agglomeration when the persistence length and fiber length are of the same order. 
  4. The technique of modifying the geometry of a problem from Euclidean to Hessian metric has proved to be quite effective in optimization, and has been the subject of study for sampling. The Mirror Langevin Diffusion (MLD) is a sampling analogue of mirror flow in continuous time, and it has nice convergence properties under log-Sobolev or Poincaré inequalities relative to the Hessian metric. In discrete time, a simple discretization of MLD is the Mirror Langevin Algorithm (MLA), which was previously shown to have a biased convergence guarantee with a non-vanishing bias term (one that does not go to zero as the step size goes to zero). This raised the question of whether we need a better analysis or a better discretization to achieve a vanishing bias. Here we study the Mirror Langevin Algorithm and show that it indeed has a vanishing bias. We apply mean-square analysis to establish a mixing-time bound for MLA under the modified self-concordance condition.
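A minimal one-dimensional sketch of the Mirror Langevin Algorithm described in this abstract. The mirror map phi(x) = cosh(x), the standard Gaussian target, and the step size are choices made for this example, not taken from the paper; each step moves in the dual variable y = phi'(x) and maps back through the inverse mirror map:

```python
import math
import random

def mla_step(x, h, rng):
    # One MLA step in 1D with mirror map phi(x) = cosh(x):
    #   phi' = sinh, phi'' = cosh > 0, and (phi')^{-1} = asinh.
    # Target is exp(-V) with V(x) = x^2 / 2 (standard Gaussian), V' = x.
    y = math.sinh(x)                        # dual (mirror) variable
    # Gradient step plus noise scaled by the Hessian of the mirror map
    y = y - h * x + math.sqrt(2.0 * h * math.cosh(x)) * rng.gauss(0.0, 1.0)
    return math.asinh(y)                    # map back to primal space

rng = random.Random(0)
x, samples = 0.0, []
for k in range(60_000):
    x = mla_step(x, 0.02, rng)
    if k >= 2_000:                          # discard burn-in
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

At this small fixed step size the empirical moments land close to those of the target, in line with the vanishing-bias result stated above.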
  5. Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradients with momentum where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. Many works report its empirical success in practice for solving stochastic nonconvex optimization problems; in particular, it has been observed to outperform overdamped Langevin Monte Carlo–based methods, such as stochastic gradient Langevin dynamics (SGLD), in many applications. Although the asymptotic global convergence properties of SGHMC are well known, its finite-time performance is not well understood. In this work, we study two variants of SGHMC based on two alternative discretizations of the underdamped Langevin diffusion. We provide finite-time performance bounds for the global convergence of both SGHMC variants for solving stochastic nonconvex optimization problems with explicit constants. Our results lead to nonasymptotic guarantees for both population and empirical risk minimization problems. For a fixed target accuracy level on a class of nonconvex problems, we obtain complexity bounds for SGHMC that can be tighter than those available for SGLD. 
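A minimal sketch of one SGHMC iteration as an Euler discretization of the underdamped Langevin diffusion with a noisy gradient. The toy quadratic objective, the particular discretization, the Gaussian model for minibatch noise, and all parameter values are assumptions for this example (the abstract studies two alternative discretizations, neither of which is specified here):

```python
import math
import random

def sghmc_step(x, v, h, gamma, grad_noise, rng):
    # One SGHMC step targeting exp(-U) with U(x) = x^2 / 2, so the exact
    # gradient is x; grad_noise models additive minibatch gradient noise.
    g = x + rng.gauss(0.0, grad_noise)               # stochastic gradient
    # Momentum update: friction, noisy gradient, and injected Gaussian noise
    v = v - h * gamma * v - h * g + math.sqrt(2.0 * gamma * h) * rng.gauss(0.0, 1.0)
    return x + h * v, v

rng = random.Random(0)
x, v, samples = 0.0, 0.0, []
for k in range(100_000):
    x, v = sghmc_step(x, v, 0.05, 1.0, 0.5, rng)
    if k >= 2_000:                                   # discard burn-in
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For this quadratic toy problem the gradient-noise contribution to the stationary variance is O(h) and small, so the chain's empirical moments stay close to the target's.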