Title: Large-Scale Wasserstein Gradient Flows
Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. In contrast to previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate the performance of our algorithm by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.
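As a rough illustration of what one JKO step looks like, the sketch below works through a toy case that is not the ICNN method described in the abstract. It assumes a one-dimensional quadratic potential V(x) = x^2/2, an inverse temperature `beta`, and a search restricted to the Gaussian family, where the squared 2-Wasserstein distance between N(m, s^2) and N(m_k, s_k^2) is (m - m_k)^2 + (s - s_k)^2. Under those assumptions each step has a closed form. The function names `jko_step_gaussian` and `run_jko` are hypothetical and chosen only for this example.

```python
import math


def jko_step_gaussian(mean, std, step, beta=1.0):
    """One JKO step restricted to 1-D Gaussians (toy setting, an assumption).

    Minimizes over (m, s) the objective
        (m^2 + s^2) / 2 - log(s) / beta
        + ((m - mean)^2 + (s - std)^2) / (2 * step),
    i.e. potential energy plus entropy term plus the Wasserstein penalty.
    """
    # Stationarity in m: m + (m - mean) / step = 0.
    new_mean = mean / (1.0 + step)
    # Stationarity in s: (1 + step) s^2 - std * s - step / beta = 0;
    # keep the positive root of this quadratic.
    disc = std * std + 4.0 * step * (1.0 + step) / beta
    new_std = (std + math.sqrt(disc)) / (2.0 * (1.0 + step))
    return new_mean, new_std


def run_jko(mean, std, step, n_steps, beta=1.0):
    """Iterate the closed-form step n_steps times."""
    for _ in range(n_steps):
        mean, std = jko_step_gaussian(mean, std, step, beta)
    return mean, std


if __name__ == "__main__":
    m, s = run_jko(mean=3.0, std=0.2, step=0.1, n_steps=500, beta=2.0)
    print(m, s, 1.0 / math.sqrt(2.0))
```

In this toy setting the iterates approach the stationary density proportional to exp(-beta * V), a Gaussian with mean 0 and standard deviation 1/sqrt(beta). In higher dimensions and for general potentials no such closed form is available, which is the computational difficulty the abstract refers to.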
Award ID(s):
1838071
PAR ID:
10310374
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
NeurIPS
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a discretization-free scalable framework for solving a large class of mass-conserving partial differential equations (PDEs), including the time-dependent Fokker-Planck equation and the Wasserstein gradient flow. The main observation is that the time-varying velocity field of the PDE solution needs to be self-consistent: it must satisfy a fixed-point equation involving the probability flow characterized by the same velocity field. Instead of directly minimizing the residual of the fixed-point equation with neural parameterization, we use an iterative formulation with a biased gradient estimator that bypasses significant computational obstacles with strong empirical performance. Compared to existing approaches, our method does not suffer from temporal or spatial discretization, covers a wider range of PDEs, and scales to high dimensions. Experimentally, our method recovers analytical solutions accurately when they are available and achieves superior performance in high dimensions with less training time compared to alternatives. 
  2. This paper studies computational methods for quasi-stationary distributions (QSDs). We first propose a data-driven solver for the Fokker–Planck equations of QSDs. As in the case of Fokker–Planck equations for invariant probability measures, we set up an optimization problem that minimizes the distance from a low-accuracy reference solution, under the constraint of satisfying the linear relation given by the discretized Fokker–Planck operator. We then use a coupling method to study the sensitivity of a QSD to changes in either the boundary condition or the diffusion coefficient. The 1-Wasserstein distance between a QSD and the corresponding invariant probability measure can be quantitatively estimated. Numerical results on both the computation of QSDs and their sensitivity analysis are provided.
  3. Particle-based Bayesian inference methods that sample from a partition-free target (posterior) distribution, e.g., Stein variational gradient descent (SVGD), have attracted significant attention. We propose a path-guided particle-based sampling (PGPS) method based on a novel Logweighted Shrinkage (LwS) density path linking an initial distribution to the target distribution. We propose to use a neural network to learn a vector field motivated by the Fokker-Planck equation of the designed density path. Particles, initiated from the initial distribution, evolve according to the ordinary differential equation defined by the vector field. The distribution of these particles is guided along a density path from the initial distribution to the target distribution. The proposed LwS density path allows for an efficient search of modes of the target distribution where canonical methods fail. We theoretically analyze the Wasserstein distance between the distribution of the PGPS-generated samples and the target distribution due to approximation and discretization errors. In practice, the proposed PGPS-LwS method demonstrates higher Bayesian inference accuracy and better calibration ability than baselines such as SVGD and Langevin dynamics in experiments conducted on both synthetic and real-world Bayesian learning tasks.
  4. We revisit the variational characterization of conservative diffusion as entropic gradient flow and provide for it a probabilistic interpretation based on stochastic calculus. It was shown by Jordan, Kinderlehrer, and Otto that, for diffusions of Langevin–Smoluchowski type, the Fokker–Planck probability density flow maximizes the rate of relative entropy dissipation, as measured by the distance traveled in the ambient space of probability measures with finite second moments, in terms of the quadratic Wasserstein metric. We obtain novel, stochastic-process versions of these features, valid along almost every trajectory of the diffusive motion in the backwards direction of time, using a very direct perturbation analysis. By averaging our trajectorial results with respect to the underlying measure on path space, we establish the maximal rate of entropy dissipation along the Fokker–Planck flow and measure exactly the deviation from this maximum that corresponds to any given perturbation. A bonus of our trajectorial approach is that it derives the HWI inequality relating relative entropy (H), Wasserstein distance (W), and relative Fisher information (I).
  5. An approximate analytical solution is derived for a certain class of stochastic differential equations with constant diffusion but nonlinear drift coefficients. Specifically, a closed-form expression is derived for the response process transition probability density function (PDF) based on the concept of the Wiener path integral and on a Cauchy–Schwarz inequality treatment. This is done in conjunction with formulating and solving an error minimisation problem by relying on the associated Fokker–Planck equation operator. The developed technique, which requires minimal computational cost for the determination of the response process PDF, exhibits satisfactory accuracy and is capable of capturing the salient features of the PDF, as demonstrated by comparisons with pertinent Monte Carlo simulation data. In addition to the mathematical merit of the approximate analytical solution, the derived PDF can also be used as a benchmark for assessing the accuracy of alternative, more computationally demanding numerical solution techniques. Several examples are provided for assessing the reliability of the proposed approximation.