
Title: Quantitative stability and error estimates for optimal transport plans
Abstract

Optimal transport maps and plans between two absolutely continuous measures $\mu$ and $\nu$ can be approximated by solving semidiscrete or fully discrete optimal transport problems. These two problems ensue from approximating $\mu$ or both $\mu$ and $\nu$ by Dirac measures. Extending an idea from Gigli (2011, On Hölder continuity-in-time of the optimal transport map towards measures along a curve. Proc. Edinb. Math. Soc. (2), 54, 401–409), we characterize how transport plans change under the perturbation of both $\mu$ and $\nu$. We apply this insight to prove error estimates for semidiscrete and fully discrete algorithms in terms of errors solely arising from approximating measures. We obtain weighted $L^2$ error estimates for both types of algorithms with a convergence rate $O(h^{1/2})$. This coincides with the rate in Theorem 5.4 in Berman (2018, Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport. Preprint available at arXiv:1803.00785) for semidiscrete methods, but the error notion is different.
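As a concrete illustration of the fully discrete problem, the sketch below approximates both $\mu$ and $\nu$ by uniform Dirac measures on random point clouds and solves the resulting transport linear program with SciPy. The point sets, weights, and quadratic cost are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 8, 8
x = rng.random((n, 2))   # Dirac locations approximating mu
y = rng.random((m, 2))   # Dirac locations approximating nu
a = np.full(n, 1.0 / n)  # uniform weights for mu
b = np.full(m, 1.0 / m)  # uniform weights for nu

# Quadratic cost c_ij = |x_i - y_j|^2
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

# Linear program: minimize <C, P> over couplings P >= 0 with marginals a, b.
A_eq = []
for i in range(n):                        # row sums of P equal a
    r = np.zeros((n, m)); r[i, :] = 1
    A_eq.append(r.ravel())
for j in range(m):                        # column sums of P equal b
    col = np.zeros((n, m)); col[:, j] = 1
    A_eq.append(col.ravel())
res = linprog(C.ravel(), A_eq=np.array(A_eq),
              b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
P = res.x.reshape(n, m)                   # discrete optimal transport plan
print(res.fun)                            # optimal transport cost
```

The semidiscrete problem corresponds to discretizing only one of the two marginals; the plan $P$ above plays the role of the fully discrete approximation whose error the paper quantifies.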
Journal Name:
IMA Journal of Numerical Analysis
Page Range or eLocation-ID:
1941 to 1965
Sponsoring Org:
National Science Foundation
More Like this
  1. Braverman, Mark (Ed.)
    We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of marginals $\mathbb{P}_{S\sim \mu}[i\in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size $k$ subsets of a ground set of only $n^{1-\alpha}\cdot \operatorname{poly}(k)$ elements. Here, $1/\alpha\in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly $0$) external field to $\mu$, an operation that for many distributions $\mu$ of interest, retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $\alpha=\Omega(1)$, our result reduces the domain size, and as a corollary, the cost-per-sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha=1$).
    As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that a constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
  2. Abstract

    An adaptive, adversarial methodology is developed for the optimal transport problem between two distributions $\mu $ and $\nu $, known only through a finite set of independent samples $(x_i)_{i=1..n}$ and $(y_j)_{j=1..m}$. The methodology automatically creates features that adapt to the data, thus avoiding reliance on a priori knowledge of the distributions underlying the data. Specifically, instead of a discrete point-by-point assignment, the new procedure seeks an optimal map $T(x)$ defined for all $x$, minimizing the Kullback–Leibler divergence between $(T(x_i))$ and the target $(y_j)$. The relative entropy is given a sample-based, variational characterization, thereby creating an adversarial setting: as one player seeks to push forward one distribution to the other, the second player develops features that focus on those areas where the two distributions fail to match. The procedure solves local problems that seek the optimal transfer between consecutive, intermediate distributions between $\mu $ and $\nu $. As a result, maps of arbitrary complexity can be built by composing the simple maps used for each local problem. Displaced interpolation is used to guarantee global from local optimality. The procedure is illustrated through synthetic examples in one and two dimensions.
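For intuition about what an optimal map $T(x)$ between sampled distributions looks like, the one-dimensional case admits a simple baseline: with quadratic cost, the optimal map is the monotone rearrangement $F_\nu^{-1}\circ F_\mu$, which with equal sample sizes just pairs sorted samples. The toy sketch below (with made-up Gaussian samples; this is not the paper's adversarial procedure) computes this map and a displaced interpolation between the two sample sets:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.normal(0.0, 1.0, 500)    # samples from mu
ys = rng.normal(3.0, 2.0, 500)    # samples from nu

# In 1D the optimal quadratic-cost map is the monotone rearrangement
# F_nu^{-1} o F_mu; with equal sample sizes it pairs sorted samples.
order = np.argsort(xs)
T = np.empty_like(xs)
T[order] = np.sort(ys)            # T[i] approximates the optimal map at xs[i]

# Displacement interpolation at time t: x -> (1 - t) x + t T(x)
t = 0.5
mid = (1 - t) * xs + t * T
print(mid.mean())                 # roughly halfway between the two means
```

In higher dimensions no such closed form exists, which is what motivates adaptive, adversarial constructions like the one described above.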

  3. An adaptive mesh refinement method for solving optimal control problems is developed. The method employs orthogonal collocation at Legendre–Gauss–Radau points, and adjusts both the mesh size and the degree of the approximating polynomials in the refinement process. A previously derived convergence rate is used to guide the refinement process. The method brackets discontinuities and improves solution accuracy by checking for large increases in higher-order derivatives of the state. In regions between discontinuities, where the solution is smooth, the error in the approximation is reduced by increasing the degree of the approximating polynomial. On mesh intervals where the error tolerance has been met, mesh density may be reduced either by merging adjacent mesh intervals or lowering the degree of the approximating polynomial. Finally, the method is demonstrated on two examples from the open literature and its performance is compared against a previously developed adaptive method.
  4. In this paper, we apply two fully discrete local discontinuous Galerkin (LDG) methods to compressible wormhole propagation, and we prove the stability and error estimates of the schemes. Traditional LDG methods use the diffusion term to control the convection term and thereby obtain stability for some linear equations. However, the variables in wormhole propagation are coupled together and the whole system is highly nonlinear, so it is extremely difficult to obtain stability for fully discrete LDG methods. To address this difficulty, we introduce a new auxiliary variable that includes both the convection and diffusion terms. Moreover, we construct a special time integration for the porosity, leading to physically relevant numerical approximations and a controllable growth rate of the porosity. With a reasonable growth rate, it is possible to handle the time-level mismatch in the first-order fully discrete scheme and obtain its stability. For the whole system, we prove that, under weak temporal-spatial conditions, optimal error estimates for the pressure, velocity, porosity and concentration can be obtained under different norms. Numerical experiments are also given to verify the theoretical results.
  5. Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak's momentum technique with a specially designed matrix momentum, and 2) NeSA, which can either be regarded as a variant of Nesterov's acceleration method, or a simplification of PolSA. The rates of convergence of SA algorithms are well understood. Under special conditions, the mean square error of the parameter estimates is bounded by $\sigma^2/n + o(1/n)$, where $\sigma^2 \ge 0$ is an identifiable constant. If these conditions fail, the rate is typically sub-linear. There are two well-known SA algorithms that ensure a linear rate with the minimal value of the variance $\sigma^2$: the Ruppert-Polyak averaging technique and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that, under mild technical assumptions, the PolSA algorithm also achieves this optimality criterion. This result is established via novel coupling arguments: it is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal-variance (but computationally more expensive) SNR algorithm at a rate $O(1/n^2)$. The newly proposed algorithms are extended to a reinforcement learning setting to obtain new Q-learning algorithms, and numerical results confirm the coupling of PolSA and SNR.
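For reference, the classical Ruppert-Polyak averaging baseline mentioned above can be sketched in a few lines on the toy problem of estimating a mean from noisy observations (the step-size exponent and test problem are illustrative assumptions; this is not PolSA or NeSA):

```python
import numpy as np

# Ruppert-Polyak averaging for stochastic approximation: find the root of
# f(theta) = E[theta - X], i.e. the mean of X, from noisy samples of X.
rng = np.random.default_rng(2)
target = 4.0                               # true mean to be estimated
theta = 0.0                                # raw SA iterate
avg = 0.0                                  # Polyak-Ruppert average
n_iter = 20000
for n in range(1, n_iter + 1):
    x = target + rng.normal()              # noisy observation of the mean
    theta -= (theta - x) / n ** 0.7        # SA step with slow gain n^{-0.7}
    avg += (theta - avg) / n               # running average of the iterates
print(avg)                                 # close to the true mean 4.0
```

Averaging the slowly stepped iterates is what recovers the optimal variance $\sigma^2/n$ without computing the matrix gain that SNR requires; PolSA achieves the same optimality via matrix momentum instead.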