skip to main content

Title: Linear optimal transport embedding: provable Wasserstein classification for certain rigid transformations and perturbations
Abstract Discriminating between distributions is an important problem in a number of scientific fields. This motivated the introduction of Linear Optimal Transportation (LOT), which embeds the space of distributions into an $L^2$-space. The transform is defined by computing the optimal transport of each distribution to a fixed reference distribution and has a number of benefits when it comes to speed of computation and to determining classification boundaries. In this paper, we characterize a number of settings in which LOT embeds families of distributions into a space in which they are linearly separable. This is true in arbitrary dimension, and for families of distributions generated through perturbations of shifts and scalings of a fixed distribution. We also prove conditions under which the $L^2$ distance of the LOT embedding between two distributions in arbitrary dimension is nearly isometric to Wasserstein-2 distance between those distributions. This is of significant computational benefit, as one must only compute $N$ optimal transport maps to define the $N^2$ pairwise distances between $N$ distributions. We demonstrate the benefits of LOT on a number of distribution classification problems.  more » « less
Award ID(s):
2111322 1819222 2012266
Author(s) / Creator(s):
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In this paper we study supervised learning tasks on the space of probability measures. We approach this problem by embedding the space of probability measures into$$L^2$$L2spaces using the optimal transport framework. In the embedding spaces, regular machine learning techniques are used to achieve linear separability. This idea has proved successful in applications and when the classes to be separated are generated by shifts and scalings of a fixed measure. This paper extends the class of elementary transformations suitable for the framework to families of shearings, describing conditions under which two classes of sheared distributions can be linearly separated. We furthermore give necessary bounds on the transformations to achieve a pre-specified separation level, and show how multiple embeddings can be used to allow for larger families of transformations. We demonstrate our results on image classification tasks.

    more » « less
  2. Summary Sequential Monte Carlo algorithms are widely accepted as powerful computational tools for making inference with dynamical systems. A key step in sequential Monte Carlo is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling and optimal transport resampling. In one-dimensional cases, we show that optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both strategies minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. For general $d$-dimensional cases, we show that if the particles are first sorted using the Hilbert curve, the variance of stratified resampling is $O(m^{-(1+2/d)})$, an improvement over the best previously known rate of $O(m^{-(1+1/d)})$, where $m$ is the number of resampled particles. We show that this improved rate is optimal for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost-sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that for dimension $d>1$ the mean square error of sequential quasi-Monte Carlo with $n$ particles can be $O(n^{-1-4/\{d(d+4)\}})$ if Hilbert curve resampling is used and a specific low-discrepancy set is chosen. To our knowledge, this is the first known convergence rate lower than $o(n^{-1})$. 
    more » « less
  3. A great number of robotics applications demand the rearrangement of many mobile objects, for example, organizing products on store shelves, shuffling containers at shipping ports, reconfiguring fleets of mobile robots, and so on. To boost the efficiency/throughput in systems designed for solving these rearrangement problems, it is essential to minimize the number of atomic operations that are involved, for example, the pick-n-places of individual objects. However, this optimization task poses a rather difficult challenge due to the complex inter-dependency between the objects, especially when they are tightly packed together. In this work, in tackling the aforementioned challenges, we have developed a novel algorithmic tool, called Rubik Tables, that provides a clean abstraction of object rearrangement problems as the proxy problem of shuffling items stored in a table or lattice. In its basic form, a Rubik Table is an n × n table containing n2items. We show that the reconfiguration of items in such a Rubik Table can be achieved using at most n column and n row shuffles in the partially labeled setting, where each column (resp., row) shuffle may arbitrarily permute the items stored in a column (resp., row) of the table. When items are fully distinguishable, additional n shuffles are needed. Rubik Tables allow many generalizations, for example, adding an additional depth dimension or extending to higher dimensions. Using Rubik Table results, we have designed a first constant-factor optimal algorithm for stack rearrangement problems where items are stored in stacks, accessible only from the top. We show that, for nd items stored in n stacks of depth d each, using one empty stack as the swap space, O( nd) stack pop-push operations are sufficient for an arbitrary reconfiguration of the stacks where [Formula: see text] for arbitrary fixed m > 0. Rubik Table results also allow the development of constant-factor optimal solutions for solving multi-robot motion planning problems under extreme robot density. These algorithms based on Rubik Table results run in low-polynomial time.

    more » « less
  4. We consider the problem of estimating the optimal transport map between two probability distributions, P and Q in Rd, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution Q is a discrete measure supported on a finite number of points in Rd. We study a computationally efficient estimator initially proposed by Pooladian and Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate n−1/2, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments for other settings, not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems. 
    more » « less
  5. This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcriptionbetween a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as a two-player minimax game between the encoder and decoderfor the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and arguably better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so-learned features of the multiple classes are structured instead of hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace. 
    more » « less