

This content will become publicly available on January 22, 2026

Title: Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications
We propose the linear barycentric coding model (LBCM), which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of Wasserstein-2 barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite-sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the interval [0,1]. We show that a natural analogous construction of an LBCM in ℝ² fails, and we leave it as an open problem to identify the proper extension in more than one dimension. We conclude by demonstrating the utility of the LBCM for covariance estimation and data imputation.
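The closed-form structure on the interval can be illustrated with a classical fact (not specific to this paper): on the real line, the Wasserstein-2 barycenter of measures is the measure whose quantile function is the weighted average of the input quantile functions. A minimal NumPy sketch, where the function name and toy data are illustrative choices:

```python
import numpy as np

def quantile_barycenter(samples_list, weights, grid_size=100):
    """Wasserstein-2 barycenter of 1-D empirical measures via quantile averaging.

    On the real line, the barycenter's quantile function is the weighted
    average of the input quantile functions (a classical 1-D OT fact)."""
    qs = np.linspace(0.0, 1.0, grid_size)
    # Quantile function of each empirical measure, evaluated on a common grid.
    quantiles = np.stack([np.quantile(np.asarray(s), qs) for s in samples_list])
    # Barycenter quantile function = convex combination of quantile functions.
    return qs, np.average(quantiles, axis=0, weights=weights)

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 0.4, size=500)   # mass near the left of [0, 1]
b = rng.uniform(0.6, 1.0, size=500)   # mass near the right
qs, bary_q = quantile_barycenter([a, b], weights=[0.5, 0.5])
```

Because the quantile map embeds 1-D measures isometrically into L², barycentric coding is genuinely linear in this coordinate system, which is the intuition behind linearized transport methods.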
Award ID(s):
2019786
PAR ID:
10620811
Author(s) / Creator(s):
Publisher / Repository:
Open Review
Date Published:
Format(s):
Medium: X
Location:
https://openreview.net/forum?id=MWCHedOm2L
Sponsoring Org:
National Science Foundation
More Like this
  1. The classical static Schrödinger Bridge (SSB) problem, which seeks the most likely stochastic evolution between two marginal probability measures, has been studied extensively in the optimal transport and statistical physics communities, and more recently in the machine learning community amid the surge of generative models. The standard approach to solving SSB is to first identify its Kantorovich dual and use Sinkhorn's algorithm to find the optimal potential functions. While the original SSB is only a strictly convex minimization problem, this approach is known to guarantee linear convergence under mild assumptions. In this work, we consider a generalized SSB allowing any strictly increasing divergence functional, far generalizing the entropy functional x log(x) in the standard SSB. This problem naturally arises in a wide range of seemingly unrelated problems in entropic optimal transport, random graphs/matrices, and combinatorics. We establish Kantorovich duality and linear convergence of Sinkhorn's algorithm for the generalized SSB problem under mild conditions. Our results provide a new rigorous foundation for understanding Sinkhorn-type iterative methods in the context of large-scale generalized Schrödinger bridges.
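Sinkhorn's algorithm for the standard entropic problem referenced above can be sketched in a few lines. This is the textbook iteration for discrete marginals with the x log(x) entropy, not the generalized-divergence version the abstract introduces; the problem sizes and cost here are illustrative:

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iter=500):
    """Sinkhorn iterations for entropic optimal transport between discrete
    marginals mu and nu with cost matrix C (standard entropy functional)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)               # rescale to match the second marginal
        u = mu / (K @ v)                 # rescale to match the first marginal
    return u[:, None] * K * v[None, :]   # transport plan

n = 5
mu = np.full(n, 1.0 / n)
nu = np.full(n, 1.0 / n)
x = np.linspace(0.0, 1.0, n)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
P = sinkhorn(mu, nu, C)
```

The alternating rescalings are exactly the coordinate updates on the dual (Kantorovich) potentials, which is why convergence analyses of the kind described above apply.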
  2.
    Computer-aided methods, based on the entropic linear program framework, have been shown to be effective in assisting the study of information theoretic fundamental limits of information systems. One key element that significantly impacts their computational efficiency and applicability is the reduction of variables, based on problem-specific symmetry and dependence relations. In this work, we propose using the disjoint-set data structure to algorithmically identify the reduction mapping, instead of relying on exhaustive enumeration in the equivalence classification. Based on this reduced linear program, we consider four techniques to investigate the fundamental limits of information systems: (1) computing an outer bound for a given linear combination of information measures and providing the values of information measures at the optimal solution; (2) efficiently computing a polytope tradeoff outer bound between two information quantities; (3) producing a proof (as a weighted sum of known information inequalities) for a computed outer bound; and (4) providing the range for information quantities between which the optimal value does not change, i.e., sensitivity analysis. A toolbox, with an efficient JSON format input frontend, and either Gurobi or CPLEX as the linear program solving engine, is implemented and open-sourced.
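The disjoint-set structure mentioned above is a standard data structure; a minimal sketch of how it merges variables known to be equivalent (e.g., by a symmetry relation) into representative classes, with the variable counts here chosen purely for illustration:

```python
class DisjointSet:
    """Union-find with path compression and union by size, used to merge
    equivalent variables so the linear program keeps one representative each."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra               # attach the smaller tree under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

# Merge variables 0..3 into one class; variable 4 stays alone,
# so 5 original variables reduce to 2 in the reduced linear program.
ds = DisjointSet(5)
for a, b in [(0, 1), (1, 2), (2, 3)]:
    ds.union(a, b)
n_classes = len({ds.find(i) for i in range(5)})
```

Each union/find runs in near-constant amortized time, which is what makes this preferable to exhaustive pairwise equivalence classification.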
  3. Discrete and continuous frames can be considered as positive operator-valued measures (POVMs) that have integral representations using rank-one operators. However, not every POVM has an integral representation. One goal of this paper is to examine the POVMs that have finite-rank integral representations. More precisely, we present a necessary and sufficient condition under which a positive operator-valued measure $$F: \Omega \to B(H)$$ has an integral representation of the form $$F(E) =\sum_{k=1}^{m} \int_{E}\, G_{k}(\omega)\otimes G_{k}(\omega) d\mu(\omega)$$ for some weakly measurable maps $$G_{k} \ (1\leq k\leq m) $$ from a measurable space $$\Omega$$ to a Hilbert space $$\mathcal{H}$$ and some positive measure $$\mu$$ on $$\Omega$$. Similar characterizations are also obtained for projection-valued measures. As special consequences of our characterization we settle negatively a problem of Ehler and Okoudjou about probability frame representations of probability POVMs, and prove that an integral representable probability POVM can be dilated to an integral representable projection-valued measure if and only if the corresponding measure is purely atomic.
  4. Wasserstein distances form a family of metrics on spaces of probability measures that have recently seen many applications. However, statistical analysis in these spaces is complex due to the nonlinearity of Wasserstein spaces. One potential solution to this problem is Linear Optimal Transport (LOT). This method allows one to find a Euclidean embedding, called the LOT embedding, of measures in some Wasserstein spaces, but some information is lost in this embedding. So, to understand whether statistical analysis relying on LOT embeddings can make valid inferences about original data, it is helpful to quantify how well these embeddings describe that data. To answer this question, we present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures. We then extend this decomposition to the Fused Gromov-Wasserstein setting. We also present several experiments that explore the relationship between the dimension of the LOT embedding, the percentage of variance explained by the embedding, and the classification accuracy of machine learning classifiers built on the embedded data. We use the MNIST handwritten digits dataset, IMDB-50000 dataset, and Diffusion Tensor MRI images for these experiments. Our results illustrate the effectiveness of low-dimensional LOT embeddings in terms of the percentage of variance explained and the classification accuracy of models built on the embedded data.
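In one dimension the LOT idea is exact: mapping each measure to its quantile function is an isometric embedding of the 2-Wasserstein space into L², so PCA on the embedded vectors yields a per-component share of the total (Fréchet) variance. A toy sketch of "variance explained by the first embedding dimension"; the Gaussian family, grid size, and function name here are illustrative choices, not from the paper:

```python
import numpy as np

def lot_embed_1d(samples_list, grid_size=64):
    """Embed 1-D empirical measures as quantile-function vectors on a common
    grid; for measures on the line this linearizes the 2-Wasserstein geometry."""
    qs = np.linspace(0.01, 0.99, grid_size)
    return np.stack([np.quantile(np.asarray(s), qs) for s in samples_list])

rng = np.random.default_rng(1)
# Toy family: unit-variance Gaussians whose means trace a one-parameter curve.
measures = [rng.normal(loc=m, scale=1.0, size=400) for m in np.linspace(-2, 2, 20)]
E = lot_embed_1d(measures)
E_centered = E - E.mean(axis=0)
# Eigenvalues of the covariance = variance along each principal direction.
eigvals = np.linalg.eigvalsh(E_centered.T @ E_centered / len(E))[::-1]
explained_1 = eigvals[0] / eigvals.sum()
```

Since translation families vary along a single direction in the embedding, the first principal component should capture nearly all of the variance, mirroring the "percentage of variance explained" diagnostic described above.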
  5. We study probability measures on partitions based on symmetric Grothendieck polynomials. These deformations of Schur polynomials, introduced in the K-theory of Grassmannians, share many of their properties. Our Grothendieck measures are analogs of the Schur measures on partitions introduced by Okounkov (Sel Math 7(1):57–81, 2001). Despite the similarity of determinantal formulas for the probability weights of Schur and Grothendieck measures, we demonstrate that Grothendieck measures are not determinantal point processes. This question is related to the principal minor assignment problem in algebraic geometry, and we employ a determinantal test first obtained by Nanson in 1897 for the 4 × 4 problem. We also propose a procedure for obtaining Nanson-like determinantal tests for matrices of any size n ≥ 4, which appear new for n ≥ 5. By placing the Grothendieck measures into a new framework of tilted biorthogonal ensembles generalizing a rich class of determinantal processes introduced by Borodin (Nucl Phys B 536:704–732, 1998), we identify Grothendieck random partitions as a cross-section of a Schur process, a determinantal process in two dimensions. This identification expresses the correlation functions of Grothendieck measures through sums of Fredholm determinants, which are not immediately suitable for asymptotic analysis. A more direct approach allows us to obtain a limit shape result for the Grothendieck random partitions. The limit shape curve is not particularly explicit as it arises as a cross-section of the limit shape surface for the Schur process. The gradient of this surface is expressed through the argument of a complex root of a cubic equation.