Distributional data have become increasingly prominent in modern signal processing, highlighting the necessity of computing optimal transport (OT) maps across multiple probability distributions. Nevertheless, recent studies on neural OT methods have predominantly focused on the efficient computation of a single map between two distributions. To address this challenge, we introduce a novel approach to learning transport maps for new empirical distributions. Specifically, we employ the transformer architecture to produce embeddings from distributional data of varying length; these embeddings are then fed into a hypernetwork to generate neural OT maps. Various numerical experiments were conducted to validate the embeddings and the generated OT maps.
Optimal Transport using GANs for Lineage Tracing
In this paper, we present Super-OT, a novel approach to computational lineage tracing that combines a supervised learning framework with optimal transport based on Generative Adversarial Networks (GANs). Unlike previous approaches to lineage tracing, Super-OT has the flexibility to integrate paired data. We benchmark Super-OT based on single-cell RNA-seq data against Waddington-OT, a popular approach for lineage tracing that also employs optimal transport. We show that Super-OT achieves gains over Waddington-OT in predicting the class outcome of cells during differentiation, since it allows the integration of additional information during training.
- Award ID(s): 1651995
- PAR ID: 10232194
- Date Published:
- Journal Name: Proceedings of Machine Learning Research
- Volume: 119
- ISSN: 2640-3498
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Optimal transport (OT) methods seek a transformation map (or plan) between two probability measures, such that the transformation has the minimum transportation cost. Such a minimum transport cost, with a certain power transform, is called the Wasserstein distance. Recently, OT methods have drawn great attention in statistics, machine learning, and computer science, especially in deep generative neural networks. Despite their broad applications, the estimation of high‐dimensional Wasserstein distances is a well‐known challenging problem owing to the curse of dimensionality. Several cutting‐edge projection‐based techniques tackle high‐dimensional OT problems. Three major approaches are introduced: the slicing approach, the iterative projection approach, and the projection robust OT approach. Open challenges are discussed at the end of the review. This article is categorized under: Statistical and Graphical Methods of Data Analysis > Dimension Reduction; Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning.
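To make the slicing approach mentioned above concrete, the following is a minimal sketch, not the reviewed article's implementation. It assumes two empirical samples of equal size, projects them onto random unit directions, and averages the one-dimensional Wasserstein costs, which reduce to matching sorted values. The function names (`wasserstein_1d`, `random_direction`, `sliced_wasserstein`) and the defaults for `n_projections` and `seed` are illustrative choices, not taken from the source.

```python
import math
import random


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def random_direction(dim, rng):
    """Draw a unit vector by normalizing a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    X and Y are lists of points (lists of floats) of the same dimension.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project every point onto the random direction theta.
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    Y = [[x + 3.0, y + 4.0] for x, y in X]  # X shifted by (3, 4)
    print(sliced_wasserstein(X, Y))
```

Each projection costs only a sort, which is why slicing is attractive in high dimensions; the estimate is random and depends on the number of projections.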
-
Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning. Despite their prevalence and advantages, OT loss functions can be extremely sensitive to outliers. In fact, a single adversarially-picked outlier can increase the standard W2-distance arbitrarily. To address this issue, we propose an outlier-robust formulation of OT. Our formulation is convex but challenging to scale at first glance. Our main contribution is deriving an equivalent formulation based on cost truncation that is easy to incorporate into modern algorithms for computational OT. We demonstrate the benefits of our formulation in mean estimation problems under the Huber contamination model in simulations and outlier detection tasks on real data.
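The outlier sensitivity described above can be seen in a small one-dimensional experiment. This is only an illustrative sketch of the problem, assuming equal-size samples and the standard sorted-matching formula for the 1D W2-distance; it does not implement the robust formulation or the cost truncation proposed in the abstract. The helper names `w2_1d` and `with_outlier` are hypothetical.

```python
import math


def w2_1d(xs, ys):
    """Standard W2-distance between two equal-size 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def with_outlier(sample, value):
    """Return a copy of `sample` with its last point replaced by `value`."""
    contaminated = list(sample)
    contaminated[-1] = value
    return contaminated


if __name__ == "__main__":
    clean = [i / 100 for i in range(100)]
    # Moving a single point farther away keeps increasing the distance.
    for magnitude in (10.0, 1e3, 1e6):
        d = w2_1d(clean, with_outlier(clean, magnitude))
        print(f"outlier at {magnitude:g}: W2 = {d:.3f}")
```

Only one of the 100 points changes, yet the distance grows without bound as that point moves away, which is the behavior a robust formulation aims to control.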
-
Network alignment is a critical steppingstone behind a variety of multi-network mining tasks. Most of the existing methods essentially optimize a Frobenius-like distance or ranking-based loss, ignoring the underlying geometry of graph data. Optimal transport (OT), together with Wasserstein distance, has emerged as a powerful approach that accounts for the underlying geometry explicitly. Promising as it might be, the state-of-the-art OT-based alignment methods suffer from two fundamental limitations: (1) effectiveness, due to the insufficient use of topology and consistency information, and (2) scalability, due to the non-convex formulation and repeated computationally costly loss calculation. In this paper, we propose a position-aware regularized optimal transport framework for network alignment named PARROT. To tackle the effectiveness issue, the proposed PARROT captures topology information by random walk with restart, with three carefully designed consistency regularization terms. To tackle the scalability issue, the regularized OT problem is decomposed into a series of convex subproblems and can be efficiently solved by the proposed constrained proximal point method with guaranteed convergence. Extensive experiments show that our algorithm achieves significant improvements in both effectiveness and scalability, outperforming the state-of-the-art network alignment methods and speeding up existing OT-based methods by up to 100 times.
-
During mammalian development, the left and right ventricles arise from early populations of cardiac progenitors known as the first heart field (FHF) and second heart field, respectively. While these populations have been extensively studied in non-human model systems, their identification and study in human tissues in vivo have been limited due to the ethical and technical limitations of accessing gastrulation-stage human embryos. Human-induced pluripotent stem cells (hiPSCs) present an exciting alternative for modeling early human embryogenesis due to their well-established ability to differentiate into all embryonic germ layers. Here, we describe the development of a TBX5/MYL2 lineage tracing reporter system that allows for the identification of FHF- progenitors and their descendants including left ventricular cardiomyocytes. Furthermore, using single-cell RNA sequencing (scRNA-seq) with oligonucleotide-based sample multiplexing, we extensively profiled differentiating hiPSCs across 12 timepoints in two independent iPSC lines. Surprisingly, our reporter system and scRNA-seq analysis revealed a predominance of FHF differentiation using the small molecule Wnt-based 2D differentiation protocol. We compared these data with existing murine and 3D cardiac organoid scRNA-seq data and confirmed the dominance of left ventricular cardiomyocytes (>90%) in our hiPSC-derived progeny. Together, our work provides the scientific community with a powerful new genetic lineage tracing approach as well as a single-cell transcriptomic atlas of hiPSCs undergoing cardiac differentiation.

