 Award ID(s):
 2210583
 NSFPAR ID:
 10508718
 Editor(s):
 Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan
 Publisher / Repository:
 PMLR
 Date Published:
 Journal Name:
 Proceedings of Machine Learning Research
 Volume:
 202
ISSN:
2640-3498
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

We consider the problem of estimating the optimal transport map between two probability distributions, $P$ and $Q$ in $\mathbb{R}^d$, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution $Q$ is a discrete measure supported on a finite number of points in $\mathbb{R}^d$. We study a computationally efficient estimator initially proposed by Pooladian and Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate $n^{-1/2}$, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments for other settings, not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems.
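As an illustration of the kind of estimator discussed in this abstract, the following is a minimal NumPy sketch of an entropic map estimate in the semi-discrete setting: Sinkhorn iterations produce an entropic coupling between samples from $P$ and the finite support of $Q$, and the barycentric projection of that coupling yields the map estimate. The function names, the regularization value, and the two-point target measure are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sinkhorn(C, a, b, eps, n_iter=200):
    # Sinkhorn iterations for entropic OT with cost matrix C,
    # source marginal a, target marginal b, regularization eps.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]        # entropic coupling

def entropic_map(X, Y, eps=0.5):
    # Barycentric projection of the entropic coupling: an estimate of
    # the map sending samples X (from P) toward the discrete support Y
    # (of Q), in the spirit of Pooladian and Niles-Weed (2021).
    # Illustrative sketch only; eps and names are our own choices.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    P = sinkhorn(C, a, b, eps)
    return (P @ Y) / P.sum(1, keepdims=True)  # conditional mean of Y given x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # samples from a continuous P
Y = np.array([[-2.0, 0.0], [2.0, 0.0]])       # two-point discrete Q
T = entropic_map(X, Y)                        # each row is a convex combo of Y
```

Each estimated image `T[i]` is a convex combination of the target points, so the estimator smooths across the discontinuity of the true map rather than breaking near it.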

A. Oh; T. Naumann; A. Globerson; K. Saenko; M. Hardt; S. Levine (Ed.)
In the theory of lossy compression, the rate-distortion (RD) function $R(D)$ describes how much a data source can be compressed (in bit rate) at any given level of fidelity (distortion). Obtaining $R(D)$ for a given data source establishes the fundamental performance limit for all compression algorithms. We propose a new method to estimate $R(D)$ from the perspective of optimal transport. Unlike the classic Blahut-Arimoto algorithm, which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our RD estimator based on a connection to entropic optimal transport. Experimentally, we obtain comparable or tighter bounds than state-of-the-art neural network methods on low-rate sources while requiring considerably less tuning and computation effort. We also highlight a connection to maximum-likelihood deconvolution and introduce a new class of sources that can be used as test cases with known solutions to the RD problem.
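The "moving particles" idea can be sketched in a few lines: descend on a Lagrangian form of the RD objective by updating the reproduction points directly. This is a simplified toy version under our own assumptions (squared-error distortion, equal particle weights), not the authors' Wasserstein gradient descent algorithm.

```python
import numpy as np

def rd_particle_descent(X, m=8, lam=2.0, lr=0.1, steps=300, seed=0):
    # Minimize F(Y) = -(1/n) * sum_i log( (1/m) * sum_j exp(-lam*|x_i - y_j|^2) ),
    # a Lagrangian form of the rate-distortion objective, by gradient
    # descent on the reproduction points Y (particles with equal weights).
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), size=m, replace=False)].copy()   # init at data points
    for _ in range(steps):
        D = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # n x m distortions
        W = np.exp(-lam * (D - D.min(axis=1, keepdims=True))) # stabilized weights
        W /= W.sum(axis=1, keepdims=True)                     # softmax over particles
        # exact gradient of F with respect to each particle y_j
        grad = -2.0 * lam * (W[..., None] * (X[:, None, :] - Y[None, :, :])).mean(axis=0)
        Y -= lr * grad
    return Y

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))      # stand-in data source
Y = rd_particle_descent(X)         # learned reproduction support
```

In contrast to Blahut-Arimoto, which would only reweight a fixed grid of reproduction points, the particles themselves move to wherever the optimal reproduction distribution places its mass.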

Abstract: Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $d$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth, and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near optimal when data are i.i.d. and when the fraction of 'low-noise' distributions is as small as $\varOmega\left(\frac{d \log n}{n}\right)$, where $n$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
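One classical ingredient of the hybrid estimator named above is the "shorth". A minimal 1-D sketch, illustrative only (it omits the modal-interval and scale-adaptive parts of the paper's estimator):

```python
import numpy as np

def shorth_mean(x, frac=0.5):
    # Shortest-interval ("shorth") location estimate: find the shortest
    # window containing ceil(frac * n) of the sorted points and return
    # the mean of the points inside it.
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    k = max(1, int(np.ceil(frac * n)))
    widths = xs[k - 1:] - xs[:n - k + 1]   # width of each k-point window
    i = int(np.argmin(widths))             # start of the shortest window
    return xs[i:i + k].mean()

# Samples with a common mean 3.0 but very different scales: the shortest
# window concentrates on the low-noise half of the data.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(3.0, 0.1, 100), rng.normal(3.0, 10.0, 100)])
est = shorth_mean(x)
```

Because the shortest window is dominated by the low-noise samples, the estimate stays close to the common mean even though half the data is extremely noisy, which is the adaptivity-to-heterogeneity phenomenon the abstract describes.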

Summary: We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number $d$ of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of order $O\{n^{2/(pr+1)}\}$, where $p\in[1,2]$ depends on the true function and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease $q$ to around $O\{n^{1/(pr+1)}\}$ when $d\leq pr+1$. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
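The selection idea, picking basis functions at approximately equally spaced observations, can be sketched in one dimension. Both helper functions below are hypothetical stand-ins: the paper treats the general multivariate smoothing spline setting, while this sketch uses a simple cubic-radial-basis ridge fit.

```python
import numpy as np

def select_spaced_basis(x, q):
    # Pick q basis locations at approximately equally spaced order
    # statistics of the sample, giving a diverse set of knots.
    xs = np.sort(np.asarray(x, dtype=float))
    idx = np.linspace(0, len(xs) - 1, q).round().astype(int)
    return xs[idx]

def spline_fit(x, y, knots, lam=1e-4):
    # Penalized least squares on [1, x, |x - t|^3] features: a common
    # low-rank cubic-spline-type approximation, not the paper's exact
    # estimator. Cost is O(n q^2) rather than O(n^3).
    def design(xv):
        return np.column_stack([np.ones_like(xv), xv,
                                np.abs(xv[:, None] - knots[None, :]) ** 3])
    B = design(x)
    coef = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
    return lambda xn: design(xn) @ coef

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
knots = select_spaced_basis(x, 12)     # q = 12 basis functions out of n = 200
f = spline_fit(x, y, knots)
```

Selecting knots at spaced order statistics rather than uniformly at random avoids clustered, near-redundant basis functions, which is the "diversity" the abstract refers to.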

We consider the regression problem of estimating functions on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^D$ with $d \ll D$. Drawing ideas from multiresolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales, and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting non-uniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite-sample bounds in high probability on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function were defined on a known Euclidean domain of dimension $d$, instead of an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
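A toy illustration of one ingredient of this approach: for each query point, take its nearest neighbors in the ambient space, build a crude local chart from the top principal directions, and fit a local linear model in those coordinates. The paper's estimator is multiscale and wavelet-thresholded; this single-scale sketch, with the intrinsic dimension assumed known, only shows local polynomial fitting in learned low-dimensional coordinates.

```python
import numpy as np

def local_linear_predict(X, y, Xq, k=20, d=1):
    # For each query: k-NN in ambient R^D, local PCA chart of
    # dimension d, then a least-squares linear fit in chart coordinates.
    preds = []
    for xq in Xq:
        idx = np.argsort(((X - xq) ** 2).sum(axis=1))[:k]
        Xl, yl = X[idx], y[idx]
        mu = Xl.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xl - mu, full_matrices=False)
        Z = (Xl - mu) @ Vt[:d].T              # neighbors in chart coordinates
        zq = (xq - mu) @ Vt[:d].T             # query in chart coordinates
        A = np.column_stack([np.ones(len(Z)), Z])
        coef, *_ = np.linalg.lstsq(A, yl, rcond=None)
        preds.append(np.concatenate([[1.0], zq]) @ coef)
    return np.array(preds)

# Data on a 1-dimensional manifold (a circle) embedded in R^3.
rng = np.random.default_rng(5)
t = rng.uniform(0.0, 2 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
y = np.cos(2 * t)                             # smooth function on the manifold
preds = local_linear_predict(X, y, X[:50])
```

The local fit lives in the $d$-dimensional chart, not in $\mathbb{R}^D$, which is the mechanism by which the rates in the abstract depend on $d$ rather than on the ambient dimension $D$.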