Title: Uncoupled isotonic regression via minimum Wasserstein deconvolution
Abstract: Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown non-decreasing regression function $$f$$ from independent pairs $$(x_i, y_i)$$ where $${\mathbb{E}}[y_i]=f(x_i)$$, $$i=1, \ldots, n$$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets $$\{x_1, \ldots , x_n\}$$ and $$\{y_1, \ldots , y_n\}$$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on $$y_i$$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
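The paper's estimator is based on minimum Wasserstein deconvolution; as a minimal illustration of the uncoupled setting (a naive baseline, not the paper's algorithm), the NumPy sketch below uses the fact that one-dimensional optimal transport matches order statistics: a non-decreasing $$f$$ preserves order, so in the noiseless case sorting both unordered sets recovers the coupling exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: x**3 + x                 # a stand-in non-decreasing regression function
    x = rng.uniform(-1.0, 1.0, size=200)
    y = rng.permutation(f(x))              # responses observed as an unordered set

    # 1-D optimal transport between empirical measures pairs order statistics;
    # since f is non-decreasing it preserves order, so sorting both sets
    # recovers the coupling exactly when there is no noise.
    x_sorted, y_sorted = np.sort(x), np.sort(y)
    print("max recovery error without noise:", np.max(np.abs(y_sorted - f(x_sorted))))

With noise, the sorted $$y_i$$ estimate the distribution of $$f(x)$$ convolved with the noise rather than $$f(x)$$ itself, which is the deconvolution problem the paper's moment-matching analysis addresses.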
Award ID(s):
1712596 1838071
PAR ID:
10128161
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
8
Issue:
4
ISSN:
2049-8772
Page Range / eLocation ID:
p. 691-717
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: When k and s are natural numbers and $${\mathbf h}\in {\mathbb Z}^k$$, denote by $$J_{s,k}(X;\,{\mathbf h})$$ the number of integral solutions of the system $$ \sum_{i=1}^s(x_i^j-y_i^j)=h_j\quad (1\leqslant j\leqslant k), $$ with $$1\leqslant x_i,y_i\leqslant X$$. When $$s < k(k+1)/2$$ and $$(h_1,\ldots ,h_{k-1})\ne {\mathbf 0}$$, Brandes and Hughes have shown that $$J_{s,k}(X;\,{\mathbf h})=o(X^s)$$. In this paper we improve on quantitative aspects of this result, and, subject to an extension of the main conjecture in Vinogradov's mean value theorem, we obtain an asymptotic formula for $$J_{s,k}(X;\,{\mathbf h})$$ in the critical case $$s=k(k+1)/2$$. The latter requires minor arc estimates going beyond square-root cancellation.
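    As a concrete check of the definition, the brute-force count below (a hypothetical helper, exponential in $$s$$ and usable only for tiny parameters) enumerates the solutions counted by $$J_{s,k}(X;\,{\mathbf h})$$.

        from itertools import product

        def J(s, k, X, h):
            # Count tuples with 1 <= x_i, y_i <= X satisfying
            # sum_i (x_i^j - y_i^j) = h_j for every 1 <= j <= k.
            count = 0
            for xs in product(range(1, X + 1), repeat=s):
                for ys in product(range(1, X + 1), repeat=s):
                    if all(sum(x**j - y**j for x, y in zip(xs, ys)) == h[j - 1]
                           for j in range(1, k + 1)):
                        count += 1
            return count

        print(J(2, 2, 6, (0, 0)))   # the diagonal case h = 0 of Vinogradov's mean value count
        print(J(2, 2, 6, (1, 1)))   # a shifted instance with h != 0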
  2. Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
    We study the problem of robust multivariate polynomial regression: let p: ℝⁿ → ℝ be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (𝐱_i,y_i) ∈ [-1,1]ⁿ × ℝ that are noisy versions of (𝐱_i,p(𝐱_i)). More precisely, each 𝐱_i is sampled independently from some distribution χ on [-1,1]ⁿ, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most ρ < 1/2, and otherwise satisfies |y_i-p(𝐱_i)| ≤ σ. The goal is to output a polynomial p̂, of degree at most d in each variable, within an 𝓁_∞-distance of at most O(σ) from p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, showing an algorithm that achieves a sample complexity of O_n(dⁿ log d), where the hidden constant depends on n, if χ is the n-dimensional Chebyshev distribution. The sample complexity is O_n(d^{2n} log d) if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most O(σ), and the run-time depends on log(1/σ). In the setting where each 𝐱_i and y_i are known up to N bits of precision, the run-time's dependence on N is linear. We also show that our sample complexities are optimal in terms of dⁿ. Furthermore, we show that it is possible to have the run-time be independent of 1/σ, at the cost of a higher sample complexity.
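    The abstract's sampling and contamination model is easy to simulate; the sketch below covers the model only, not the fitting algorithm, and reads the n-dimensional Chebyshev distribution as a product of one-dimensional Chebyshev marginals (an assumption). It generates samples with a ρ-fraction of arbitrary outliers and σ-bounded inlier noise.

        import numpy as np

        rng = np.random.default_rng(1)
        n, m = 2, 500                      # number of variables, number of samples
        rho, sigma = 0.1, 0.01             # outlier probability, inlier noise bound

        def p(x):                          # a known stand-in for the unknown polynomial
            return x[..., 0] ** 3 - 2.0 * x[..., 0] * x[..., 1] ** 2 + 0.5

        # cos(pi * U) has the Chebyshev density 1 / (pi * sqrt(1 - t^2)) on [-1, 1].
        x = np.cos(np.pi * rng.uniform(size=(m, n)))
        outlier = rng.uniform(size=m) < rho
        y = np.where(outlier,
                     rng.uniform(-10.0, 10.0, size=m),           # arbitrary outliers
                     p(x) + rng.uniform(-sigma, sigma, size=m))  # |y - p(x)| <= sigma

    A robust fitting step would then have to return some p̂ of degree at most d per variable with 𝓁_∞ error O(σ); the algorithm in the abstract achieves this from O_n(dⁿ log d) such samples.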
  3. Abstract: In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularization. Higher-order regularity can be obtained by replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs, and here we consider graph poly-Laplacian regularization in a fully supervised, non-parametric, noise-corrupted regression problem. In particular, given a dataset $$\{x_i\}_{i=1}^n$$ and a set of noisy labels $$\{y_i\}_{i=1}^n\subset \mathbb{R}$$, we let $$u_n:\{x_i\}_{i=1}^n\rightarrow \mathbb{R}$$ be the minimizer of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $$y_i = g(x_i)+\xi_i$$, for iid noise $$\xi_i$$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $$u_n$$ to $$g$$ in the large data limit $$n\rightarrow \infty$$. Furthermore, our rate is close to the known rate of convergence in the usual smoothing spline model.
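    A minimal sketch of the graph poly-Laplacian regression described above, with ad hoc choices of connectivity radius and regularization scale (the paper's rates require careful scaling of both with n): minimizing $$\Vert u-y\Vert^2 + \tau\, u^\top L^m u$$ over functions on the point cloud gives the linear system solved below.

        import numpy as np

        rng = np.random.default_rng(2)
        n, eps, tau, m = 300, 0.15, 1e-3, 2   # samples, graph radius, scale, poly-Laplacian order
        g = lambda t: np.sin(2 * np.pi * t)   # a stand-in ground-truth regression function

        x = rng.uniform(size=n)
        y = g(x) + 0.3 * rng.standard_normal(n)   # iid noisy labels

        # Geometric random graph: connect points within distance eps.
        W = (np.abs(x[:, None] - x[None, :]) < eps).astype(float)
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian

        # First-order condition of  |u - y|^2 + tau * u^T L^m u  is (I + tau L^m) u = y.
        u = np.linalg.solve(np.eye(n) + tau * np.linalg.matrix_power(L, m), y)
        print("mean squared error vs. g:", np.mean((u - g(x)) ** 2))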
  4. We first introduce and study the notion of multi-weighted blow-ups, which is later used to systematically construct an explicit yet efficient algorithm for functorial logarithmic resolution in characteristic zero, in the sense of Hironaka. Specifically, for a singular, reduced closed subscheme $$X$$ of a smooth scheme $$Y$$ over a field of characteristic zero, we resolve the singularities of $$X$$ by taking proper transforms $$X_i \subset Y_i$$ along a sequence of multi-weighted blow-ups $$Y_N \to Y_{N-1} \to \dotsb \to Y_0 = Y$$ which satisfies the following properties: (i) the $$Y_i$$ are smooth Artin stacks with simple normal crossing exceptional loci; (ii) at each step we always blow up the worst singular locus of $$X_i$$, and witness on $$X_{i+1}$$ an immediate improvement in singularities; (iii) and finally, the singular locus of $$X$$ is transformed into a simple normal crossing divisor on $$X_N$$.
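    Weighted blow-ups can already resolve some singularities in a single step; the following standard worked example (not taken from the paper) shows the cuspidal cubic resolved by one weighted blow-up, sketched in LaTeX.

        % Standard example: one weighted blow-up resolves the cusp.
        Consider $X = V(y^2 - x^3) \subset \mathbb{A}^2$ and the weighted blow-up
        of the origin with weights $(2,3)$ on $(x,y)$. In its two charts,
        \[
          (x,y) = (s^2,\, s^3 y'): \quad y^2 - x^3 = s^6\,(y'^2 - 1), \qquad
          (x,y) = (s^2 x',\, s^3): \quad y^2 - x^3 = s^6\,(1 - x'^3),
        \]
        so the proper transform of $X$ is smooth in both charts and the
        exceptional locus $\{s = 0\}$ is a divisor. The chart origins carry
        $\mu_2$ and $\mu_3$ stabilizers, which is one reason the ambient
        spaces $Y_i$ above are taken to be smooth Artin stacks rather than
        schemes.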