

Title: Uncoupled isotonic regression via minimum Wasserstein deconvolution
Abstract

Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown non-decreasing regression function $f$ from independent pairs $(x_i, y_i)$ with ${\mathbb{E}}[y_i]=f(x_i)$, $i=1, \ldots, n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets $\{x_1, \ldots , x_n\}$ and $\{y_1, \ldots , y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on $y_i$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
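To make the uncoupled setting concrete, here is a small simulation sketch (not the estimator from the paper): because a non-decreasing $f$ preserves order, sorting both unordered samples and pairing them by rank recovers $f$ exactly in the noiseless case, while noise biases this naive quantile matching, which is what the Wasserstein-deconvolution viewpoint is designed to correct. The function `f_true`, the noise level, and the name `quantile_matching_estimate` are illustrative choices.

```python
import numpy as np

# Uncoupled data: we observe the unordered sets {x_i} and {y_i} with
# E[y_i] = f(x_i) for a non-decreasing f, but the pairing is lost.
rng = np.random.default_rng(0)

def f_true(x):
    # illustrative non-decreasing regression function (not from the paper)
    return np.sqrt(x)

n = 500
x = rng.uniform(0.0, 1.0, n)
y = f_true(x) + rng.normal(0.0, 0.1, n)   # responses; the pairing with x is then discarded

def quantile_matching_estimate(x_unordered, y_unordered):
    """Pair sorted x's with sorted y's (naive quantile matching).

    Exact when there is no noise, because a non-decreasing f maps the
    order statistics of x to the order statistics of f(x).  With noise it
    is biased, which is what deconvolution-style corrections address.
    """
    return np.sort(x_unordered), np.sort(y_unordered)

xs, fhat = quantile_matching_estimate(x, y)
print("max |fhat - f| on the sample:", np.max(np.abs(fhat - f_true(xs))))
```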

 
Award ID(s):
1712596 1838071
NSF-PAR ID:
10128161
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
8
Issue:
4
ISSN:
2049-8772
Page Range / eLocation ID:
p. 691-717
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Consider a lattice of $n$ sites arranged around a ring, with the sites occupied by particles of weights $\{1,2,\ldots ,n\}$; the possible arrangements of particles in sites thus correspond to the $n!$ permutations in $S_n$. The inhomogeneous totally asymmetric simple exclusion process (or TASEP) is a Markov chain on $S_n$, in which two adjacent particles of weights $i<j$ swap places at rate $x_i - y_{n+1-j}$ if the particle of weight $j$ is to the right of the particle of weight $i$. (Otherwise, nothing happens.) When $y_i=0$ for all $i$, the stationary distribution was conjecturally linked to Schubert polynomials [18], and explicit formulas for steady state probabilities were subsequently given in terms of multiline queues [4, 5]. In the case of general $y_i$, Cantini [7] showed that $n$ of the $n!$ states have probabilities proportional to double Schubert polynomials. In this paper, we introduce the class of evil-avoiding permutations, which are the permutations avoiding the patterns $2413, 4132, 4213,$ and $3214$. We show that there are $\frac {(2+\sqrt {2})^{n-1}+(2-\sqrt {2})^{n-1}}{2}$ evil-avoiding permutations in $S_n$, and for each evil-avoiding permutation $w$, we give an explicit formula for the steady state probability $\psi _w$ as a product of double Schubert polynomials. (Conjecturally, all other probabilities are proportional to a positive sum of at least two Schubert polynomials.) When $y_i=0$ for all $i$, we give multiline queue formulas for the $\textbf {z}$-deformed steady state probabilities and use this to prove the monomial factor conjecture from [18]. Finally, we show that the Schubert polynomials arising in our formulas are flagged Schur functions, and we give a bijection in this case between multiline queues and semistandard Young tableaux.
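The closed-form count of evil-avoiding permutations can be sanity-checked by brute force for small $n$; for $n=1,\ldots,4$ the formula gives $1, 2, 6, 20$. Below is a small sketch using classical (not consecutive) pattern containment; the helper names are ours, and the enumeration is only feasible for small $n$.

```python
from itertools import combinations, permutations

EVIL_PATTERNS = [(2, 4, 1, 3), (4, 1, 3, 2), (4, 2, 1, 3), (3, 2, 1, 4)]

def contains_pattern(perm, pattern):
    """Classical containment: some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    order = sorted(range(k), key=lambda i: pattern[i])
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if sorted(range(k), key=lambda i: sub[i]) == order:
            return True
    return False

def count_evil_avoiding(n):
    # permutations of {1,...,n} avoiding all four patterns
    return sum(
        not any(contains_pattern(p, pat) for pat in EVIL_PATTERNS)
        for p in permutations(range(1, n + 1))
    )

for n in range(1, 7):
    closed_form = ((2 + 2**0.5) ** (n - 1) + (2 - 2**0.5) ** (n - 1)) / 2
    print(n, count_evil_avoiding(n), round(closed_form))
```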

     
  2. Abstract

    When $k$ and $s$ are natural numbers and ${\mathbf h}\in {\mathbb Z}^k$, denote by $J_{s,k}(X;\,{\mathbf h})$ the number of integral solutions of the system $$ \sum_{i=1}^s(x_i^j-y_i^j)=h_j\quad (1\leqslant j\leqslant k), $$ with $1\leqslant x_i,y_i\leqslant X$. When $s<k(k+1)/2$ and $(h_1,\ldots ,h_{k-1})\ne {\mathbf 0}$, Brandes and Hughes have shown that $J_{s,k}(X;\,{\mathbf h})=o(X^s)$. In this paper we improve on quantitative aspects of this result, and, subject to an extension of the main conjecture in Vinogradov’s mean value theorem, we obtain an asymptotic formula for $J_{s,k}(X;\,{\mathbf h})$ in the critical case $s=k(k+1)/2$. The latter requires minor arc estimates going beyond square-root cancellation.
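For very small parameters, $J_{s,k}(X;\,{\mathbf h})$ can be evaluated by direct enumeration, which may help in parsing the definition. The sketch below is a naive loop over all $(x_1,\ldots,x_s,y_1,\ldots,y_s)$ in $[1,X]^{2s}$; the example values of $s$, $k$, $X$, and ${\mathbf h}$ are arbitrary illustrative choices.

```python
from itertools import product

def J(s, k, X, h):
    """Count solutions of sum_i (x_i^j - y_i^j) = h_j for j = 1..k, with 1 <= x_i, y_i <= X."""
    assert len(h) == k
    count = 0
    for xs in product(range(1, X + 1), repeat=s):
        for ys in product(range(1, X + 1), repeat=s):
            if all(sum(x**j for x in xs) - sum(y**j for y in ys) == h[j - 1]
                   for j in range(1, k + 1)):
                count += 1
    return count

# Example: s = 2, k = 2 (below the critical index s = k(k+1)/2 = 3), shift h = (1, 3).
print(J(2, 2, 6, (1, 3)))
```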

     
  3. Abstract

    In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularization. Higher-order regularity can be obtained by replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs, and here we consider graph poly-Laplacian regularization in a fully supervised, non-parametric, noise-corrupted regression problem. In particular, given a dataset $\{x_i\}_{i=1}^n$ and a set of noisy labels $\{y_i\}_{i=1}^n\subset \mathbb{R}$, we let $u_n\colon \{x_i\}_{i=1}^n\rightarrow \mathbb{R}$ be the minimizer of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $y_i = g(x_i)+\xi_i$, for i.i.d. noise $\xi_i$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $u_n$ to $g$ in the large-data limit $n\rightarrow \infty$. Furthermore, our rate is close to the known rate of convergence in the usual smoothing spline model.
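A minimal numerical sketch of graph poly-Laplacian regularized regression on a one-dimensional geometric random graph is given below. It assumes an unnormalized graph Laplacian, a fixed connectivity radius, and an ad hoc penalty weight; these are illustrative choices, not the calibrated scalings analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Geometric random graph on n points in [0, 1]: connect points within distance eps.
n, eps, lam, p = 300, 0.1, 1e-3, 2           # p = 2 gives a (bi-)poly-Laplacian penalty
x = np.sort(rng.uniform(0.0, 1.0, n))
W = (np.abs(x[:, None] - x[None, :]) < eps).astype(float)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian

g = lambda t: np.sin(2 * np.pi * t)          # illustrative ground-truth function
y = g(x) + rng.normal(0.0, 0.2, n)           # noisy labels y_i = g(x_i) + xi_i

# Minimize  |u - y|^2 + lam * u^T L^p u, whose normal equations are (I + lam * L^p) u = y.
u = np.linalg.solve(np.eye(n) + lam * np.linalg.matrix_power(L, p), y)
print("RMS error of graph poly-Laplacian estimate:", np.sqrt(np.mean((u - g(x)) ** 2)))
```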

     
  4. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground set $N$, a weight function $w: N \rightarrow \mathbb{R}_+$, $r$ monotone submodular functions $f_1,f_2,\ldots,f_r$ over $N$, and requirements $k_1,k_2,\ldots,k_r$, the goal is to find a minimum-weight subset $S \subseteq N$ such that $f_i(S) \ge k_i$ for $1 \le i \le r$. We refer to this problem as Multi-Submod-Cover; it was recently considered by Har-Peled and Jones (Few cuts meet many point sets, CoRR, arXiv:1808.03260, 2018), who were motivated by an application in geometry. Even with $r=1$, Multi-Submod-Cover generalizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced to Submod-SC. A simple greedy algorithm gives an $O(\log(kr))$ approximation, where $k = \sum_i k_i$, and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm for Multi-Submod-Cover that covers each constraint to within a factor of $(1-1/e-\varepsilon)$ while incurring an approximation of $O(\frac{1}{\varepsilon}\log r)$ in the cost. Second, we consider the special case when each $f_i$ is obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints (Bera et al., Theoret Comput Sci 555:2–8, 2014). Both of these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas for several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.
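The simple greedy algorithm mentioned above (repeatedly add the element with the largest marginal truncated coverage per unit weight) can be sketched as follows. This is not the LP-based bicriteria algorithm of the paper; the small coverage instance, the function names, and the tie-breaking are our own illustrative choices.

```python
def greedy_multi_submod_cover(N, w, fs, ks):
    """Greedy heuristic for covering multiple monotone submodular constraints.

    N  : list of elements
    w  : dict element -> positive weight
    fs : list of monotone submodular functions, each taking a set of elements
    ks : list of requirements; constraint i is satisfied when f_i(S) >= k_i
    """
    S = set()

    def total_coverage(T):
        # Truncating each f_i at k_i keeps the objective monotone submodular.
        return sum(min(f(T), k) for f, k in zip(fs, ks))

    target = sum(ks)
    while total_coverage(S) < target:
        base = total_coverage(S)
        # Element with the largest marginal truncated coverage per unit weight.
        e = max((e for e in N if e not in S),
                key=lambda e: (total_coverage(S | {e}) - base) / w[e])
        if total_coverage(S | {e}) == base:
            raise ValueError("constraints cannot be satisfied")
        S.add(e)
    return S

# Illustrative instance: two set-cover style (truncated coverage) constraints.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4}}
w = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 1.0}
f1 = lambda S: len(set().union(*(sets[e] for e in S)) & {1, 2, 3, 4}) if S else 0
f2 = lambda S: len(set().union(*(sets[e] for e in S)) & {3, 4}) if S else 0
print(greedy_multi_submod_cover(list(sets), w, [f1, f2], [3, 2]))
```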

     
  5. The noise sensitivity of a Boolean function f: {0,1}^n -> {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr[f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noise sensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone function (for example, the class of unate functions consists of all the functions that can be made monotone by reorienting some of their coordinates [O'Donnell, 2014]). Specifically, we study the algorithmic problem of approximating NS_{delta}[f] for monotone f, given the promise that NS_{delta}[f] >= 1/n^{C} for constant C, and for delta in the range 1/n <= delta <= 1/2. For such f and delta, we give a randomized algorithm performing O((min(1,sqrt{n} delta log^{1.5} n))/(NS_{delta}[f]) poly(1/epsilon)) queries and approximating NS_{delta}[f] to within a multiplicative factor of (1 +/- epsilon). Given the same constraints on f and delta, we also prove a lower bound of Omega((min(1,sqrt{n} delta))/(NS_{delta}[f] * n^{xi})) on the query complexity of any algorithm that approximates NS_{delta}[f] to within any constant factor, where xi can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield new lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias.
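The definition of NS_{delta}[f] and the standard sampling baseline that the abstract compares against can be written down directly. The sketch below is a plain Monte Carlo estimate, not the descending-ascending algorithm described above; the choice of majority as the test function and the sample size are illustrative assumptions.

```python
import numpy as np

def noise_sensitivity(f, n, delta, num_samples=100_000, seed=0):
    """Monte Carlo estimate of NS_delta[f] = Pr[f(x) != f(z)], where x is
    uniform on {0,1}^n and z flips each bit of x independently with probability delta."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(num_samples, n))
    flips = rng.random((num_samples, n)) < delta
    z = np.where(flips, 1 - x, x)
    return np.mean(f(x) != f(z))

# Example: majority on n = 101 bits, a monotone Boolean function.
majority = lambda bits: (bits.sum(axis=1) * 2 > bits.shape[1]).astype(int)
print(noise_sensitivity(majority, n=101, delta=0.05))
```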