skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Estimating location parameters in sample-heterogeneous distributions
Abstract Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $$d$$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near optimal when data are i.i.d. and when the fraction of ‘low-noise’ distributions is as small as $$\varOmega \left (\frac{d \log n}{n}\right )$$, where $$n$$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.  more » « less
Award ID(s):
1749857 1907786
PAR ID:
10289004
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
ISSN:
2049-8772
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan (Ed.)
    We consider the problem of estimating the optimal transport map between two probability distributions, P and Q in R^d, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution Q is a discrete measure supported on a finite number of points in R^d. We study a computationally efficient estimator initially proposed by Pooladian & Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate n^{−1/2}, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments for other settings, not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems. 
    more » « less
  2. In classical statistics, a well known paradigm consists in establishing asymptotic equivalence between an experiment of i.i.d. observations and a Gaussian shift experiment, with the aim of obtaining optimal estimators in the former complicated model from the latter simpler model. In particular, a statistical experiment consisting of n i.i.d. observations from d-dimensional multinomial distributions can be well approximated by an experiment consisting of d − 1 dimensional Gaussian distributions. In a quantum version of the result, it has been shown that a collection of n qudits (d-dimensional quantum states) of full rank can be well approximated by a quantum system containing a classical part, which is a d − 1 dimensional Gaussian distribution, and a quantum part containing an ensemble of d(d − 1)/2 shifted thermal states. In this paper, we obtain a generalization of this result when the qudits are not of full rank. We show that when the rank of the qudits is r, then the limiting experiment consists of an r − 1 dimensional Gaussian distribution and an ensemble of both shifted pure and shifted thermal states. For estimation purposes, we establish an asymptotic minimax result in the limiting Gaussian model. Analogous results are then obtained for estimation of a low rank qudit from an ensemble of identically prepared, independent quantum systems, using the local asymptotic equivalence result. We also consider the problem of estimation of a linear functional of the quantum state. We construct an estimator for the functional, analyze the risk and use quantum local asymptotic equivalence to show that our estimator is also optimal in the minimax sense. 
    more » « less
  3. We consider the problem of estimating the optimal transport map between two probability distributions, P and Q in Rd, on the basis of i.i.d. samples. All existing statistical analyses of this problem require the assumption that the transport map is Lipschitz, a strong requirement that, in particular, excludes any examples where the transport map is discontinuous. As a first step towards developing estimation procedures for discontinuous maps, we consider the important special case where the data distribution Q is a discrete measure supported on a finite number of points in Rd. We study a computationally efficient estimator initially proposed by Pooladian and Niles-Weed (2021), based on entropic optimal transport, and show in the semi-discrete setting that it converges at the minimax-optimal rate n−1/2, independent of dimension. Other standard map estimation techniques both lack finite-sample guarantees in this setting and provably suffer from the curse of dimensionality. We confirm these results in numerical experiments, and provide experiments for other settings, not covered by our theory, which indicate that the entropic estimator is a promising methodology for other discontinuous transport map estimation problems. 
    more » « less
  4. Location estimation is one of the most basic questions in parametric statistics. Suppose we have a known distribution density f , and we get n i.i.d. samples from f (x − μ) for some unknown shift μ. The task is to estimate μ to high accuracy with high probability. The maximum likelihood estimator (MLE) is known to be asymptotically optimal as n → ∞, but what is possible for finite n? In this paper, we give two location estimators that are optimal under different criteria: 1) an estimator that has minimax-optimal estimation error subject to succeeding with probability 1 − ¶ and 2) a confidence interval estimator which, subject to its output interval containing μ with probability at least 1 − ¶, has the minimum expected squared interval width among all shift-invariant estimators. The latter construction can be generalized to minimizing the expectation of any loss function on the interval width. 
    more » « less
  5. Location estimation is one of the most basic questions in parametric statistics. Suppose we have a known distribution density f, and we get n i.i.d. samples from f(x − µ) for some unknown shift µ. The task is to estimate µ to high accuracy with high probability. The maximum likelihood estimator (MLE) is known to be asymptotically optimal as n → ∞, but what is possible for finite n? In this paper, we give two location estimators that are optimal under different criteria: 1) an estimator that has minimax-optimal estimation error subject to succeeding with probability 1 − δ and 2) a confidence interval estimator which, subject to its output interval containing µ with probability at least 1 − δ, has the minimum expected squared interval width among all shift-invariant estimators. The latter construction can be generalized to minimizing the expectation of any loss function on the interval width. 
    more » « less