Title: The geometric median and applications to robust mean estimation
This paper is devoted to the statistical properties of the geometric median, a robust measure of centrality for multivariate data, as well as its applications to the problem of mean estimation via the median-of-means principle. Our main theoretical results include (a) upper bounds for the distance between the mean and the median for general absolutely continuous distributions in $$\mathbb R^d$$, together with examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $$d$$; and (b) exponential deviation inequalities for the distance between the sample and population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the multivariate median-of-means estimator that hold for large classes of heavy-tailed distributions.
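The two ingredients above, the geometric median and the median-of-means principle, combine as follows: partition the sample into blocks, average each block, and take the geometric median of the block means. Below is a minimal sketch of this pipeline using Weiszfeld's classical iteration for the geometric median; it is illustrative only (the function names, block count k, and tolerances are our own choices, not the paper's).

```python
import numpy as np

def geometric_median(X, tol=1e-7, max_iter=500):
    """Weiszfeld's algorithm: an iteratively re-weighted average converging to
    argmin_y sum_i ||x_i - y||_2 (the geometric median of the rows of X)."""
    y = X.mean(axis=0)
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X - y, axis=1), 1e-12)  # avoid /0 at data points
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y

def median_of_means(X, k, rng=None):
    """Median-of-means: split the n samples into k disjoint blocks, average
    each block, and return the geometric median of the k block means."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(rng.permutation(X), k)
    means = np.stack([b.mean(axis=0) for b in blocks])
    return geometric_median(means)

# toy usage on heavy-tailed data in R^5 (true mean is 0)
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(1000, 5))
print(median_of_means(X, k=20, rng=rng))
```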
Award ID(s):
2045068, 1908905
PAR ID:
10423179
Author(s) / Creator(s):
Publisher / Repository:
SIAM
Date Published:
Journal Name:
SIAM Journal on Mathematics of Data Science (SIMODS)
ISSN:
2331-8422
Subject(s) / Keyword(s):
geometric median; median of means; heavy tails
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Lightness and sparsity are two natural parameters for Euclidean (1+ε)-spanners. Classical results show that, when the dimension d ∈ ℕ and ε > 0 are constant, every set S of n points in d-space admits a (1+ε)-spanner with O(n) edges and weight proportional to that of the Euclidean MST of S. Tight bounds on the dependence on ε > 0 for constant d ∈ ℕ have been established only recently. Le and Solomon (FOCS 2019) showed that Steiner points can substantially improve the lightness and sparsity of a (1+ε)-spanner. They gave upper bounds of Õ(ε^{-(d+1)/2}) for the minimum lightness in dimensions d ≥ 3, and Õ(ε^{-(d-1)/2}) for the minimum sparsity in d-space for all d ≥ 1. They obtained lower bounds only in the plane (d = 2). Le and Solomon (ESA 2020) also constructed Steiner (1+ε)-spanners of lightness O(ε^{-1} log Δ) in the plane, where Δ ∈ Ω(log n) is the spread of S, defined as the ratio between the maximum and minimum distance between a pair of points. In this work, we improve several bounds on the lightness and sparsity of Euclidean Steiner (1+ε)-spanners. Using a new geometric analysis, we establish lower bounds of Ω(ε^{-d/2}) for the lightness and Ω(ε^{-(d-1)/2}) for the sparsity of such spanners in Euclidean d-space for all d ≥ 2. We use the geometric insight from our lower bound analysis to construct Steiner (1+ε)-spanners of lightness O(ε^{-1} log n) for n points in the Euclidean plane.
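    For orientation, the sketch below implements the classical greedy (1+ε)-spanner construction (with no Steiner points), the baseline that the Steiner constructions above improve upon. It is illustrative only: the helper names and the cutoff-based Dijkstra test are our own choices.

```python
import heapq
import itertools
import math
import random

def _cutoff_dist(adj, pts, s, goal, cutoff):
    """Dijkstra from s in the current spanner, abandoning paths longer than cutoff."""
    best = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            return d
        if d > best.get(u, math.inf):
            continue
        for v in adj[u]:
            nd = d + math.dist(pts[u], pts[v])
            if nd <= cutoff and nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf

def greedy_spanner(pts, eps):
    """Classical greedy (1+eps)-spanner: scan pairs by increasing distance and
    add an edge only when the current spanner detour exceeds (1+eps)|pq|."""
    t = 1.0 + eps
    pairs = sorted(itertools.combinations(range(len(pts)), 2),
                   key=lambda e: math.dist(pts[e[0]], pts[e[1]]))
    adj = {i: [] for i in range(len(pts))}
    edges = []
    for u, v in pairs:
        d_uv = math.dist(pts[u], pts[v])
        if _cutoff_dist(adj, pts, u, v, t * d_uv) > t * d_uv:
            adj[u].append(v)
            adj[v].append(u)
            edges.append((u, v))
    return edges

# toy usage: 50 random points in the unit square, eps = 0.5
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(50)]
print(len(greedy_spanner(pts, 0.5)), "edges for 50 points")
```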
  2. We prove Fuk-Nagaev and Rosenthal-type inequalities for the sums of independent random matrices, focusing on the situation when the norms of the matrices possess finite moments of only low orders. Our bounds depend on the “intrinsic” dimensional characteristics such as the effective rank, as opposed to the dimension of the ambient space. We illustrate the advantages of such results in several applications, including new moment inequalities for the sample covariance operators of heavy-tailed distributions. Moreover, we demonstrate that our techniques yield sharpened versions of the moment inequalities for empirical processes.
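    The “intrinsic” dimensional characteristic referenced above is the effective rank r(Σ) = tr(Σ)/‖Σ‖, which can be far smaller than the ambient dimension when the spectrum decays. A small sketch of this quantity (our own illustration, not code from the paper):

```python
import numpy as np

def effective_rank(sigma):
    """Effective rank r(Sigma) = trace(Sigma) / operator norm of Sigma.
    Always between 1 and rank(Sigma) <= d, and much smaller than d
    when the spectrum decays quickly."""
    eigvals = np.linalg.eigvalsh(sigma)   # sigma is symmetric PSD
    return eigvals.sum() / eigvals.max()

# spiked covariance in d = 1000: one large direction plus small isotropic noise
d = 1000
sigma = 0.01 * np.eye(d)
sigma[0, 0] = 1.0
print(effective_rank(sigma))   # (1 + 0.01*(d-1)) / 1 ≈ 11, far below d = 1000
```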
  3. It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest, non-linear but regular networks, no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with O(d) sample complexity in a large ambient dimension d. Our main results characterize a hierarchical property, the merged-staircase property, that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions defined on a latent space of low dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.
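    As a toy illustration of the setting (our own sketch; the architecture, scaling, and hyperparameters are illustrative, and convergence with these toy values is not guaranteed), one can train a depth-2 network by online SGD on binary inputs whose label depends on three latent coordinates through a staircase-structured function f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, batch, steps, lr = 20, 512, 64, 20000, 0.05

# Target on binary inputs x in {-1, +1}^d: depends on 3 latent coordinates,
# adding one new coordinate per monomial (a "staircase" structure).
def target(x):
    return x[:, 0] + x[:, 0] * x[:, 1] + x[:, 0] * x[:, 1] * x[:, 2]

# Depth-2 network in mean-field scaling: f(x) = (1/m) * sum_j a_j relu(w_j.x + b_j)
W = rng.normal(size=(m, d)) / np.sqrt(d)
b = rng.normal(size=m)
a = rng.normal(size=m)

def forward(x):
    pre = x @ W.T + b              # (batch, m) pre-activations
    act = np.maximum(pre, 0.0)     # ReLU
    return act @ a / m, pre, act

for _ in range(steps):
    x = rng.choice([-1.0, 1.0], size=(batch, d))   # online SGD: fresh samples
    pred, pre, act = forward(x)
    err = pred - target(x)                          # gradient of 0.5 * MSE
    grad_a = act.T @ err / (batch * m)
    grad_pre = (err[:, None] * a[None, :] / m) * (pre > 0)
    # Mean-field parametrization: per-parameter gradients are O(1/m), so the
    # learning rate is scaled up by m to keep each neuron moving at rate O(1).
    a -= lr * m * grad_a
    W -= lr * m * grad_pre.T @ x / batch
    b -= lr * m * grad_pre.sum(axis=0) / batch

x_test = rng.choice([-1.0, 1.0], size=(4000, d))
print("test MSE:", np.mean((forward(x_test)[0] - target(x_test)) ** 2))
```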
  4.
    This paper is devoted to estimators of the mean that provide strong non-asymptotic guarantees under minimal assumptions on the underlying distribution. The main ideas behind the proposed techniques are based on bridging the notions of symmetry and robustness. We show that existing methods, such as median-of-means and Catoni's estimators, can often be viewed as special cases of our construction. The main contribution of the paper is the proof of uniform bounds for the deviations of the stochastic process defined by the proposed estimators. Moreover, we extend our results to the case of adversarial contamination, where a constant fraction of the observations is arbitrarily corrupted. Finally, we apply our methods to the problem of robust multivariate mean estimation and show that the obtained inequalities achieve optimal dependence on the proportion of corrupted samples.
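    For context, one standard textbook variant of Catoni's M-estimator mentioned above can be sketched as follows; this is not the paper's unified construction, and the width parameter alpha and influence function psi below are one conventional choice.

```python
import numpy as np

def catoni_mean(x, alpha=None, tol=1e-8, max_iter=200):
    """A common variant of Catoni's M-estimator of the mean: solve
    sum_i psi(alpha * (x_i - theta)) = 0 in theta by bisection, where
    psi(t) = sign(t) * log(1 + |t| + t^2/2) is Catoni's soft truncation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if alpha is None:
        # one conventional width scaling, roughly sqrt(2 log(1/delta) / (n * var))
        alpha = np.sqrt(2.0 / (n * max(x.var(), 1e-12)))
    psi = lambda t: np.sign(t) * np.log1p(np.abs(t) + 0.5 * t * t)
    f = lambda theta: psi(alpha * (x - theta)).sum()
    lo, hi = x.min(), x.max()   # f is decreasing, f(lo) >= 0 >= f(hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# heavy-tailed sample: compare with the empirical mean (true mean is 0)
rng = np.random.default_rng(1)
x = rng.standard_t(df=2.1, size=5000)
print("empirical mean:", x.mean(), " Catoni:", catoni_mean(x))
```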
  5. Computation of the interleaving distance between persistence modules is a central task in topological data analysis. For $$1$$-D persistence modules, thanks to the isometry theorem, this can be done by computing the bottleneck distance with known efficient algorithms. The question is open for most $$n$$-D persistence modules, $$n > 1$$, because of the well-recognized complications of the indecomposables. Here, we consider a reasonably complicated class, called $$2$$-D interval decomposable modules, whose indecomposables may have a description of non-constant complexity. We present a polynomial time algorithm to compute the bottleneck distance for these modules from indecomposables, which bounds the interleaving distance from above, and give another algorithm to compute a new distance, called the dimension distance, that bounds it from below.
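    For small $$1$$-D persistence diagrams, the bottleneck distance referenced above can be computed by brute force: binary-search a threshold t and test whether a perfect matching exists using only pairings of cost at most t, where a point matches another point at their L∞ distance or the diagonal at half its persistence. A sketch under these assumptions (the names and the simple augmenting-path matching are our own):

```python
import numpy as np

def bottleneck_distance(A, B):
    """Brute-force bottleneck distance between two finite 1-D persistence
    diagrams (lists of (birth, death) pairs). The square instance pads each
    diagram with diagonal slots; the answer is the smallest threshold t that
    admits a perfect matching using only edges of cost <= t."""
    m, n = len(A), len(B)
    N = m + n                       # real points plus diagonal slots
    cost = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i < m and j < n:     # real-to-real: L_inf distance
                cost[i, j] = max(abs(A[i][0] - B[j][0]), abs(A[i][1] - B[j][1]))
            elif i < m:             # A point matched to the diagonal
                cost[i, j] = (A[i][1] - A[i][0]) / 2
            elif j < n:             # diagonal matched to a B point
                cost[i, j] = (B[j][1] - B[j][0]) / 2
            # diagonal-to-diagonal stays 0

    def has_perfect_matching(t):
        match = [-1] * N            # Kuhn's augmenting-path matching
        def augment(u, seen):
            for v in range(N):
                if cost[u, v] <= t and v not in seen:
                    seen.add(v)
                    if match[v] == -1 or augment(match[v], seen):
                        match[v] = u
                        return True
            return False
        return all(augment(u, set()) for u in range(N))

    candidates = sorted(set(cost.ravel()))
    lo, hi = 0, len(candidates) - 1
    while lo < hi:                  # binary search for the smallest feasible t
        mid = (lo + hi) // 2
        if has_perfect_matching(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

# toy usage: the optimum matches (0,4) to (0,3.5) and sends (1,2) to the diagonal
print(bottleneck_distance([(0.0, 4.0), (1.0, 2.0)], [(0.0, 3.5)]))  # 0.5
```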