This paper investigates robust versions of the general empirical risk minimization algorithm, one of the core techniques underlying modern statistical methods. The success of empirical risk minimization rests on the fact that for a 'well-behaved' stochastic process $$\left\{ f(X), \ f\in \mathscr F\right\}$$ indexed by a class of functions $$\mathscr F$$, the averages $$\frac{1}{N}\sum_{j=1}^N f(X_j)$$ evaluated over a sample $$X_1,\ldots,X_N$$ of i.i.d. copies of $$X$$ provide a good approximation to the expectations $$\mathbb E f(X)$$, uniformly over large classes $$\mathscr F$$. However, this may no longer be true if the marginal distributions of the process are heavy-tailed or if the sample contains outliers. We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectations, and we obtain high-confidence bounds for the excess risk of the resulting estimators. In particular, we show that the excess risk of robust estimators can converge to $$0$$ at fast rates with respect to the sample size $$N$$, meaning rates faster than $$N^{-1/2}$$. We discuss implications of the main results for the linear and logistic regression problems and evaluate the numerical performance of the proposed methods on simulated and real data.
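As a concrete illustration of the idea, the sketch below replaces the sample average of the losses with a median-of-means proxy and minimizes the resulting objective numerically. This is a minimal Python sketch of the general principle, not the authors' implementation; the squared loss, the number of blocks, and the derivative-free solver are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def median_of_means(values, n_blocks):
    """Median of the block means of a 1-d array of loss values."""
    blocks = np.array_split(values, n_blocks)
    return np.median([block.mean() for block in blocks])

def robust_risk(theta, X, y, n_blocks):
    # Squared loss for linear regression; any smooth loss could be used.
    losses = (y - X @ theta) ** 2
    return median_of_means(losses, n_blocks)

rng = np.random.default_rng(0)
N, d = 500, 3
X = rng.standard_normal((N, d))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + rng.standard_t(df=2, size=N)   # heavy-tailed noise

# The proxy is only piecewise smooth, so a derivative-free method is
# used in this toy example.
result = minimize(robust_risk, x0=np.zeros(d), args=(X, y, 25),
                  method="Nelder-Mead")
print(result.x)   # close to theta_star despite the heavy-tailed noise
```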
-
This paper investigates asymptotic properties of algorithms that can be viewed as robust analogues of classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as a variant of the median of means estimator. It is well known by now that the excess risk of the resulting estimators often converges to zero at optimal rates under much weaker assumptions than those required by their classical counterparts. However, less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of the maximum likelihood estimators are asymptotically efficient. We take a step toward answering these questions and show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk. Finally, we discuss the computational aspects of the problem and demonstrate the performance of the methods under consideration in numerical experiments.
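A small simulation can illustrate the flavor of this claim. The hedged sketch below compares, for the toy problem of estimating a Gaussian mean, the variability of the usual empirical risk minimizer with that of the minimizer of a median-of-means risk proxy; the precise conditions under which the two asymptotic variances coincide are those stated in the paper, not guaranteed by this toy setup.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
N, n_blocks, n_rep = 400, 20, 300

def mom_risk(theta, x):
    # Median-of-means proxy of the expected squared loss E(X - theta)^2.
    losses = (x - theta) ** 2
    blocks = np.array_split(losses, n_blocks)
    return np.median([block.mean() for block in blocks])

erm_est, mom_est = [], []
for _ in range(n_rep):
    x = rng.standard_normal(N)   # well-behaved data, no outliers
    erm_est.append(x.mean())     # classical empirical risk minimizer
    mom_est.append(minimize_scalar(mom_risk, args=(x,),
                                   bounds=(-3.0, 3.0), method="bounded").x)

# Scaled variances of the two estimators; they are of comparable size
# in this well-behaved case.
print(N * np.var(erm_est), N * np.var(mom_est))
```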
-
We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where dense additive noise is heavy-tailed while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the fraction of corrupted observations as well as the tail behavior of the additive noise. Moreover, we design a modification of the so-called Square-Root Slope estimator with several desirable features: (a) it is provably robust to adversarial contamination and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise; and (c) it is computationally tractable as the solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest.
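For intuition about the estimator's form, the sketch below minimizes a square-root SLOPE-type objective, the $$\ell_2$$ data-fitting term plus a sorted-$$\ell_1$$ (Slope) penalty, by plain subgradient descent. The objective matches the description above, but the solver and the choice of penalty weights `lam` are illustrative; the paper's estimator is the exact solution of the convex program.

```python
import numpy as np

def sqrt_slope(X, y, lam, n_iter=5000, step=0.5):
    """Subgradient descent on ||y - Xb||_2 + sum_i lam_i |b|_(i)."""
    n, d = X.shape
    b = np.zeros(d)
    best, best_val = b.copy(), np.inf
    for t in range(1, n_iter + 1):
        r = y - X @ b
        norm_r = np.linalg.norm(r) + 1e-12
        # Current objective value: l2 fit + sorted-l1 (Slope) penalty.
        val = norm_r + np.sum(lam * np.sort(np.abs(b))[::-1])
        if val < best_val:
            best, best_val = b.copy(), val
        g_fit = -X.T @ r / norm_r                # gradient of ||y - Xb||_2
        order = np.argsort(-np.abs(b))           # ranks of |b_i|, descending
        g_pen = np.empty(d)
        g_pen[order] = lam * np.sign(b[order])   # subgradient of the penalty
        b = b - (step / np.sqrt(t)) * (g_fit + g_pen)
    return best

rng = np.random.default_rng(2)
n, d, s = 100, 200, 5
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:s] = 3.0
y = X @ beta + rng.standard_t(df=3, size=n)      # heavy-tailed noise
# Decreasing penalty weights; this particular choice is illustrative.
lam = 0.8 * np.sqrt(np.log(2.0 * d / (np.arange(d) + 1.0)) / n)
print(np.round(sqrt_slope(X, y, lam)[:8], 2))
```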
-
This paper is devoted to the statistical properties of the geometric median, a robust measure of centrality for multivariate data, as well as its applications to the problem of mean estimation via the median of means principle. Our main theoretical results include (a) upper bounds for the distance between the mean and the median for general absolutely continuous distributions in $$\mathbb R^d$$, together with examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $$d$$; and (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the multivariate median of means estimator that hold for large classes of heavy-tailed distributions.
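A minimal sketch of the median of means principle with the geometric median as the multivariate "median" step: the sample is split into blocks, and the geometric median of the block means, computed here with Weiszfeld's fixed-point iteration, serves as the mean estimate. The block count and tolerances are illustrative choices.

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-8):
    """Weiszfeld's fixed-point iteration for the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(n_iter):
        dist = np.maximum(np.linalg.norm(points - z, axis=1), 1e-12)
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

def mom_mean(X, n_blocks):
    # Geometric median of the block means: multivariate median of means.
    blocks = np.array_split(X, n_blocks)
    means = np.vstack([block.mean(axis=0) for block in blocks])
    return geometric_median(means)

rng = np.random.default_rng(3)
X = rng.standard_t(df=2, size=(2000, 10))   # heavy-tailed, true mean 0
print(np.linalg.norm(mom_mean(X, 40)))      # small despite the heavy tails
```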
-
The goal of this note is to present a modification of the popular median of means estimator that achieves sub-Gaussian deviation bounds with nearly optimal constants under minimal assumptions on the underlying distribution. We build on recent work on the topic and prove that the desired guarantees can be attained under weaker requirements.
-
This paper addresses the following question: given a sample of i.i.d. random variables with finite variance, can one construct an estimator of the unknown mean that performs nearly as well as if the data were normally distributed? One of the most popular examples achieving this goal is the median of means estimator. However, it is inefficient in the sense that the constants in the resulting bounds are suboptimal. We show that a permutation-invariant modification of the median of means estimator admits deviation guarantees that are sharp up to a $$1+o(1)$$ factor if the underlying distribution possesses more than $$\frac{3+\sqrt{5}}{2}\approx 2.62$$ moments and is absolutely continuous with respect to the Lebesgue measure. This result yields potential improvements for a variety of algorithms that rely on the median of means estimator as a building block. At the core of our argument are new deviation inequalities for U-statistics whose order is allowed to grow with the sample size, a result that could be of independent interest.
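One practical way to approximate permutation invariance is to symmetrize the ordinary median of means over random permutations of the sample, as in the hedged sketch below. This Monte Carlo symmetrization only illustrates the idea; the estimator analyzed in the paper is an exact U-statistic-based construction.

```python
import numpy as np

def mom(x, n_blocks):
    return np.median([b.mean() for b in np.array_split(x, n_blocks)])

def permutation_averaged_mom(x, n_blocks, n_perm=200, seed=0):
    # Average the median of means over random permutations of the data;
    # a Monte Carlo approximation to a permutation-invariant estimator.
    rng = np.random.default_rng(seed)
    return np.mean([mom(rng.permutation(x), n_blocks)
                    for _ in range(n_perm)])

rng = np.random.default_rng(4)
x = rng.pareto(3.0, size=5000)   # heavy tails, just under 3 moments
print(permutation_averaged_mom(x, 50), 1.0 / (3.0 - 1.0))  # vs true mean
```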
-
In the problem of online portfolio selection as formulated by Cover (1991), the trader repeatedly distributes her capital over d assets in each of T>1 rounds, with the goal of maximizing the total return. Cover proposed an algorithm, termed Universal Portfolios, that performs nearly as well as the best (in hindsight) static assignment of a portfolio, with an O(d log(T)) regret in terms of the logarithmic return. Without imposing any restrictions on the market, this guarantee is known to be worst-case optimal, and no other algorithm attaining it has been discovered so far. Unfortunately, Cover's algorithm crucially relies on computing a certain d-dimensional integral that must be approximated in any implementation; this results in a prohibitive O(d^4(T+d)^14) per-round runtime for the fastest known implementation due to Kalai and Vempala (2002). We propose an algorithm for online portfolio selection that admits essentially the same regret guarantee as Universal Portfolios -- up to a constant factor and replacement of log(T) with log(T+d) -- yet has a drastically reduced runtime of O(d^2(T+d)) per round. The selected portfolio minimizes the current logarithmic loss regularized by the log-determinant of its Hessian -- equivalently, the hybrid logarithmic-volumetric barrier of the polytope specified by the asset return vectors. As such, our work reveals surprising connections of online portfolio selection with two classical topics in optimization theory: cutting-plane and interior-point algorithms.
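The shape of the per-round optimization can be sketched directly: the next portfolio minimizes the cumulative logarithmic loss plus the log-determinant of its Hessian over the simplex. The sketch below uses a generic constrained solver, so it reproduces only the objective, not the paper's fast per-round runtime; the regularization weight `lam` and the ridge `eps` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def vb_ftrl_step(R, lam=1.0, eps=1e-6):
    """One regularized-leader step; R is the (t, d) array of positive
    asset return vectors observed so far."""
    t, d = R.shape
    def objective(w):
        margins = R @ w                          # w . r_s for each round s
        loss = -np.sum(np.log(margins))          # cumulative logarithmic loss
        G = R / margins[:, None]
        H = G.T @ G                              # Hessian of the log loss
        _, logdet = np.linalg.slogdet(H + eps * np.eye(d))
        return loss + lam * logdet
    cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(objective, x0=np.full(d, 1.0 / d),
                   bounds=[(1e-8, 1.0)] * d, constraints=cons,
                   method="SLSQP")
    return res.x

rng = np.random.default_rng(5)
R = 1.0 + 0.05 * rng.standard_normal((50, 4))    # 50 rounds, 4 assets
print(np.round(vb_ftrl_step(R), 3))              # next portfolio weights
```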
-
The topic of robustness is experiencing a resurgence of interest in the statistical and machine learning communities. In particular, robust algorithms making use of the so-called median of means estimator were shown to satisfy strong performance guarantees for many problems, including estimation of the mean and covariance structure as well as linear regression. In this work, we propose an extension of the median of means principle to the Bayesian framework, leading to the notion of the robust posterior distribution. In particular, we (a) quantify robustness of this posterior to outliers, (b) show that it satisfies a version of the Bernstein-von Mises theorem that connects Bayesian credible sets to the traditional confidence intervals, and (c) demonstrate that our approach performs well in applications.
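One natural way to realize this idea, consistent with the abstract but not necessarily identical to the paper's construction, is to replace the log-likelihood sum in Bayes' formula by N times a median-of-means proxy of the average log-likelihood, and to sample the result with random-walk Metropolis, as sketched below for a toy Gaussian location model.

```python
import numpy as np

def robust_log_post(theta, x, n_blocks):
    # Gaussian location model with unit variance and a flat N(0, 100) prior;
    # the log-likelihood sum is replaced by N times its median-of-means.
    loglik = -0.5 * (x - theta) ** 2
    blocks = np.array_split(loglik, n_blocks)
    mom = np.median([block.mean() for block in blocks])
    return len(x) * mom - 0.5 * theta ** 2 / 100.0

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(1.0, 1.0, 950),
                    rng.normal(50.0, 1.0, 50)])   # 5% gross outliers

theta, chain = 0.0, []
lp = robust_log_post(theta, x, 30)
for _ in range(5000):                             # random-walk Metropolis
    prop = theta + 0.1 * rng.standard_normal()
    lp_prop = robust_log_post(prop, x, 30)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta)
print(np.mean(chain[1000:]))                      # near 1.0 despite outliers
```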
-
The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on knowledge of the true sparsity parameter s. In this paper, we present a novel fast procedure for estimation in high-dimensional linear regression. Taking advantage of the interplay between estimation, support recovery, and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. Moreover, we establish sharp optimal results for both estimation and support recovery. As a consequence, we present a new iterative hard thresholding algorithm for high-dimensional linear regression that is scaled minimax optimal (it achieves the estimation error of the oracle that knows the sparsity pattern, when possible), fast, and adaptive.
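For reference, the classical IHT baseline that the paper improves upon can be written in a few lines: gradient steps on the least-squares loss followed by hard thresholding to the s largest coordinates. Note that, unlike the adaptive procedure proposed in the paper, this sketch requires the true sparsity s as input.

```python
import numpy as np

def iht(X, y, s, n_iter=200, step=None):
    """Classical IHT: gradient steps + hard thresholding to s coordinates."""
    n, d = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / ||X||_op^2
    b = np.zeros(d)
    for _ in range(n_iter):
        b = b + step * (X.T @ (y - X @ b))       # gradient step
        keep = np.argsort(-np.abs(b))[:s]        # indices of s largest |b_i|
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        b[~mask] = 0.0                           # hard threshold
    return b

rng = np.random.default_rng(7)
n, d, s = 200, 500, 10
X = rng.standard_normal((n, d)) / np.sqrt(n)     # normalized design
beta = np.zeros(d)
beta[:s] = rng.choice([-2.0, 2.0], size=s)
y = X @ beta + 0.1 * rng.standard_normal(n)
print(np.linalg.norm(iht(X, y, s) - beta))       # small recovery error
```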