Title: Excess risk bounds in robust empirical risk minimization
Abstract

This paper investigates robust versions of the general empirical risk minimization algorithm, one of the core techniques underlying modern statistical methods. The success of empirical risk minimization rests on the fact that for a ‘well-behaved’ stochastic process $\left \{ f(X), \ f\in \mathscr F\right \}$ indexed by a class of functions $\mathscr F$, the averages $\frac{1}{N}\sum _{j=1}^N f(X_j)$ evaluated over a sample $X_1,\ldots ,X_N$ of i.i.d. copies of $X$ provide a good approximation to the expectations $\mathbb E f(X)$, uniformly over large classes $\mathscr F$. However, this might no longer be true if the marginal distributions of the process are heavy-tailed or if the sample contains outliers. We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectations, and we obtain high-confidence bounds for the excess risk of the resulting estimators. In particular, we show that the excess risk of robust estimators can converge to $0$ at fast rates with respect to the sample size $N$, namely, rates faster than $N^{-1/2}$. We discuss implications of the main results for the linear and logistic regression problems and evaluate the numerical performance of the proposed methods on simulated and real data.
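As a concrete illustration of the idea described above, and not code from the paper itself, here is a minimal Python sketch of one popular robust proxy: gradient descent on a median-of-means estimate of the risk. The squared loss, block count, step size and heavy-tailed noise model are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mom_gradient_step(theta, X, y, n_blocks, lr):
    """One step of a common median-of-means heuristic: shuffle the
    sample into blocks, locate the block whose mean squared loss is
    the median of the block means, and take a gradient step using
    that block only."""
    idx = rng.permutation(len(y))
    blocks = np.array_split(idx, n_blocks)
    block_risks = [np.mean((X[b] @ theta - y[b]) ** 2) for b in blocks]
    median_block = blocks[int(np.argsort(block_risks)[n_blocks // 2])]
    Xb, yb = X[median_block], y[median_block]
    grad = 2 * Xb.T @ (Xb @ theta - yb) / len(yb)
    return theta - lr * grad

# Linear regression with heavy-tailed (Student-t, df=1.5) noise:
# ordinary least squares is dragged around by outliers, while the
# median-of-means iterates remain stable.
N, d = 1000, 5
X = rng.normal(size=(N, d))
theta_true = np.ones(d)
y = X @ theta_true + rng.standard_t(df=1.5, size=N)

theta = np.zeros(d)
for _ in range(2000):
    theta = mom_gradient_step(theta, X, y, n_blocks=20, lr=0.05)
print("robust estimate:", np.round(theta, 2))
```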

 
Award ID(s): 1908905, 1712956
NSF-PAR ID: 10204547
Publisher / Repository: Information and Inference: A Journal of the IMA, Volume 10, Issue 4, December 2021
Journal Name: Information and Inference: A Journal of the IMA
Volume: 10
Issue: 4
ISSN: 2049-8772
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. This paper investigates asymptotic properties of a class of algorithms that can be viewed as robust analogues of the classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as a (version of the) median-of-means estimator. It is well known by now that the excess risk of the resulting estimators often converges to 0 at optimal rates under much weaker assumptions than those required by their “classical” counterparts. However, much less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of the maximum likelihood estimators are asymptotically efficient. We make a step towards answering these questions and show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk. Moreover, our results show that robust algorithms based on the so-called “min-max” type procedures in many cases provably outperform, in the asymptotic sense, algorithms based on direct risk minimization.
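    A hedged sketch of what a “min-max” type procedure can look like, in the spirit of median-of-means descent-ascent schemes; this is an illustration, not the algorithm analyzed above. The squared loss and the update rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def minmax_mom_step(theta, theta_adv, X, y, n_blocks, lr):
    """Descent-ascent step on the objective
    min_theta max_theta' MOM_k[ loss(theta) - loss(theta') ]:
    pick the block where the mean loss *difference* is the median
    of the block means, then descend in theta and ascend in
    theta' using that block."""
    perm = rng.permutation(len(y))
    blocks = np.array_split(perm, n_blocks)
    diffs = [np.mean((X[b] @ theta - y[b]) ** 2
                     - (X[b] @ theta_adv - y[b]) ** 2) for b in blocks]
    b = blocks[int(np.argsort(diffs)[n_blocks // 2])]
    g = 2 * X[b].T @ (X[b] @ theta - y[b]) / len(b)
    g_adv = 2 * X[b].T @ (X[b] @ theta_adv - y[b]) / len(b)
    return theta - lr * g, theta_adv + lr * g_adv
```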
  2. We consider the problem of learning the underlying structure of a general discrete pairwise Markov network. Existing approaches that rely on empirical risk minimization may perform poorly in settings with noisy or scarce data. To overcome these limitations, we propose a computationally efficient and robust learning method for this problem with near-optimal sample complexities. Our approach builds upon distributionally robust optimization (DRO) and maximum conditional log-likelihood. The proposed DRO estimator minimizes the worst-case risk over an ambiguity set of adversarial distributions within bounded transport cost or f-divergence of the empirical data distribution. We show that the primal minimax learning problem can be efficiently solved by leveraging sufficient statistics and greedy maximization in the ostensibly intractable dual formulation. Based on DRO’s approximation to Lipschitz and variance regularization, we derive near-optimal sample complexities matching existing results. Extensive empirical evidence with different corruption models corroborates the effectiveness of the proposed methods. 
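    To make the variance-regularization connection above concrete, a minimal sketch follows: for a chi-square-divergence ambiguity set, the worst-case risk admits a first-order expansion as the empirical risk plus a variance penalty. The constant in the penalty and the logistic-loss usage are illustrative assumptions, not the construction from this work.

```python
import numpy as np

def chi2_dro_risk(losses, rho):
    """First-order expansion of the chi-square DRO objective:
    sup over Q with D_chi2(Q || P_n) <= rho of E_Q[loss]
    ~ mean(loss) + sqrt(2 * rho * Var(loss)).
    Penalizing the variance is what buys robustness to noisy data."""
    return losses.mean() + np.sqrt(2.0 * rho * losses.var(ddof=1))

# Illustrative usage with logistic losses of a linear model.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = rng.choice([-1.0, 1.0], size=200)
w = rng.normal(size=3)
losses = np.log1p(np.exp(-y * (X @ w)))
print(chi2_dro_risk(losses, rho=0.1))
```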
  3. Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far short of answering the need. In this work we study the fundamental paradigm of (robust) empirical risk minimization (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to ‘nice’ settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called tolerant robust learning (Ashtiani et al., 2022), where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $\gamma$-tolerant robust learning of VC classes over $\mathbb{R}^d$, and requires only $\tilde O\!\left(\frac{VC(H)\, d \log \frac{D}{\gamma\delta}}{\epsilon^2}\right)$ samples for robustness regions of (maximum) diameter $D$.
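    For intuition, the robust empirical error that RERM evaluates has a closed form for halfspaces, which also makes the tolerant comparison easy to state. A minimal sketch, with the radius convention below as an assumption:

```python
import numpy as np

def robust_error_halfspace(w, b, X, y, r):
    """Empirical robust 0-1 error of the halfspace sign(w.x + b)
    under l2 perturbations of radius r: a point (x, y) survives all
    such perturbations iff its signed margin y*(w.x + b)/||w||
    is at least r."""
    margins = y * (X @ w + b) / np.linalg.norm(w)
    return float(np.mean(margins < r))

# Tolerant robust learning compares the learner's error at radius r
# against the best error achievable at the slightly inflated radius
# (1 + gamma) * r, i.e. robust_error_halfspace(w, b, X, y, r) is
# judged against the optimum computed at (1 + gamma) * r.
```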
    more » « less
  4. This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations in the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty involving the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $\tilde O(n^{-1/d})$, matching the bounds of alternative estimators based on square-loss minimization. Contrary to all existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm.
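    A minimal sketch of how such a gradient-norm penalty can be evaluated for a max-affine (piecewise-linear convex) model, where the gradient at each point is the slope of the active piece. The absolute-loss-plus-penalty form and the weight lam are illustrative stand-ins, not the paper's exact regularizer:

```python
import numpy as np

def penalized_objective(A, c, X, y, lam):
    """Convex max-affine model f(x) = max_k (A[k] @ x + c[k]).
    At each data point the gradient of f is the slope A[k*] of the
    active affine piece, so the gradient-norm penalty is explicit."""
    vals = X @ A.T + c                       # (n, K) piece values
    k_star = vals.argmax(axis=1)             # active piece per point
    f = vals[np.arange(len(y)), k_star]      # f(x_i) for each i
    grad_norms = np.linalg.norm(A[k_star], axis=1)
    return np.mean(np.abs(f - y) + lam * grad_norms)
```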
  5. The genetic effective population size, Ne, can be estimated from the average gametic disequilibrium (r^2) between pairs of loci, but such estimates require evaluation of assumptions and currently have few methods to estimate confidence intervals. speed-ne is a suite of matlab computer code functions to estimate Ne from r^2 with a graphical user interface and a rich set of outputs that aid in understanding data patterns and comparing multiple estimators. speed-ne includes functions to either generate or input simulated genotype data to facilitate comparative studies of Ne estimators under various population genetic scenarios. speed-ne was validated with data simulated under both time-forward and time-backward coalescent models of genetic drift. Three classes of estimators were compared with simulated data to examine several general questions: what are the impacts of microsatellite null alleles on Ne estimates, how should missing data be treated, and does disequilibrium contributed by reduced recombination among some loci in a sample impact Ne estimates. Estimators differed greatly in precision in the scenarios examined, and a widely employed Ne estimator exhibited the largest variances among replicate data sets. speed-ne implements several jackknife approaches to estimate confidence intervals, and simulated data showed that jackknifing over loci and jackknifing over individuals provided ~95% confidence interval coverage for some estimators and should be useful for empirical studies. speed-ne provides an open-source, extensible tool for estimation of Ne from empirical genotype data and for simulations of both microsatellite and single nucleotide polymorphism (SNP) data types to develop expectations and to compare Ne estimators.
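    As a rough illustration of this estimator family and of the jackknife idea, and plainly not speed-ne's matlab code: the classic moment approximation E[r^2] ≈ 1/(3 Ne) + 1/S for unlinked loci in a sample of S individuals can be inverted for Ne, and a delete-one-locus jackknife yields a confidence interval. The constants and helper names below are assumptions.

```python
import numpy as np

def ne_from_r2(mean_r2, S):
    """Invert the moment approximation E[r^2] ~ 1/(3*Ne) + 1/S
    (Hill/Waples-style; constants are the textbook ones, not
    necessarily those used by speed-ne)."""
    return 1.0 / (3.0 * max(mean_r2 - 1.0 / S, 1e-12))

def jackknife_over_loci(r2_matrix, S, z=1.96):
    """Delete-one-locus jackknife CI: r2_matrix[i, j] holds r^2 for
    the locus pair (i, j). Drop each locus in turn, re-estimate Ne,
    and form a normal-theory interval from the jackknife variance."""
    L = r2_matrix.shape[0]
    full = ne_from_r2(r2_matrix[np.triu_indices(L, k=1)].mean(), S)
    leave_one = []
    for drop in range(L):
        keep = np.delete(np.arange(L), drop)
        sub = r2_matrix[np.ix_(keep, keep)]
        pairs = sub[np.triu_indices(L - 1, k=1)]
        leave_one.append(ne_from_r2(pairs.mean(), S))
    leave_one = np.asarray(leave_one)
    var = (L - 1) / L * np.sum((leave_one - leave_one.mean()) ** 2)
    se = np.sqrt(var)
    return full, (full - z * se, full + z * se)
```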

     