This paper investigates asymptotic properties of algorithms that can be viewed as robust analogues of classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as a variant of the median-of-means estimator. It is well known by now that the excess risk of the resulting estimators often converges to zero at optimal rates under much weaker assumptions than those required by their classical counterparts. However, less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of maximum likelihood estimators are asymptotically efficient. We take a step towards answering these questions and show that, for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk. Finally, we discuss the computational aspects of the problem and demonstrate the performance of the methods under consideration in numerical experiments.
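A minimal sketch of the median-of-means proxy mentioned above (the helper name and block count k are illustrative assumptions, not notation from the paper):

```python
import numpy as np

def median_of_means(x, k=20, seed=0):
    """Median-of-means proxy for the mean of a one-dimensional sample.

    The (shuffled) sample is split into k disjoint blocks; the estimator is
    the median of the k block means. A few outliers or heavy-tailed draws
    can corrupt only a few blocks, so the median of the block means stays
    stable even when the plain average does not.
    """
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, k)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed toy sample (Student-t with 2 degrees of freedom, true mean 0).
rng = np.random.default_rng(1)
sample = rng.standard_t(df=2, size=2000)
print("plain average:   ", sample.mean())
print("median of means: ", median_of_means(sample))
```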
Excess risk bounds in robust empirical risk minimization
Abstract: This paper investigates robust versions of the general empirical risk minimization algorithm, one of the core techniques underlying modern statistical methods. The success of empirical risk minimization rests on the fact that, for a ‘well-behaved’ stochastic process $$\left \{ f(X), \ f\in \mathscr F\right \}$$ indexed by a class of functions $$\mathscr F$$, the averages $$\frac{1}{N}\sum _{j=1}^N f(X_j)$$ evaluated over a sample $$X_1,\ldots ,X_N$$ of i.i.d. copies of $$X$$ provide a good approximation to the expectations $$\mathbb E f(X)$$, uniformly over the class $$\mathscr F$$. However, this might no longer be true if the marginal distributions of the process are heavy tailed or if the sample contains outliers. We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectations and obtain high-confidence bounds for the excess risk of the resulting estimators. In particular, we show that the excess risk of robust estimators can converge to $$0$$ at fast rates with respect to the sample size $$N$$, meaning rates faster than $$N^{-1/2}$$. We discuss implications of the main results for linear and logistic regression problems and evaluate the numerical performance of the proposed methods on simulated and real data.
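To illustrate the idea in the linear regression setting mentioned at the end of the abstract, here is a hedged sketch in which the empirical average of the squared losses is replaced by a median-of-means proxy before minimization (the block count, the toy data, and the generic derivative-free optimizer are all assumptions for the example, not the procedure analyzed in the paper):

```python
import numpy as np
from scipy.optimize import minimize

def mom(values, k):
    """Median of k block means (blocks are kept fixed so that the objective
    is deterministic for the optimizer)."""
    blocks = np.array_split(np.asarray(values, dtype=float), k)
    return np.median([b.mean() for b in blocks])

def mom_risk(w, X, y, k=15):
    """Median-of-means proxy of the squared-error risk of a linear model."""
    return mom((X @ w - y) ** 2, k)

# Toy data: linear model contaminated by heavy-tailed noise.
rng = np.random.default_rng(2)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.standard_t(df=2, size=n)

w_hat = minimize(mom_risk, np.zeros(d), args=(X, y), method="Nelder-Mead").x
print("robust (median-of-means) fit:", w_hat)
print("ordinary least squares fit:  ", np.linalg.lstsq(X, y, rcond=None)[0])
```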
- PAR ID: 10204547
- Publisher / Repository: Information and Inference: A Journal of the IMA, Volume 10, Issue 4, December 2021
- Date Published: December 2021
- Journal Name: Information and Inference: A Journal of the IMA
- Volume: 10
- Issue: 4
- ISSN: 2049-8772
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In this work, we explore Hypothesis Transfer Learning (HTL) under adversarial attacks. In this setting, a learner has access to a training dataset of size n from an underlying distribution D and a set of auxiliary hypotheses. These auxiliary hypotheses, which can be viewed as prior information originating either from expert knowledge or from pre-trained foundation models, are employed as an initialization for the learning process. Our goal is to develop an adversarially robust model for D. We begin by examining an adversarial variant of the regularized empirical risk minimization learning rule that we term A-RERM. Assuming a nonnegative smooth loss function with a strongly convex regularizer, we establish a bound on the robust generalization error of the hypothesis returned by A-RERM in terms of the robust empirical loss and the quality of the initialization. If the initialization is good, i.e., there exists a weighted combination of auxiliary hypotheses with a small robust population loss, the bound exhibits a fast rate of O(1/n). Otherwise, we get the standard rate of O(1/√n). Additionally, we provide a bound on the robust excess risk which is similar in nature, albeit with a slightly worse rate. We also consider solving the problem using a practical variant, namely proximal stochastic adversarial training, and present a bound that depends on the initialization. This bound has the same dependence on the sample size as the A-RERM bound, except for an additional term that depends on the size of the adversarial perturbation.
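As a concrete illustration of the kind of objective described in this abstract (not the authors' actual formulation): for a linear classifier under an ℓ∞-bounded perturbation of radius ε, the inner adversarial maximization has the well-known closed-form worst-case margin y⟨w, x⟩ − ε‖w‖₁, so an A-RERM-style objective with a strongly convex regularizer pulling toward an auxiliary hypothesis can be sketched as follows; the function names, constants, and plain subgradient-descent optimizer are assumptions for the example.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid

def adversarial_regularized_objective(w, X, y, w_aux, eps=0.1, lam=1.0):
    """Adversarial logistic loss of a linear classifier (l_inf perturbations
    of radius eps give the closed-form worst-case margin y<w,x> - eps*||w||_1),
    plus a strongly convex penalty pulling w toward the auxiliary hypothesis."""
    margins = y * (X @ w) - eps * np.abs(w).sum()
    return np.logaddexp(0.0, -margins).mean() + lam * np.sum((w - w_aux) ** 2)

def fit(X, y, w_aux, eps=0.1, lam=1.0, steps=500, lr=0.05):
    """Plain (sub)gradient descent on the objective above; a toy optimizer,
    not the proximal stochastic adversarial training analyzed in the paper."""
    w = w_aux.copy()
    for _ in range(steps):
        margins = y * (X @ w) - eps * np.abs(w).sum()
        s = expit(-margins)                    # = -d(logistic loss)/d(margin)
        grad = -(s[:, None] * (y[:, None] * X)).mean(axis=0) \
               + eps * s.mean() * np.sign(w) \
               + 2.0 * lam * (w - w_aux)
        w = w - lr * grad
    return w
```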
-
We consider the problem of learning the underlying structure of a general discrete pairwise Markov network. Existing approaches that rely on empirical risk minimization may perform poorly in settings with noisy or scarce data. To overcome these limitations, we propose a computationally efficient and robust learning method for this problem with near-optimal sample complexities. Our approach builds upon distributionally robust optimization (DRO) and maximum conditional log-likelihood. The proposed DRO estimator minimizes the worst-case risk over an ambiguity set of adversarial distributions within bounded transport cost or f-divergence of the empirical data distribution. We show that the primal minimax learning problem can be efficiently solved by leveraging sufficient statistics and greedy maximization in the ostensibly intractable dual formulation. Based on DRO’s approximation to Lipschitz and variance regularization, we derive near-optimal sample complexities matching existing results. Extensive empirical evidence with different corruption models corroborates the effectiveness of the proposed methods.
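The variance-regularization connection mentioned in this abstract can be made concrete with the standard χ²-divergence approximation (shown here only as a generic illustration of f-divergence DRO, not the estimator proposed in the paper; the radius parameter rho is an assumed name):

```python
import numpy as np

def chi2_dro_risk(losses, rho):
    """Approximate worst-case risk over a chi-square ambiguity set of radius
    rho/n around the empirical distribution: the empirical mean of the losses
    plus a variance penalty. This is the well-known variance-regularization
    approximation of f-divergence DRO; a robust learner would minimize this
    quantity over the model parameters that produced the losses."""
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    return losses.mean() + np.sqrt(2.0 * rho * losses.var(ddof=1) / n)
```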
-
Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far short of answering the need. In this work we study the fundamental paradigm of (robust) empirical risk minimization (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to ‘nice’ settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called tolerant robust learning (Ashtiani et al., 2022), where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for γ-tolerant robust learning of VC classes over $$\mathbb{R}^d$$, and requires only $$\tilde{O}\!\left(\frac{\mathrm{VC}(H)\, d \log\frac{D}{\gamma\delta}}{\epsilon^2}\right)$$ samples for robustness regions of (maximum) diameter D.
-
This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations in the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty that involves the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $$\tilde{O}(n^{-1/d})$$, matching the bounds of alternative estimators based on square-loss minimization. Contrary to all of the existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm.