skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on March 1, 2026

Title: Asymptotic Normality of Robust Risk Minimizers
This paper investigates asymptotic properties of algorithms that can be viewed as robust analogues of the classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as a variant of the median of means estimator. It is well known by now that the excess risk of resulting estimators often converges to zero at optimal rates under much weaker assumptions than those required by their classical counterparts. However, less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of the maximum likelihood estimators are asymptotically efficient. We make a step towards answering these questions and show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk. Finally, we discuss the computational aspects of the problem and demonstrate the numerical performance of the methods under consideration in numerical experiments.  more » « less
Award ID(s):
1908905 1712956 2045068
PAR ID:
10204546
Author(s) / Creator(s):
Publisher / Repository:
Academia Sinica
Date Published:
Journal Name:
Statistica Sinica
ISSN:
1017-0405
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies inference in randomized controlled trials with covariate‐adaptive randomization when there are multiple treatments. More specifically, we study in this setting inference about the average effect of one or more treatments relative to other treatments or a control. As in Bugni, Canay, and Shaikh (2018), covariate‐adaptive randomization refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Importantly, in contrast to Bugni, Canay, and Shaikh (2018), we not only allow for multiple treatments, but further allow for the proportion of units being assigned to each of the treatments to vary across strata. We first study the properties of estimators derived from a “fully saturated” linear regression, that is, a linear regression of the outcome on all interactions between indicators for each of the treatments and indicators for each of the strata. We show that tests based on these estimators using the usual heteroskedasticity‐consistent estimator of the asymptotic variance are invalid in the sense that they may have limiting rejection probability under the null hypothesis strictly greater than the nominal level; on the other hand, tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact in the sense that they have limiting rejection probability under the null hypothesis equal to the nominal level. For the special case in which the target proportion of units being assigned to each of the treatments does not vary across strata, we additionally consider tests based on estimators derived from a linear regression with “strata fixed effects,” that is, a linear regression of the outcome on indicators for each of the treatments and indicators for each of the strata. We show that tests based on these estimators using the usual heteroskedasticity‐consistent estimator of the asymptotic variance are conservative in the sense that they have limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level, but tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact, thereby generalizing results in Bugni, Canay, and Shaikh (2018) for the case of a single treatment to multiple treatments. A simulation study and an empirical application illustrate the practical relevance of our theoretical results. 
    more » « less
  2. Summary Estimators based on Wasserstein distributionally robust optimization are obtained as solutions of min-max problems in which the statistician selects a parameter minimizing the worst-case loss among all probability models within a certain distance from the underlying empirical measure in a Wasserstein sense. While motivated by the need to identify optimal model parameters or decision choices that are robust to model misspecification, these distributionally robust estimators recover a wide range of regularized estimators, including square-root lasso and support vector machines, among others. This paper studies the asymptotic normality of these distributionally robust estimators as well as the properties of an optimal confidence region induced by the Wasserstein distributionally robust optimization formulation. In addition, key properties of min-max distributionally robust optimization problems are also studied; for example, we show that distributionally robust estimators regularize the loss based on its derivative, and we also derive general sufficient conditions which show the equivalence between the min-max distributionally robust optimization problem and the corresponding max-min formulation. 
    more » « less
  3. null (Ed.)
    The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; it allows for principled learning of hyperparameters; and it can provide uncertainty quantification. The posterior distribution may in principle be sampled by means of MCMC or SMC methods, but for many problems it is computationally infeasible to do so. In this situation maximum a posteriori (MAP) estimators are often sought. Whilst these are relatively cheap to compute, and have an attractive variational formulation, a key drawback is their lack of invariance under change of parameterization; it is important to study MAP estimators, however, because they provide a link with classical optimization approaches to inverse problems and the Bayesian link may be used to improve upon classical optimization approaches. The lack of invariance of MAP estimators under change of parameterization is a particularly significant issue when hierarchical priors are employed to learn hyperparameters. In this paper we study the effect of the choice of parameterization on MAP estimators when a conditionally Gaussian hierarchical prior distribution is employed. Specifically we consider the centred parameterization, the natural parameterization in which the unknown state is solved for directly, and the noncentred parameterization, which works with a whitened Gaussian as the unknown state variable, and arises naturally when considering dimension-robust MCMC algorithms; MAP estimation is well-defined in the nonparametric setting only for the noncentred parameterization. However, we show that MAP estimates based on the noncentred parameterization are not consistent as estimators of hyperparameters; conversely, we show that limits of finite-dimensional centred MAP estimators are consistent as the dimension tends to infinity. We also consider empirical Bayesian hyperparameter estimation, show consistency of these estimates, and demonstrate that they are more robust with respect to noise than centred MAP estimates. An underpinning concept throughout is that hyperparameters may only be recovered up to measure equivalence, a well-known phenomenon in the context of the Ornstein–Uhlenbeck process. The applicability of the results is demonstrated concretely with the study of hierarchical Whittle–Matérn and ARD priors. 
    more » « less
  4. null (Ed.)
    We study convex empirical risk minimization for high-dimensional inference in binary linear classification under both discriminative binary linear models, as well as generative Gaussian-mixture models. Our first result sharply predicts the statistical performance of such estimators in the proportional asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove bounds on the best achievable performance. Notably, we show that the proposed bounds are tight for popular binary models (such as signed and logistic) and for the Gaussian-mixture model by constructing appropriate loss functions that achieve it. Our numerical simulations suggest that the theory is accurate even for relatively small problem dimensions and that it enjoys a certain universality property. 
    more » « less
  5. Since its development, the minimax framework has been one of the cornerstones of theoretical statistics, and has contributed to the popularity of many well-known estimators, such as the regularized M-estimators for high-dimensional problems. In this paper, we will first show through the example of sparse Gaussian sequence model, that the theoretical results under the classical minimax framework are insufficient for explaining empirical observations. In particular, both hard and soft thresholding estimators are (asymptotically) minimax, however, in practice they often exhibit sub-optimal performances at various signal-to-noise ratio (SNR) levels. The first contribution of this paper is to demonstrate that this issue can be resolved if the signal-to-noise ratio is taken into account in the construction of the parameter space. We call the resulting minimax framework the signal-to-noise ratio aware minimaxity. The second contribution of this paper is to showcase how one can use higher-order asymptotics to obtain accurate approximations of the SNR-aware minimax risk and discover minimax estimators. The theoretical findings obtained from this refined minimax framework provide new insights and practical guidance for the estimation of sparse signals. 
    more » « less