Title: Locally Robust Semiparametric Estimation
Many economic and causal parameters depend on nonparametric or high dimensional first steps. We give a general construction of locally robust/orthogonal moment functions for GMM, where first steps have no effect, locally, on average moment functions. Using these orthogonal moments reduces model selection and regularization bias, as is important in many applications, especially for machine learning first steps. Also, associated standard errors are robust to misspecification when there is the same number of moment functions as parameters of interest. We use these orthogonal moments and cross-fitting to construct debiased machine learning estimators of functions of high dimensional conditional quantiles and of dynamic discrete choice parameters with high dimensional state variables. We show that additional first steps needed for the orthogonal moment functions have no effect, globally, on average orthogonal moment functions. We give a general approach to estimating those additional first steps. We characterize double robustness and give a variety of new doubly robust moment functions. We give general and simple regularity conditions for asymptotic theory.
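As a schematic illustration in standard notation (our notation, not quoted from the paper): a moment function g(w, θ, γ) is locally robust, or orthogonal, when the first step γ has no first-order effect on the average moment,

\[
\left.\frac{\partial}{\partial \tau}\,\mathbb{E}\!\left[g\bigl(W,\theta_0,\gamma_0+\tau(\gamma-\gamma_0)\bigr)\right]\right|_{\tau=0}=0
\quad\text{for all admissible }\gamma,
\]

so that regularization and model selection errors in the estimate of γ₀ enter the estimating equations only at second order.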
Award ID(s):
1757140
NSF-PAR ID:
10469095
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Imbens, G.
Publisher / Repository:
Econometric Society
Date Published:
Journal Name:
Econometrica
Volume:
90
Issue:
4
ISSN:
1468-0262
Page Range / eLocation ID:
https://doi.org/10.3982/ECTA16294
Subject(s) / Keyword(s):
Local robustness, orthogonal moments, double robustness, semiparametric estimation, bias, GMM.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There are many economic parameters that depend on nonparametric first steps. Examples include games, dynamic discrete choice, average exact consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a sample average of an object referred to as the influence function. The influence function is useful in local policy analysis, in evaluating local sensitivity of estimators, and in constructing debiased machine learning estimators. We show that the influence function is a Gateaux derivative with respect to a smooth deviation evaluated at a point mass. This result generalizes the classic Von Mises (1947) and Hampel (1974) calculation to estimators that depend on smooth nonparametric first steps. We give explicit influence functions for first steps that satisfy exogenous or endogenous orthogonality conditions. We use these results to generalize the omitted variable bias formula for regression to policy analysis and to sensitivity to structural changes. We apply this analysis and find no sensitivity to endogeneity of average equivalent variation estimates in a gasoline demand application.
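For reference, the classic calculation that this result generalizes writes the influence function ψ as a Gateaux derivative of the parameter θ(·) along a point-mass contamination (standard notation, not taken from the record):

\[
\psi(w)=\left.\frac{d}{d\tau}\,\theta\bigl((1-\tau)F_0+\tau\,\delta_w\bigr)\right|_{\tau=0},
\]

where F₀ is the true distribution and δ_w a point mass at w; the result described above instead differentiates with respect to a smooth deviation, so that nonparametric first steps remain well defined along the path, and then evaluates at a point mass.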
  2. Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high‐dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high‐dimensional methods. In addition to providing the bias correction, we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income.
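A schematic form of the debiased moment in this literature (our notation, assumed rather than quoted): for a functional θ₀ = E[m(W, γ₀)] of a regression γ₀, the cross-fit, bias-corrected estimator is

\[
\hat\theta=\frac{1}{n}\sum_{i=1}^{n}\Bigl[m\bigl(W_i,\hat\gamma_{-i}\bigr)+\hat\alpha_{-i}(X_i)\bigl(Y_i-\hat\gamma_{-i}(X_i)\bigr)\Bigr],
\]

where α₀ is the Riesz representer of the functional γ ↦ E[m(W, γ)] and the subscript −i marks first steps estimated on sample folds not containing observation i; the "automatic" part is that α₀ is estimated directly from m and a dictionary of functions (e.g., by Lasso) without using its explicit analytic form.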
  3. Gustau Camps-Valls; Francisco J. R. Ruiz; Isabel Valera (Eds.)
    Robins et al. (2008) introduced a class of influence functions (IFs) which could be used to obtain doubly robust moment functions for the corresponding parameters. However, that class does not include the IF of parameters for which the nuisance functions are solutions to integral equations. Such parameters are particularly important in the field of causal inference, specifically in the recently proposed proximal causal inference framework of Tchetgen Tchetgen et al. (2020), which allows for estimating the causal effect in the presence of latent confounders. In this paper, we first extend the class of Robins et al. to include doubly robust IFs in which the nuisance functions are solutions to integral equations. Then we demonstrate that the double robustness property of these IFs can be leveraged to construct estimating equations for the nuisance functions, which enables us to solve the integral equations without resorting to parametric models. We frame the estimation of the nuisance functions as a minimax optimization problem. We provide convergence rates for the nuisance functions and conditions required for asymptotic linearity of the estimator of the parameter of interest. The experimental results demonstrate that our proposed methodology leads to robust and high-performing estimators of the average causal effect in the proximal causal inference framework.
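As a generic illustration of the minimax idea in our own notation (not the paper's): when a nuisance function h₀ is identified only through an integral equation such as E[Y − h₀(W) | Z] = 0, it can be estimated by

\[
\hat h=\arg\min_{h\in\mathcal H}\ \max_{q\in\mathcal Q}\ \frac{1}{n}\sum_{i=1}^{n}\Bigl[q(Z_i)\bigl(Y_i-h(W_i)\bigr)-q(Z_i)^{2}\Bigr]+\text{regularization},
\]

where the inner maximization over test functions q enforces the conditional moment restriction without a parametric model for h₀.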
  4. This paper introduces a new identification‐ and singularity‐robust conditional quasi‐likelihood ratio (SR‐CQLR) test and a new identification‐ and singularity‐robust Anderson and Rubin (1949) (SR‐AR) test for linear and nonlinear moment condition models. Both tests are very fast to compute. The paper shows that the tests have correct asymptotic size and are asymptotically similar (in a uniform sense) under very weak conditions. For example, in i.i.d. scenarios, all that is required is that the moment functions and their derivatives have 2 + γ bounded moments for some γ > 0. No conditions are placed on the expected Jacobian of the moment functions, on the eigenvalues of the variance matrix of the moment functions, or on the eigenvalues of the expected outer product of the (vectorized) orthogonalized sample Jacobian of the moment functions. The SR‐CQLR test is shown to be asymptotically efficient in a GMM sense under strong and semi‐strong identification (for all k ≥ p, where k and p are the numbers of moment conditions and parameters, respectively). The SR‐CQLR test reduces asymptotically to Moreira's CLR test when p = 1 in the homoskedastic linear IV model. The same is true for p ≥ 2 in most, but not all, identification scenarios. We also introduce versions of the SR‐CQLR and SR‐AR tests for subvector hypotheses and show that they have correct asymptotic size under the assumption that the parameters not under test are strongly identified. The subvector SR‐CQLR test is shown to be asymptotically efficient in a GMM sense under strong and semi‐strong identification.
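For orientation (standard GMM notation, not from the record), the classic Anderson and Rubin statistic that the SR‐AR test robustifies is

\[
\mathrm{AR}_n(\theta)=n\,\hat g_n(\theta)'\,\hat\Omega_n(\theta)^{-1}\,\hat g_n(\theta),
\qquad
\hat g_n(\theta)=\frac{1}{n}\sum_{i=1}^{n} g(W_i,\theta),
\]

compared with χ²_k critical values; roughly speaking, the singularity-robust version works only in the directions where the variance estimator Ω̂_n(θ) is nonsingular and adjusts the degrees of freedom accordingly.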
  5. We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm, an iterative application of compressed sensing techniques for orthogonal polynomials, requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on CIFAR-10 show that, compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search 8x. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.
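A minimal sketch of one stage of this idea (not the authors' implementation; the binary ±1 encoding of hyperparameters and the black-box loss evaluate(x) are illustrative assumptions): sample settings uniformly, fit a sparse low-degree polynomial in the parity (Fourier) basis by Lasso, and rank hyperparameters by the weight of the recovered coefficients.

# A minimal sketch, not the authors' code: one stage of sparse recovery over
# the Boolean Fourier (parity) basis for hyperparameter search. Assumes binary
# hyperparameters encoded as +/-1 and a user-supplied black-box loss
# evaluate(x); both are illustrative assumptions, not from the paper.
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def parity_features(X, degree=2):
    # chi_S(x) = prod_{i in S} x_i for all index sets S with 1 <= |S| <= degree
    n = X.shape[1]
    subsets = [S for d in range(1, degree + 1)
               for S in itertools.combinations(range(n), d)]
    feats = np.column_stack([X[:, S].prod(axis=1) for S in subsets])
    return feats, subsets

def sparse_fourier_stage(evaluate, n_vars, n_samples=200, degree=2,
                         alpha=0.05, top_k=5, seed=0):
    rng = np.random.default_rng(seed)
    # Uniform sampling of settings; evaluations are independent,
    # so this loop is trivially parallelizable.
    X = rng.choice([-1.0, 1.0], size=(n_samples, n_vars))
    y = np.array([evaluate(x) for x in X])
    F, subsets = parity_features(X, degree)
    # Compressed-sensing step: Lasso recovers a sparse, low-degree
    # polynomial approximation of the loss surface.
    coefs = Lasso(alpha=alpha).fit(F, y).coef_
    # Rank hyperparameters by the total weight of the monomials they appear in.
    importance = np.zeros(n_vars)
    for c, S in zip(coefs, subsets):
        for i in S:
            importance[i] += abs(c)
    return np.argsort(importance)[::-1][:top_k]

Later stages would fix the highest-ranked hyperparameters at their best-performing values and recurse on the remaining ones.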