skip to main content


Title: Debiased machine learning of global and local parameters using regularized Riesz representers
Summary

We provide adaptive inference methods, based on $\ell _1$ regularization, for regular (semiparametric) and nonregular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of nonregular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak ‘double sparsity robustness’: either the approximation to the regression or the approximation to the representer can be ‘completely dense’ as long as the other is sufficiently ‘sparse’. Our main results are nonasymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.

 
more » « less
Award ID(s):
1757140
PAR ID:
10371561
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
The Econometrics Journal
Volume:
25
Issue:
3
ISSN:
1368-4221
Page Range / eLocation ID:
p. 576-601
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. Root-n consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state of art neural net based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand. Code available at github.com/victor5as/RieszLearning. 
    more » « less
  2. Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high‐dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high‐dimensional methods. In addition to providing the bias correction, we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for estimators of a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income. 
    more » « less
  3. There are many economic parameters that depend on nonparametric first steps. Examples include games, dynamic discrete choice, average exact consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a sample average of an object referred to as the influence function. The influence function is useful in local policy analysis, in evaluating local sensitivity of estimators, and constructing debiased machine learning estimators. We show that the influence function is a Gateaux derivative with respect to a smooth deviation evaluated at a point mass. This result generalizes the classic Von Mises (1947) and Hampel (1974) calculation to estimators that depend on smooth nonparametric first steps. We give explicit influence functions for first steps that satisfy exogenous or endogenous orthogonality conditions. We use these results to generalize the omitted variable bias formula for regression to policy analysis for and sensitivity to structural changes. We apply this analysis and find no sensitivity to endogeneity of average equivalent variation estimates in a gasoline demand application. 
    more » « less
  4. Gustau Camps-Valls ; Francisco J. R. Ruiz ; Isabel Valera (Ed.)
    Robins et al. (2008) introduced a class of influence functions (IFs) which could be used to obtain doubly robust moment functions for the corresponding parameters. However, that class does not include the IF of parameters for which the nuisance functions are solutions to integral equations. Such parameters are particularly important in the field of causal inference, specifically in the recently proposed proximal causal inference framework of Tchetgen Tchetgen et al. (2020), which allows for estimating the causal effect in the presence of latent confounders. In this paper, we first extend the class of Robins et al. to include doubly robust IFs in which the nuisance functions are solutions to integral equations. Then we demonstrate that the double robustness property of these IFs can be leveraged to construct estimating equations for the nuisance functions, which enables us to solve the integral equations without resorting to parametric models. We frame the estimation of the nuisance functions as a minimax optimization problem. We provide convergence rates for the nuisance functions and conditions required for asymptotic linearity of the estimator of the parameter of interest. The experiment results demonstrate that our proposed methodology leads to robust and high-performance estimators for average causal effect in the proximal causal inference framework. 
    more » « less
  5. Shape restrictions have played a central role in economics as both testable implications of theory and sufficient conditions for obtaining informative counterfactual predictions. In this paper, we provide a general procedure for inference under shape restrictions in identified and partially identified models defined by conditional moment restrictions. Our test statistics and proposed inference methods are based on the minimum of the generalized method of moments (GMM) objective function with and without shape restrictions. Uniformly valid critical values are obtained through a bootstrap procedure that approximates a subset of the true local parameter space. In an empirical analysis of the effect of childbearing on female labor supply, we show that employing shape restrictions in linear instrumental variables (IV) models can lead to shorter confidence regions for both local and average treatment effects. Other applications we discuss include inference for the variability of quantile IV treatment effects and for bounds on average equivalent variation in a demand model with general heterogeneity.

     
    more » « less