Title: High-dimensional linear models with many endogenous variables
High-dimensional linear models with endogenous variables play an increasingly important role in the recent econometric literature. In this work, we allow for models with many endogenous variables and use many instrumental variables to achieve identification. Because the structural equation is high-dimensional, constructing honest confidence regions with asymptotically correct coverage is non-trivial. Our main contribution is to propose estimators and confidence regions that achieve this goal. Our approach relies on moment conditions that satisfy the usual instrument orthogonality condition and also have an additional orthogonality property with respect to specific linear combinations of the endogenous variables, which are treated as nuisance parameters. We propose new pivotal procedures for estimating the high-dimensional nuisance parameters that appear in our formulation. We use a multiplier bootstrap procedure to compute critical values and establish its validity for constructing simultaneously valid confidence regions for a potentially high-dimensional set of endogenous variable coefficients.
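The multiplier bootstrap step mentioned in the abstract can be illustrated with a generic sketch. This is not the paper's procedure: it assumes the user already has an `(n, p)` array `psi` of estimated score (influence-function) values, one column per target coefficient, and it uses standard Gaussian multipliers with a max-type statistic. The function names, the argument `psi`, and the defaults are all hypothetical choices made for illustration.

```python
import numpy as np


def multiplier_bootstrap_critical_value(psi, alpha=0.05, n_boot=1000, seed=0):
    """Approximate the (1 - alpha) quantile of a max-type statistic.

    psi : (n, p) array of estimated score values (an assumed input;
          how these are obtained is outside the scope of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, _ = psi.shape
    centered = psi - psi.mean(axis=0)       # remove column means
    sigma = centered.std(axis=0, ddof=1)    # per-coefficient scale
    standardized = centered / sigma
    # One row of i.i.d. N(0, 1) multipliers per bootstrap draw.
    xi = rng.standard_normal((n_boot, n))
    # For each draw b and coefficient j: n^{-1/2} * sum_i xi_{b,i} * psi_{i,j} / sigma_j
    draws = np.abs(xi @ standardized) / np.sqrt(n)
    max_stats = draws.max(axis=1)           # sup over coefficients
    return np.quantile(max_stats, 1 - alpha)


def simultaneous_intervals(theta_hat, psi, alpha=0.05, n_boot=1000, seed=0):
    """Build intervals theta_hat_j +/- c * sigma_j / sqrt(n) with a common c."""
    n = psi.shape[0]
    sigma = psi.std(axis=0, ddof=1)
    c = multiplier_bootstrap_critical_value(psi, alpha, n_boot, seed)
    half_width = c * sigma / np.sqrt(n)
    return theta_hat - half_width, theta_hat + half_width


if __name__ == "__main__":
    # Synthetic scores purely for demonstration.
    demo_rng = np.random.default_rng(42)
    demo_psi = demo_rng.standard_normal((400, 10))
    lower, upper = simultaneous_intervals(np.zeros(10), demo_psi)
    print(np.column_stack([lower, upper]))
```

Using a single critical value for all coefficients is what makes the intervals simultaneous: it is calibrated to the maximum of the standardized statistics rather than to each coordinate separately, so it is larger than the usual pointwise normal quantile when `p > 1`.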
Award ID(s):
1757140
PAR ID:
10472326
Publisher / Repository:
ScienceDirect
Journal Name:
Journal of Econometrics
Volume:
228
Issue:
1
ISSN:
0304-4076
Page Range / eLocation ID:
4 to 26
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary
    High-dimensional statistical inference with general estimating equations is challenging and remains little explored. We study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specification tests. First, we propose to construct a new set of estimating equations such that the impact of estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by the empirical likelihood ratio. Second, we propose a test statistic, defined as the maximum of the marginal empirical likelihood ratios, to quantify data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. Numerical studies demonstrate promising performance and potential practical benefits of the new methods.
  2. In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates when the nuisance parameters are high-dimensional. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called the truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Simulations demonstrate its superior finite-sample performance over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer’s disease. We have also made the R package “aispu”, which implements the proposed test, available on GitHub.
  3. Abstract

    Varying coefficient models have been used to explore dynamic effects in many scientific areas, such as in medicine, finance, and epidemiology. As most existing models ignore the existence of zero regions, we propose a new soft-thresholded varying coefficient model, where the coefficient functions are piecewise smooth with zero regions. Our new modeling approach enables us to perform variable selection, detect the zero regions of selected variables, obtain point estimates of the varying coefficients with zero regions, and construct a new type of sparse confidence intervals that accommodate zero regions. We prove the asymptotic properties of the estimator, based on which we draw statistical inference. Our simulation study reveals that the proposed sparse confidence intervals achieve the desired coverage probability. We apply the proposed method to analyze a large-scale preoperative opioid study.

     
  4. We propose a framework for analyzing the sensitivity of counterfactuals to parametric assumptions about the distribution of latent variables in structural models. In particular, we derive bounds on counterfactuals as the distribution of latent variables spans nonparametric neighborhoods of a given parametric specification while other “structural” features of the model are maintained. Our approach recasts the infinite‐dimensional problem of optimizing the counterfactual with respect to the distribution of latent variables (subject to model constraints) as a finite‐dimensional convex program. We also develop an MPEC version of our method to further simplify computation in models with endogenous parameters (e.g., value functions) defined by equilibrium constraints. We propose plug‐in estimators of the bounds and two methods for inference. We also show that our bounds converge to the sharp nonparametric bounds on counterfactuals as the neighborhood size becomes large. To illustrate the broad applicability of our procedure, we present empirical applications to matching models with transferable utility and dynamic discrete choice models. 
  5. Summary

    We provide adaptive inference methods, based on $\ell _1$ regularization, for regular (semiparametric) and nonregular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of nonregular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak ‘double sparsity robustness’: either the approximation to the regression or the approximation to the representer can be ‘completely dense’ as long as the other is sufficiently ‘sparse’. Our main results are nonasymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.

     