

Title: High-dimensional empirical likelihood inference
Summary
High-dimensional statistical inference with general estimating equations is challenging and remains little explored. We study two problems in this area: confidence set estimation for multiple components of the model parameters, and model specification tests. First, we propose constructing a new set of estimating equations such that the impact of estimating the high-dimensional nuisance parameters becomes asymptotically negligible. This construction enables us to build a valid confidence region from the empirical likelihood ratio. Second, we propose a test statistic, the maximum of the marginal empirical likelihood ratios, to quantify the evidence in the data against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. Numerical studies demonstrate the promising performance and potential practical benefits of the new methods.
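The summary gives no formulas, so the following is only a minimal sketch of the two ingredients it names, under assumptions not taken from the paper. It computes the classical (Owen-style) empirical likelihood ratio for a single scalar moment condition E[g] = 0, with no nuisance parameters and no projection step, and then takes the maximum of these marginal ratios over the p components of a vector estimating function. The helper names `el_ratio_scalar` and `max_marginal_el` are hypothetical, and the authors' calibration of the maximum in the high-dimensional regime is not reproduced.

```python
import numpy as np


def el_ratio_scalar(g, tol=1e-12, max_iter=1000):
    """Return -2 log R for the moment condition E[g] = 0 with scalar g.

    Classical empirical likelihood (illustrative sketch); returns np.inf when
    0 lies outside the convex hull of the observed g_i (no valid weights).
    """
    g = np.asarray(g, dtype=float)
    if np.all(g == 0):
        return 0.0  # uniform weights already give a weighted mean of 0
    if g.min() >= 0 or g.max() <= 0:
        return np.inf
    # The multiplier lam solves sum_i g_i / (1 + lam * g_i) = 0. On the open
    # interval (-1/max g, -1/min g) every 1 + lam * g_i > 0 and the left side
    # is strictly decreasing in lam, so bisection finds the unique root.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + lam * g)) > 0:
            lo = lam  # root lies to the right of lam
        else:
            hi = lam  # root lies to the left of (or at) lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    # Optimal weights are w_i = 1 / (n (1 + lam g_i)); -2 log prod(n w_i) is:
    return 2.0 * np.sum(np.log1p(lam * g))


def max_marginal_el(G):
    """Maximum of the marginal EL ratios over the columns of an n x p matrix G."""
    G = np.asarray(G, dtype=float)
    marginals = np.array([el_ratio_scalar(G[:, j]) for j in range(G.shape[1])])
    return marginals.max(), marginals


if __name__ == "__main__":
    # Hypothetical use: check E[X] = theta0 componentwise via g(X; theta0) = X - theta0.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    theta0 = np.zeros(50)
    stat, marginals = max_marginal_el(X - theta0)
    print(f"max marginal -2 log R = {stat:.3f} over {marginals.size} components")
```

For a single fixed-dimensional moment condition, Owen's classical result gives -2 log R → χ²₁ at the true parameter, so collecting the θ values whose statistic stays below the χ²₁ quantile yields an approximate confidence interval. Per the summary, the paper's confidence region instead relies on its newly constructed estimating equations so that estimating the high-dimensional nuisance parameters has an asymptotically negligible effect.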
Award ID(s):
1934962
PAR ID:
10288767
Journal Name:
Biometrika
Volume:
108
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
127 to 147
Sponsoring Org:
National Science Foundation
More Like this
  1. High-dimensional linear models with endogenous variables play an increasingly important role in the recent econometric literature. In this work, we allow for models with many endogenous variables and use many instrumental variables to achieve identification. Because of the high dimensionality of the structural equation, constructing honest confidence regions with asymptotically correct coverage is non-trivial. Our main contribution is to propose estimators and confidence regions that achieve this goal. Our approach relies on moment conditions that satisfy the usual instrument orthogonality condition and also have an additional orthogonality property with respect to specific linear combinations of the endogenous variables, which are treated as nuisance parameters. We propose new pivotal procedures for estimating the high-dimensional nuisance parameters that appear in our formulation. We use a multiplier bootstrap procedure to compute critical values and establish its validity for constructing simultaneously valid confidence regions for a potentially high-dimensional set of endogenous variable coefficients.
  2. Abstract

    In this paper, we propose a flexible nested error regression small area model with a high-dimensional parameter that incorporates heterogeneity in regression coefficients and variance components. We develop a new robust method based on small-area-specific estimating equations that allows appropriate pooling of a large number of areas when estimating small-area-specific model parameters. We propose a parametric bootstrap and jackknife method to estimate not only the mean squared errors but also other commonly used uncertainty measures, such as standard errors and coefficients of variation. We conduct both model-based and design-based simulation experiments and a real-data analysis to evaluate the proposed methodology.
  3. This article develops empirical likelihood methodology for a class of long-range dependent processes driven by a stationary Gaussian process. We consider population parameters that are defined by estimating equations in the time domain. It is shown that the standard block empirical likelihood (BEL) method, with a suitable scaling, has a non-standard limit distribution based on a multiple Wiener–Itô integral. Unlike in the short-memory time series case, the scaling constant involves unknown population quantities that may be difficult to estimate. Alternative versions of the empirical likelihood method based on expansive BEL (EBEL) are considered. It is shown that the EBEL renditions do not require an explicit scaling and therefore remove this undesirable feature of the standard BEL. However, their limit law involves the long-memory parameter, which may be estimated from the data. Results from a moderately large simulation study on the finite-sample properties of tests and confidence intervals based on the different empirical likelihood methods are also reported.
  4. Summary

    The paper considers hypothesis testing and confidence intervals in high-dimensional proportional hazards models. Motivated by a geometric projection principle, we propose a unified likelihood ratio inferential framework, including score, Wald and partial likelihood ratio statistics for hypothesis testing. Without assuming model selection consistency, we derive the asymptotic distributions of these test statistics, establish their semiparametric optimality and conduct a power analysis under Pitman alternatives. We also develop new procedures to construct pointwise confidence intervals for the baseline hazard function and the conditional hazard function. Simulation studies show that all of the proposed tests control the type I error well. Moreover, the partial likelihood ratio test is empirically more powerful than the other tests. The proposed methods are illustrated with a gene expression data set.
  5. Summary

    To construct an optimal estimating function by weighting a set of score functions, we must either know or consistently estimate the covariance matrix of the individual scores. With high-dimensional correlated data, the estimated covariance matrix can be unreliable. The smallest eigenvalues of the covariance matrix are the most important for weighting the estimating equations, but in high dimensions they are poorly determined. Generalized estimating equations introduced the idea of a working correlation to mitigate such problems; however, it can be difficult to specify the working correlation model correctly. We develop an adaptive estimating equation method that requires no working correlation assumptions. The methodology relies on finding a reliable approximation to the inverse of the variance matrix in the quasi-likelihood equations. We apply a multivariate generalization of the conjugate gradient method to find estimating equations that preserve the information well at fixed low dimensions. This approach is particularly useful when the estimator of the covariance matrix is singular or nearly singular, or impossible to invert owing to its large size.
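Item 5 of the list above approximates the inverse of a variance matrix rather than inverting it. As a hedged illustration of the underlying idea only, the sketch below runs the standard conjugate gradient iteration for a single right-hand side and stops after k steps, approximating V⁻¹b for a symmetric positive definite V. It is not that paper's multivariate generalization, and `truncated_cg` is a hypothetical helper.

```python
import numpy as np


def truncated_cg(V, b, k, tol=1e-12):
    """Approximate V^{-1} b with at most k conjugate gradient steps.

    Assumes V is symmetric positive definite. Only products V @ p are used,
    so V^{-1} is never formed explicitly.
    """
    V = np.asarray(V, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()          # residual b - V x for the starting point x = 0
    p = r.copy()          # first search direction
    rs = float(r @ r)
    for _ in range(k):
        if np.sqrt(rs) <= tol:
            break         # residual negligible: x already solves V x = b
        Vp = V @ p
        alpha = rs / float(p @ Vp)   # exact line search along p
        x += alpha * p
        r -= alpha * Vp
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p    # next V-conjugate direction
        rs = rs_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(6, 6))
    V = A @ A.T + 6 * np.eye(6)      # a well-conditioned SPD matrix
    b = rng.normal(size=6)
    for k in (1, 3, 6):
        err = np.linalg.norm(truncated_cg(V, b, k) - np.linalg.solve(V, b))
        print(f"k = {k}: error = {err:.2e}")
```

In exact arithmetic, k equal to the dimension recovers V⁻¹b. A smaller k gives an approximation built from the leading Krylov directions V⁰b, Vb, …, Vᵏ⁻¹b, which is the sense in which truncation trades accuracy for stability and cost.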