This article develops empirical likelihood methodology for a class of long range dependent processes driven by a stationary Gaussian process. We consider population parameters that are defined by estimating equations in the time domain. It is shown that the standard block empirical likelihood (BEL) method, with a suitable scaling, has a non‐standard limit distribution based on a multiple Wiener–Itô integral. Unlike the short memory time series case, the scaling constant involves unknown population quantities that may be difficult to estimate. Alternative versions of the empirical likelihood method, involving the expansive BEL (EBEL) methods are considered. It is shown that the EBEL renditions do not require an explicit scaling and, therefore, remove this undesirable feature of the standard BEL. However, the limit law involves the long memory parameter, which may be estimated from the data. Results from a moderately large simulation study on finite sample properties of tests and confidence intervals based on different empirical likelihood methods are also reported.
The upper bounds on the coverage probabilities of the confidence regions based on blockwise empirical likelihood and non-standard expansive empirical likelihood methods for time series data are investigated via studying the probability of violating the convex hull constraint. The large sample bounds are derived on the basis of the pivotal limit of the blockwise empirical log-likelihood ratio obtained under fixed b asymptotics, which has recently been shown to provide a more accurate approximation to the finite sample distribution than the conventional χ2-approximation. Our theoretical and numerical findings suggest that both the finite sample and the large sample upper bounds for coverage probabilities are strictly less than 1 and the blockwise empirical likelihood confidence region can exhibit serious undercoverage when the dimension of moment conditions is moderate or large, the time series dependence is positively strong or the block size is large relative to the sample size. A similar finite sample coverage problem occurs for non-standard expansive empirical likelihood. To alleviate the coverage bound problem, we propose to penalize both empirical likelihood methods by relaxing the convex hull constraint. Numerical simulations and data illustrations demonstrate the effectiveness of our proposed remedies in terms of delivering confidence sets with more accurate coverage. Some technical details and additional simulation results are included in on-line supplemental material.
more » « less- PAR ID:
- 10397460
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume:
- 78
- Issue:
- 2
- ISSN:
- 1369-7412
- Page Range / eLocation ID:
- p. 395-421
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Consider reconstructing a signal x by minimizing a weighted sum of a convex differentiable negative log-likelihood (NLL) (data-fidelity) term and a convex regularization term that imposes a convex-set constraint on x and enforces its sparsity using ℓ1-norm analysis regularization.We compute upper bounds on the regularization tuning constant beyond which the regularization term overwhelmingly dominates the NLL term so that the set of minimum points of the objective function does not change. Necessary and sufficient conditions for irrelevance of sparse signal regularization and a condition for the existence of finite upper bounds are established. We formulate an optimization problem for finding these bounds when the regularization term can be globally minimized by a feasible x and also develop an alternating direction method of multipliers (ADMM) type method for their computation. Simulation examples show that the derived and empirical bounds match.more » « less
-
null (Ed.)Summary This paper is concerned with empirical likelihood inference on the population mean when the dimension $p$ and the sample size $n$ satisfy $p/n\rightarrow c\in [1,\infty)$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $p/n>1/2$ because the convex hull of the $n$ observations in $\mathbb{R}^p$ becomes too small to cover the true mean value. Moreover, when $p> n$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.more » « less
-
For a set P of n points in the unit ball b ⊆ R d , consider the problem of finding a small subset T ⊆ P such that its convex-hull ε-approximates the convex-hull of the original set. Specifically, the Hausdorff distance between the convex hull of T and the convex hull of P should be at most ε. We present an efficient algorithm to compute such an ε ′ -approximation of size kalg, where ε ′ is a function of ε, and kalg is a function of the minimum size kopt of such an ε-approximation. Surprisingly, there is no dependence on the dimension d in either of the bounds. Furthermore, every point of P can be ε- approximated by a convex-combination of points of T that is O(1/ε2 )-sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset T of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.more » « less
-
Abstract Modeling and drawing inference on the joint associations between single‐nucleotide polymorphisms and a disease has sparked interest in genome‐wide associations studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this “large
n , divergingp ” scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposedrefined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large‐scale hospital‐based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.