Given a random sample of size $$n$$ from a $$p$$-dimensional random vector, we are interested in testing whether the $$p$$ components of the random vector are mutually independent. This is the so-called complete independence test. In the multivariate normal case, it is equivalent to testing whether the correlation matrix is an identity matrix. In this paper, we propose a one-sided empirical likelihood method for the complete independence test based on squared sample correlation coefficients. The limiting distribution of our one-sided empirical likelihood test statistic is proved to be $$Z^2 I(Z > 0)$$ as both $$n$$ and $$p$$ tend to infinity, where $$Z$$ is a standard normal random variable. To improve the power of the empirical likelihood test statistic, we also introduce a rescaled empirical likelihood test statistic. We carry out an extensive simulation study to compare the performance of the rescaled empirical likelihood method with that of two other test statistics.
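As a rough illustration of the quantities involved (not the authors' exact construction), the sketch below computes the $$p(p-1)/2$$ squared sample correlations and evaluates a one-sided empirical likelihood (EL) ratio for their common null mean, taken here to be $$1/(n-1)$$, the exact null mean of a squared sample correlation under normality. The solver `el_log_ratio` and the one-sided thresholding step are illustrative assumptions.

```python
# A minimal sketch, assuming the EL statistic is built from the squared
# sample correlations with hypothesized mean 1/(n-1); this is not the
# paper's exact statistic.
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for the mean of x equal to mu."""
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:          # mu outside the convex hull
        return np.inf
    lo, hi = -1.0 / d.max(), -1.0 / d.min()   # keep all 1 + lam*d > 0
    eps = 1e-10 * (hi - lo)
    lam = brentq(lambda l: np.sum(d / (1.0 + l * d)), lo + eps, hi - eps)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(0)
n, p = 100, 40
X = rng.standard_normal((n, p))               # H0: independent components
R = np.corrcoef(X, rowvar=False)
r2 = R[np.triu_indices(p, k=1)] ** 2          # p(p-1)/2 squared correlations
mu0 = 1.0 / (n - 1)                           # null mean of r^2 under normality
stat = el_log_ratio(r2, mu0) if r2.mean() > mu0 else 0.0  # one-sided analogue
print(stat)
```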
Empirical likelihood test for a large-dimensional mean vector
Summary: This paper is concerned with empirical likelihood inference on the population mean when the dimension $$p$$ and the sample size $$n$$ satisfy $$p/n \rightarrow c \in [1,\infty)$$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $$p/n > 1/2$$, because the convex hull of the $$n$$ observations in $$\mathbb{R}^p$$ becomes too small to cover the true mean value. Moreover, when $$p > n$$, the sample covariance matrix becomes singular, which results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.
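The abstract does not specify the two artificial points, so the sketch below is only a plausible illustration of the convex-hull repair idea: two points placed symmetrically about the hypothesized mean $$\mu_0$$ along the sample-mean direction, making $$\mu_0$$ their midpoint and hence a point of the augmented convex hull. The helper `augment` and the scale `s` are hypothetical.

```python
# A minimal sketch of the convex-hull repair idea only; the paper's
# actual artificial points are not given in the abstract, so the two
# points below are purely illustrative.
import numpy as np

def augment(X, mu0, s=2.0):
    """Append two artificial points so mu0 lies in the convex hull."""
    xbar = X.mean(axis=0)
    a = mu0 + s * (xbar - mu0)      # pushes the hull past mu0 on one side
    b = mu0 - s * (xbar - mu0)      # ... and past it on the opposite side
    return np.vstack([X, a, b])     # mu0 = (a + b) / 2 is now in the hull

rng = np.random.default_rng(1)
n, p = 50, 100                      # p > n: ordinary EL breaks down here
X = rng.standard_normal((n, p)) + 0.1
mu0 = np.zeros(p)
X_aug = augment(X, mu0)             # EL on X_aug can place weight at mu0
```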
- PAR ID: 10217333
- Date Published:
- Journal Name: Biometrika
- Volume: 107
- Issue: 3
- ISSN: 0006-3444
- Page Range / eLocation ID: 591–607
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract: Covariance matrices are fundamental to the analysis and forecast of economic, physical and biological systems. Although the eigenvalues $$\{\lambda_i\}$$ and eigenvectors $$\{\boldsymbol{u}_i\}$$ of a covariance matrix are central to such endeavours, in practice one must inevitably approximate the covariance matrix based on data with finite sample size $$n$$ to obtain empirical eigenvalues $$\{\tilde{\lambda}_i\}$$ and eigenvectors $$\{\tilde{\boldsymbol{u}}_i\}$$, and therefore understanding the error so introduced is of central importance. We analyse the eigenvector error $$\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i\|^2$$ while leveraging the assumption that the true covariance matrix of size $$p$$ is drawn from a matrix ensemble with known spectral properties. In particular, we assume that the distribution of population eigenvalues weakly converges as $$p \to \infty$$ to a spectral density $$\rho(\lambda)$$ and that the spacing between population eigenvalues is similar to that of the Gaussian orthogonal ensemble. Our approach complements previous analyses of eigenvector error that require the full set of eigenvalues to be known, which can be computationally infeasible when $$p$$ is large. To provide a scalable approach for uncertainty quantification of eigenvector error, we consider a fixed eigenvalue $$\lambda$$ and approximate the distribution of the expected square error $$r = \mathbb{E}\left[\|\boldsymbol{u}_i - \tilde{\boldsymbol{u}}_i\|^2\right]$$ across the matrix ensemble for all $$\boldsymbol{u}_i$$ associated with $$\lambda_i = \lambda$$. We find, for example, that for sufficiently large matrix size $$p$$ and sample size $$n > p$$, the probability density of $$r$$ scales as $$1/(nr^2)$$. This power-law scaling implies that the eigenvector error is extremely heterogeneous: even if $$r$$ is very small for most eigenvectors, it can be large for others with non-negligible probability. We support this and further results with numerical experiments.
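A small Monte Carlo sketch of the quantity studied above, the squared error between population and empirical covariance eigenvectors, under an illustrative ensemble (uniform spectral density, random orthogonal eigenbasis). The ensemble choice is an assumption, not the one analysed in the paper.

```python
# Illustrative experiment: draw a true covariance from a simple ensemble,
# estimate it from n samples, and record the top-eigenvector error.
import numpy as np

rng = np.random.default_rng(2)
p, n, trials = 50, 200, 200
errs = []
for _ in range(trials):
    lam = np.sort(rng.uniform(1.0, 3.0, size=p))[::-1]   # population spectrum
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))     # random eigenbasis
    C = (Q * lam) @ Q.T                                  # true covariance
    X = rng.multivariate_normal(np.zeros(p), C, size=n)
    Ce = np.cov(X, rowvar=False)                         # empirical covariance
    w, U = np.linalg.eigh(Ce)                            # ascending eigenvalues
    u_true, u_emp = Q[:, 0], U[:, -1]                    # top eigenvectors
    u_emp = u_emp * np.sign(u_emp @ u_true)              # fix sign ambiguity
    errs.append(np.sum((u_true - u_emp) ** 2))
print(np.mean(errs), np.quantile(errs, [0.5, 0.9, 0.99]))  # inspect right tail
```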
-
Hal Daumé III (Ed.): We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set, which is prescribed by a limited structural constraint on the mean vector and the covariance matrix of the underlying contextual distribution. We show that the Bayesian classifier using the optimistic score ratio is conceptually attractive, delivers solid statistical guarantees, and is computationally tractable. We showcase the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
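The closed form of the optimistic score is not given in the abstract; the sketch below illustrates only the score-ratio decision rule, substituting a plug-in Gaussian score built from the same moment information (class-conditional mean vector and covariance matrix). The helper names are hypothetical and the substitution is a deliberate simplification.

```python
# A hedged sketch of a score-ratio classifier with plug-in Gaussian
# scores; the paper's optimistic score over an ambiguity set is replaced
# by this simpler stand-in.
import numpy as np

def gaussian_log_score(x, mu, Sigma):
    """Log of an unnormalized Gaussian density at x, up to a constant."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (logdet + d @ np.linalg.solve(Sigma, d))

def score_ratio_classify(x, moments, log_prior=(0.0, 0.0)):
    """Pick the class with the larger log score plus log prior."""
    s = [gaussian_log_score(x, mu, S) + lp
         for (mu, S), lp in zip(moments, log_prior)]
    return int(np.argmax(s))

rng = np.random.default_rng(3)
X0 = rng.multivariate_normal([0, 0], np.eye(2), size=200)
X1 = rng.multivariate_normal([2, 1], [[1, 0.5], [0.5, 1]], size=200)
moments = [(X.mean(axis=0), np.cov(X, rowvar=False)) for X in (X0, X1)]
print(score_ratio_classify(np.array([1.5, 1.0]), moments))
```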
-
Summary: High-dimensional statistical inference with general estimating equations is challenging and remains little explored. We study two problems in the area: confidence set estimation for multiple components of the model parameters, and model specification tests. First, we propose to construct a new set of estimating equations such that the impact from estimating the high-dimensional nuisance parameters becomes asymptotically negligible. The new construction enables us to estimate a valid confidence region by an empirical likelihood ratio. Second, we propose, as a test statistic, the maximum of the marginal empirical likelihood ratios to quantify the data evidence against the model specification. Our theory establishes the validity of the proposed empirical likelihood approaches, accommodating over-identification and exponentially growing data dimensionality. Numerical studies demonstrate the promising performance and potential practical benefits of the new methods.
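A minimal sketch of the maximum-of-marginal-EL idea under a toy mean model, with estimating function $$g(x) = x - \mu_0$$ as an illustrative choice: each coordinate gets a one-dimensional empirical likelihood ratio, and the maximum over coordinates serves as evidence against the specification. The function names and the calibration step (omitted here) are assumptions.

```python
# Sketch: marginal EL ratios per coordinate of the estimating function,
# aggregated by their maximum; illustrative, not the paper's procedure.
import numpy as np
from scipy.optimize import brentq

def el_zero_mean(d):
    """-2 log EL ratio that a 1-D estimating function has mean zero."""
    if d.min() >= 0 or d.max() <= 0:          # zero outside the convex hull
        return np.inf
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    eps = 1e-10 * (hi - lo)
    lam = brentq(lambda l: np.sum(d / (1.0 + l * d)), lo + eps, hi - eps)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.standard_normal((n, p)) + 0.2         # misspecified: true mean is 0.2
mu0 = 0.0
G = X - mu0                                   # estimating function under H0
stat = max(el_zero_mean(G[:, j]) for j in range(p))
print(stat)                                   # large value => reject H0
```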
-
Abstract: The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful against certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $$n$$ data points and a set of $$n_R$$ reference points, where $$n_R$$ can be drastically smaller than $$n$$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions on the kernel, as long as $$\|p-q\| \sqrt{n} \to \infty$$, and a finite-sample lower bound on the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
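An illustrative sketch (not the paper's exact statistic) of an asymmetric, anisotropic affinity between $$n$$ samples and $$n_R \ll n$$ reference points, with a Mahalanobis bandwidth from a local covariance at each reference point. The bandwidth rule and the simple squared-difference aggregation below are assumptions.

```python
# Sketch: anisotropic affinities to a small reference set, then a
# squared difference of mean embeddings as an MMD-style statistic.
import numpy as np

def anisotropic_affinity(X, refs, covs):
    """A[i, r] = exp(-0.5 * (x_i - ref_r)^T cov_r^{-1} (x_i - ref_r))."""
    A = np.empty((len(X), len(refs)))
    for r, (c, S) in enumerate(zip(refs, covs)):
        D = X - c
        A[:, r] = np.exp(-0.5 * np.einsum('ij,ij->i', D @ np.linalg.inv(S), D))
    return A

def mmd_stat(X, Y, refs, covs):
    """Squared distance between mean embeddings over the reference set."""
    diff = (anisotropic_affinity(X, refs, covs).mean(axis=0)
            - anisotropic_affinity(Y, refs, covs).mean(axis=0))
    return np.sum(diff ** 2)

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 3))
Y = rng.standard_normal((500, 3)) + [0.3, 0.0, 0.0]
refs = X[rng.choice(len(X), size=20, replace=False)]     # n_R << n
covs = [np.cov(X[np.argsort(((X - c) ** 2).sum(1))[:50]], rowvar=False)
        for c in refs]                                   # local covariances
print(mmd_stat(X, Y, refs, covs))
```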