Title: Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality
Abstract: Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case that $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert–Schmidt) kernel distance covariance in high dimensions, in the distributional class of $(X,Y)$ with a separable covariance structure. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by $n\cdot \operatorname{dCor}^{2}(X,Y)/\sqrt{2}$ in the high-dimensional limit, regardless of a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.
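For a concrete reference point, the following NumPy sketch computes the (V-statistic) sample distance covariance and distance correlation via double-centered distance matrices, and then forms the statistic n·dCor²(X,Y)/√2 that appears in the power formula. This is the plain Euclidean-distance version, not the paper's generalized kernel estimator, and all function names are illustrative.

```python
import numpy as np

def dcov2(X, Y):
    """V-statistic sample (squared) distance covariance."""
    # pairwise Euclidean distance matrices for the two samples
    a = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    b = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def dcor2(X, Y):
    """Squared sample distance correlation, lying in [0, 1]."""
    denom = np.sqrt(dcov2(X, X) * dcov2(Y, Y))
    return dcov2(X, Y) / denom if denom > 0 else 0.0

# the quantity governing the test's power in the high-dimensional limit
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((40, 5)), rng.standard_normal((40, 3))
stat = len(X) * dcor2(X, Y) / np.sqrt(2)
```

The doubly nested distance computation is O(n²) in memory, which is fine for illustration but not for large samples.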
Award ID(s):
2143468
PAR ID:
10611225
Author(s) / Creator(s):
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
13
Issue:
3
ISSN:
2049-8772
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a moment greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space. A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.
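For context, a kernel MMD distance of the kind invoked above can be estimated from samples as follows. This is a generic unbiased estimator with a Gaussian kernel, not the specific cost- and smoothing-dependent kernel constructed in the paper, and `sigma` is an illustrative bandwidth.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD with a Gaussian kernel."""
    def gram(A, B):
        # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma**2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    # drop the diagonal of the within-sample Gram matrices for unbiasedness
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
Xa = rng.standard_normal((200, 2))
Xb = rng.standard_normal((200, 2))        # same distribution as Xa
```

Since the estimator is unbiased, it can be slightly negative when the two distributions coincide; it concentrates near zero in that case and away from zero under a mean shift.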
  2. We propose a broad class of models for time series of curves (functions) that can be used to quantify near long-range dependence or near unit-root behavior. We establish fundamental properties of these models and rates of consistency for the sample mean function and the sample covariance operator. The latter plays a role analogous to sample cross-covariances for multivariate time series, but is far more important in the functional setting because its eigenfunctions are used in principal component analysis, a major tool in functional data analysis that is used for dimension reduction and feature extraction. We also establish a central limit theorem for functions following our model. Both the consistency rates and the normalizations in the central limit theorem (CLT) are nonstandard. They reflect the local unit-root behavior and the long-memory structure at moderate lags.
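The role of the sample covariance operator and its eigenfunctions can be illustrated on discretized curves. A minimal sketch, assuming curves observed on a common grid; the data-generating process here is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
# 50 synthetic curves: a random multiple of sin(2*pi*t) plus pointwise noise
curves = np.array([rng.normal(1.0, 0.3) * np.sin(2 * np.pi * t)
                   + rng.normal(0.0, 0.1, t.size) for _ in range(50)])

mean_fn = curves.mean(axis=0)              # sample mean function
centered = curves - mean_fn
# discretized sample covariance operator C(s, t)
C = centered.T @ centered / len(curves)
evals, evecs = np.linalg.eigh(C)           # eigh returns ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]  # sort descending
# evecs[:, 0] approximates the leading eigenfunction (principal component)
```

With this construction the leading eigenfunction should closely track the sin(2πt) shape that drives the variation between curves.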
  3. https://youtu.be/79Py8KU4_k0
    We consider statistical methods that invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting distributionally robust optimization (DRO) formulations, which include Wasserstein DRO formulations (our main focus), are specified using optimal transportation phenomena. Upon describing how these infinite-dimensional min-max problems can be approached via a finite-dimensional dual reformulation, this tutorial moves into its main component, namely, explaining a generic recipe for optimally selecting the size of the adversary’s budget. This is achieved by studying the limit behavior of an optimal transport projection formulation arising from an inquiry on the smallest confidence region that includes the unknown population risk minimizer. Incidentally, this systematic prescription coincides with those in specific examples in high-dimensional statistics and results in error bounds that are free from the curse of dimensions. Equipped with this prescription, we present a central limit theorem for the DRO estimator and provide a recipe for constructing compatible confidence regions that are useful for uncertainty quantification. The rest of the tutorial is devoted to insights into the nature of the optimizers selected by the min-max formulations and additional applications of optimal transport projections. 
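A toy numerical illustration of the dual viewpoint, under simplifying assumptions not taken from the tutorial itself: scalar data, a 1-Wasserstein ball of radius delta, and a 1-Lipschitz loss f(x) = |x| that attains its Lipschitz slope, so the worst-case expectation has the closed form E_{P_n}[f] + delta * Lip(f).

```python
import numpy as np

# Empirical sample P_n (kept strictly positive so |.| is locally linear)
rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(10)) + 1.0
n, delta = len(x), 0.3

# Dual/closed-form value of sup over the 1-Wasserstein ball of radius delta
worst_case = np.abs(x).mean() + delta      # Lip(|.|) = 1

# An adversarial distribution achieving the bound: move one sample by
# n * delta, which costs exactly delta in 1-Wasserstein distance from P_n
x_adv = x.copy()
x_adv[0] += n * delta
attained = np.abs(x_adv).mean()
```

The point of the tutorial's prescription is precisely how to choose the radius delta; here it is fixed arbitrarily just to check the duality.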
  4. Abstract We present a probabilistic analysis of two Krylov subspace methods for solving linear systems. We prove a central limit theorem for norms of the residual vectors that are produced by the conjugate gradient and MINRES algorithms when applied to a wide class of sample covariance matrices satisfying some standard moment conditions. The proof involves establishing a four‐moment theorem for the so‐called spectral measure, implying, in particular, universality for the matrix produced by the Lanczos iteration. The central limit theorem then implies an almost‐deterministic iteration count for the iterative methods in question. © 2022 Wiley Periodicals LLC. 
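The setting of this result can be sketched numerically: run conjugate gradient on a sample covariance matrix W = X Xᵀ/m and track the residual norms. A minimal implementation for illustration only; the choice of aspect ratio n/m = 1/4 keeps W well conditioned with high probability.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=500):
    """Basic CG for symmetric positive definite A, tracking residual norms."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    res_norms = [np.linalg.norm(r)]
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        res_norms.append(np.linalg.norm(r_new))
        if res_norms[-1] < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x, res_norms

rng = np.random.default_rng(1)
n, m = 200, 800
Xmat = rng.standard_normal((n, m))
W = Xmat @ Xmat.T / m                 # sample covariance matrix
b = rng.standard_normal(n)
sol, res = conjugate_gradient(W, b)
```

Rerunning with fresh draws of `Xmat` and `b` gives residual-norm sequences of nearly identical length, consistent with the almost-deterministic iteration count described in the abstract.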
  5.
    Abstract: We obtain an asymptotic formula for the average value of the operator product expansion coefficients of any unitary, compact two-dimensional CFT with c > 1. This formula is valid when one or more of the operators has large dimension or, in the presence of a twist gap, has large spin. Our formula is universal in the sense that it depends only on the central charge and not on any other details of the theory. This result unifies all previous asymptotic formulas for CFT2 structure constants, including those derived from crossing symmetry of four-point functions, modular covariance of torus correlation functions, and higher genus modular invariance. We determine this formula at finite central charge by deriving crossing kernels for higher genus crossing equations, which give analytic control over the structure constants even in the absence of exact knowledge of the conformal blocks. The higher genus modular kernels are obtained by sewing together the elementary kernels for four-point crossing and modular transforms of torus one-point functions. Our asymptotic formula is related to the DOZZ formula for the structure constants of Liouville theory, and makes precise the sense in which Liouville theory governs the universal dynamics of heavy operators in any CFT. The large central charge limit provides a link with 3D gravity, where the averaging over heavy states corresponds to a coarse-graining over black hole microstates in holographic theories. Our formula also provides an improved understanding of the Eigenstate Thermalization Hypothesis (ETH) in CFT2, and suggests that ETH can be generalized to other kinematic regimes in two-dimensional CFTs.