Title: On boosting the power of Chatterjee’s rank correlation
Summary The ingenious approach of Chatterjee (2021) to estimate a measure of dependence first proposed by Dette et al. (2013) based on simple rank statistics has quickly caught attention. This measure of dependence has the appealing property of being between 0 and 1, and being 0 or 1 if and only if the corresponding pair of random variables is independent or one is a measurable function of the other almost surely. However, more recent studies (Cao & Bickel 2020; Shi et al. 2022b) showed that independence tests based on Chatterjee’s rank correlation are unfortunately rate inefficient against various local alternatives and they call for variants. We answer this call by proposing an improvement to Chatterjee’s rank correlation that still consistently estimates the same dependence measure, but provably achieves near-parametric efficiency in testing against Gaussian rotation alternatives. This is possible by incorporating many right nearest neighbours in constructing the correlation coefficients. We thus overcome the ‘only one disadvantage’ of Chatterjee’s rank correlation (Chatterjee, 2021, §7).
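For orientation, here is a minimal sketch of the original rank statistic of Chatterjee (2021) that the summary refers to: sort the pairs by X, rank the Y values in that order, and compute 1 - 3 Σ|r_{i+1} - r_i| / (n² - 1). It is not the multi-neighbour improvement proposed in this paper. The function name `chatterjee_xi` is hypothetical, and the sketch assumes there are no ties in either variable.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, assuming no ties in x or y.

    Illustrative sketch only: this is the original one-neighbour
    coefficient, not the improved estimator described in the summary.
    """
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("x and y must have the same length, at least 2")

    # Arrange the y values in increasing order of x.
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]

    # ranks[k] is the rank (1..n) of the k-th y value in x-order.
    ranks = [0] * n
    for rank, k in enumerate(sorted(range(n), key=lambda k: y_by_x[k]), start=1):
        ranks[k] = rank

    # Sum of absolute differences between ranks of x-adjacent observations.
    total = sum(abs(ranks[k + 1] - ranks[k]) for k in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [(v - 4.3) ** 2 for v in xs]  # non-monotone function of x, no ties
    print(chatterjee_xi(xs, ys))
```

For a strictly monotone relationship the sum of rank differences is n - 1, so the statistic equals 1 - 3/(n + 1), which approaches 1 as n grows.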
Award ID(s):
2019363 2210019
PAR ID:
10413649
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrika
Volume:
110
Issue:
2
ISSN:
0006-3444
Format(s):
Medium: X
Size(s):
p. 283-299
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Chatterjee (2021) introduced a simple new rank correlation coefficient that has attracted much attention recently. The coefficient has the unusual appeal that it not only estimates a population quantity first proposed by Dette et al. (2013) that is zero if and only if the underlying pair of random variables is independent, but also is asymptotically normal under independence. This paper compares Chatterjee’s new correlation coefficient with three established rank correlations that also facilitate consistent tests of independence, namely Hoeffding’s $D$, Blum–Kiefer–Rosenblatt’s $R$, and Bergsma–Dassios–Yanagimoto’s $\tau^*$. We compare the computational efficiency of these rank correlation coefficients in light of recent advances, and investigate their power against local rotation and mixture alternatives. Our main results show that Chatterjee’s coefficient is unfortunately rate-suboptimal compared to $D$, $R$ and $\tau^*$. The situation is more subtle for a related earlier estimator of Dette et al. (2013). These results favour $D$, $R$ and $\tau^*$ over Chatterjee’s new correlation coefficient for the purpose of testing independence.
  2. In his seminal work, Chatterjee (2021) introduced a novel correlation measure that is distribution-free, asymptotically normal, and consistent against all alternatives. In this article, we study the probabilistic relationships between Chatterjee’s correlation and the widely used Spearman’s correlation. We show that, under independence, the two sample-based correlations are asymptotically joint normal and asymptotically independent. Under dependence, the magnitudes of two correlations can be substantially different. We establish some extreme cases featuring large differences between these two correlations. Motivated by these findings, a new independence test is proposed by combining Chatterjee’s and Spearman’s correlations into a maximal strength measure of variable association. Our simulation study and real-data application show the good sensitivity of the new test to different correlation patterns.
  3. In his seminal work, Chatterjee (2021) introduced a novel correlation measure which is distribution-free, asymptotically normal, and consistent against all alternatives. In this paper, we study the probabilistic relationships between Chatterjee's correlation and the widely used Spearman's correlation. We show that, under independence, the two sample-based correlations are asymptotically joint normal and asymptotically independent. Under dependence, the magnitudes of two correlations can be substantially different. We establish some extremal cases featuring large differences between these two correlations. Motivated by these findings, a new independence test is proposed by combining Chatterjee's and Spearman's correlations into a maximal strength measure of variable association. Our simulation study and real data application show the good sensitivity of the new test to different correlation patterns. 
  4. Abstract While researchers commonly use the bootstrap to quantify the uncertainty of an estimator, it has been noticed that the standard bootstrap, in general, does not work for Chatterjee’s rank correlation. In this paper, we provide proof of this issue under an additional independence assumption, and complement our theory with simulation evidence for general settings. Chatterjee’s rank correlation thus falls into a category of statistics that are asymptotically normal, but bootstrap inconsistent. Valid inferential methods in this case are Chatterjee’s original proposal for testing independence and the analytic asymptotic variance estimator of Lin & Han (2022) for more general purposes. [Received on 5 April 2023. Editorial decision on 10 January 2024] 
  5. Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Ed.)
    This work continues the study of linear error correcting codes against adversarial insertion deletion errors (insdel errors). Previously, the work of Cheng, Guruswami, Haeupler, and Li [Kuan Cheng et al., 2021] showed the existence of asymptotically good linear insdel codes that can correct arbitrarily close to 1 fraction of errors over some constant size alphabet, or achieve rate arbitrarily close to 1/2 even over the binary alphabet. As shown in [Kuan Cheng et al., 2021], these bounds are also the best possible. However, known explicit constructions in [Kuan Cheng et al., 2021], and subsequent improved constructions by Con, Shpilka, and Tamo [Con et al., 2022] all fall short of meeting these bounds. Over any constant size alphabet, they can only achieve rate < 1/8 or correct < 1/4 fraction of errors; over the binary alphabet, they can only achieve rate < 1/1216 or correct < 1/54 fraction of errors. Apparently, previous techniques face inherent barriers to achieve rate better than 1/4 or correct more than 1/2 fraction of errors. In this work we give new constructions of such codes that meet these bounds, namely, asymptotically good linear insdel codes that can correct arbitrarily close to 1 fraction of errors over some constant size alphabet, and binary asymptotically good linear insdel codes that can achieve rate arbitrarily close to 1/2. All our constructions are efficiently encodable and decodable. Our constructions are based on a novel approach of code concatenation, which embeds the index information implicitly into codewords. This significantly differs from previous techniques and may be of independent interest. Finally, we also prove the existence of linear concatenated insdel codes with parameters that match random linear codes, and propose a conjecture about linear insdel codes. 