
Title: Self-Discrepancy Conditional Independence Test
Tests of conditional independence (CI) of random variables play an important role in machine learning and causal inference. Of particular interest are kernel-based CI tests, which allow us to test for independence among random variables with complex distribution functions. The efficacy of a CI test is measured in terms of its power and its calibratedness. We show that the Kernel CI Permutation Test (KCIPT) suffers from a loss of calibratedness as its power is increased by increasing the number of bootstraps. To address this limitation, we propose a novel CI test, called Self-Discrepancy Conditional Independence Test (SDCIT). SDCIT uses a test statistic that is a modified unbiased estimate of maximum mean discrepancy (MMD), the largest difference in the means of features of the given sample and its permuted counterpart in the kernel-induced Hilbert space. We present results of experiments that demonstrate SDCIT is, relative to the other methods: (i) competitive in terms of its power and calibratedness, outperforming other methods when the number of conditioning variables is large; (ii) more robust with respect to the choice of the kernel function; and (iii) competitive in run time.
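To make the MMD quantity named in the abstract concrete, here is a minimal sketch of the standard unbiased estimate of squared MMD between two samples. It is not SDCIT's modified statistic, which the abstract does not spell out. The function names (rbf_kernel, mmd2_unbiased), the Gaussian RBF kernel, and the bandwidth parameter gamma are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_unbiased(x, y, gamma=1.0):
    """Standard unbiased estimate of squared MMD for samples x (n, d) and y (m, d).

    Hypothetical helper for illustration; not the statistic used by SDCIT.
    """
    n, m = len(x), len(y)
    kxx = rbf_kernel(x, x, gamma)
    kyy = rbf_kernel(y, y, gamma)
    kxy = rbf_kernel(x, y, gamma)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution: the estimate should be close to zero.
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
# Second sample shifted by one unit per coordinate: the estimate should be clearly positive.
shifted = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
```

Because the estimate is unbiased, it can be slightly negative when the two samples come from the same distribution. In a permutation-based CI test, the second sample would be a permuted counterpart of the first rather than an independent draw.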
Journal Name: Uncertainty in Artificial Intelligence
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Recently, many regression-based conditional independence (CI) test methods have been proposed for causal discovery. These methods test CI by first removing the information of the controlling set from the two target variables and then testing the independence between the corresponding residuals, Res1 and Res2. When the residuals are linearly uncorrelated, the independence test between them is nontrivial. Kernel-based methods, which can compute inner products in a high-dimensional space, are usually used for this purpose, but they still consume considerable time. In this paper, we investigate the independence between two linear combinations under a linear non-Gaussian structural equation model. We show that the dependence between the two residuals can be captured by the difference between the similarity of (Res1, Res2) and that of (Res1, Res3), where Res3 is generated by random permutation, in a high-dimensional space. With this result, we design a new method for CI testing, called SCIT, in which a permutation test is performed to control the Type I error rate. The proposed method is simpler yet more efficient and effective than existing ones. When applied to causal discovery, the proposed method outperforms its counterparts in terms of both speed and Type II error rate, especially for small sample sizes, as validated by our extensive experiments on various datasets.
  2. Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational setting. We show how, under a set of structural assumptions, we can test for RCI by reducing the task of testing for RCI on non-i.i.d. data to the problem of testing for CI on several data sets, each of which consists of i.i.d. samples. We develop Kernel Relational CI test (KRCIT), a nonparametric test as a practical approach to testing for RCI by relaxing the structural assumptions used in our analysis of RCI. We describe results of experiments with synthetic relational data that show the benefits of KRCIT relative to traditional CI tests that don’t account for the non-i.i.d. nature of relational data.
  3. We consider the problem of non-parametric conditional independence testing (CI testing) for continuous random variables. Given i.i.d. samples from the joint distribution f(x, y, z) of continuous random vectors X, Y and Z, we determine whether X is independent of Y given Z. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better than the prior state of the art for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution f^{CI}(x, y, z) = f(x|z)f(y|z)f(z), which equals the joint distribution if and only if X is independent of Y given Z, when given access only to i.i.d. samples from the true joint distribution f(x, y, z). To tackle this problem we propose a novel nearest-neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to f^{CI} in terms of total variation distance. We then develop theoretical results regarding the generalization bounds for classification for our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher-type classification bounds in the presence of non-i.i.d. near-independent samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods.
  4. The categorical Gini correlation is an alternative measure of dependence between categorical and numerical variables, which characterizes the independence of the variables. A non‐parametric test based on the categorical Gini correlation for the equality of K distributions is developed. By applying the jackknife empirical likelihood approach, the standard limiting chi‐squared distribution with K − 1 degrees of freedom is established and is used to determine the critical value and p‐value of the test. Simulation studies show that the proposed method is competitive with existing methods in terms of power of the tests in most cases. The proposed method is illustrated in an application on a real dataset.
  5. Kumar, Mohan (Ed.)
    Background: Breastfeeding has several benefits for both mothers and their children. Despite strong evidence in support of the practice, its prevalence has remained low worldwide, particularly in Ethiopia. Therefore, this study aimed to assess breastfeeding knowledge, attitude, and self-efficacy among mothers with index infants and young children in the rural community of Southwest Ethiopia. Methods: A community-based cross-sectional study was conducted between March and April 2022 as baseline data for a cluster randomized controlled trial. Multistage sampling followed by systematic random sampling was employed. The Chi-square and Fisher’s exact probability tests were used to assess the baseline differences in the sociodemographic characteristics of the two groups. An independent sample t-test was used to determine the mean differences. Multivariate logistic regression analysis was used to evaluate the association. All tests were two-tailed, and a statistically significant association was declared at a p-value ≤ 0.05. Results: A total of 516 mothers (258 from the intervention group and 258 from the control group) were interviewed. Except for the child’s sex and age, no significant difference was observed between the intervention and control groups in terms of socio-demographic variables (p > 0.05). Independent t-tests found no significant difference between the two groups (p > 0.05) in terms of the mean score of maternal breastfeeding knowledge, attitude and self-efficacy at baseline. After adjusting for other covariates, maternal age (AOR = 1.44, 95% CI: 0.69, 3.07), educational status (AOR = 1.87, 95% CI: 0.56, 2.33), occupation (AOR = 1.79, 95% CI: 1.04, 3.69), antenatal care (ANC) (AOR = 1.88, 95% CI: 1.11, 4.09), received breastfeeding information (AOR = 1.69, 95% CI: 1.33, 5.04), postnatal care (PNC) (AOR = 3.85, 95% CI: 2.01, 5.77) and parity (AOR = 2.49, 95% CI: 1.08, 4.19) were significantly associated with a high level of breastfeeding knowledge. A positive attitude was associated with maternal age (AOR = 2.41, 95% CI: 1.18, 5.67), education status (AOR = 1.79, 95% CI: 0.99, 4.03), ANC (AOR = 2.07, 95% CI: 1.44, 5.13), last child breastfeeding history (AOR = 1.77, 95% CI: 1.21, 4.88) and a high level of breastfeeding knowledge (AOR = 2.02, 95% CI: 1.56, 4.04). Finally, high breastfeeding self-efficacy was associated with ANC (AOR = 1.88, 95% CI: 1.04, 3.83), parity (AOR = 4.05, 95% CI: 1.49, 5.03) and a high knowledge level (AOR = 1.69, 95% CI: 0.89, 2.85). Conclusions: The study concluded that mothers in both the intervention and control groups have a low level of breastfeeding knowledge, a neutral attitude, and medium self-efficacy. Therefore, nutrition education interventions using tailored messages appropriate to the sociocultural context in the rural setting should be developed and evaluated continuously.