
Search for: All records

Creators/Authors contains: "Zhang, Kun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available July 1, 2023
  2. Recently, many regression-based conditional independence (CI) test methods have been proposed to solve the problem of causal discovery. These methods test CI by first removing the information of the conditioning set from the two target variables and then testing the independence between the corresponding residuals Res1 and Res2. When the residuals are linearly uncorrelated, the independence test between them is nontrivial. Kernel-based methods, which can compute inner products in high-dimensional spaces, are usually used to achieve this goal, but they still consume considerable time. In this paper, we investigate the independence between two linear combinations under the linear non-Gaussian structural equation model. We show that the dependence between the two residuals can be captured by the difference between the similarity of (Res1, Res2) and that of (Res1, Res3), where Res3 is generated by random permutation, in high-dimensional space. Building on this result, we design a new CI test method called SCIT, in which a permutation test is performed to control the Type I error rate. The proposed method is simpler yet more efficient and effective than existing ones. When applied to causal discovery, it outperforms its counterparts in terms of both speed and Type II error rate, especially for small sample sizes, as validated by extensive experiments on various datasets. (An illustrative sketch of the test follows this entry.)
    Free, publicly-accessible full text available June 30, 2023
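    The following is a minimal, hypothetical sketch of the residual-similarity permutation test described above. It assumes ordinary least-squares residualization and a random-Fourier-feature map as the high-dimensional embedding; the function names and the choice of similarity statistic are illustrative, not the authors' implementation of SCIT.

    ```python
    import numpy as np

    def residualize(target, conditioning):
        """OLS residual of `target` after regressing out the conditioning set."""
        beta, *_ = np.linalg.lstsq(conditioning, target, rcond=None)
        return target - conditioning @ beta

    def rff(x, n_features=50, scale=1.0, seed=0):
        """Random Fourier features: an assumed stand-in for the high-dimensional map."""
        rng = np.random.default_rng(seed)
        w = rng.normal(0.0, 1.0 / scale, size=(1, n_features))
        b = rng.uniform(0.0, 2 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(x.reshape(-1, 1) @ w + b)

    def permutation_ci_test(x, y, z, n_perm=200, seed=0):
        """Compare the similarity of (Res1, Res2) with that of (Res1, Res3), where
        Res3 is a random permutation of Res2; return a permutation p-value."""
        rng = np.random.default_rng(seed)
        res1, res2 = residualize(x, z), residualize(y, z)
        # Same random features for both residuals, so the row-wise inner product
        # approximates an RBF kernel similarity in the feature space.
        phi1, phi2 = rff(res1, seed=seed), rff(res2, seed=seed)
        paired = np.mean(np.sum(phi1 * phi2, axis=1))       # similarity of (Res1, Res2)
        perms = np.array([np.mean(np.sum(phi1 * phi2[rng.permutation(len(res2))], axis=1))
                          for _ in range(n_perm)])           # similarities of (Res1, Res3)
        # Two-sided permutation p-value: the null resamples control the Type I error.
        return np.mean(np.abs(perms - perms.mean()) >= np.abs(paired - perms.mean()))

    # Toy check: X <- Z -> Y, so X and Y are independent given Z (expect a large p-value).
    rng = np.random.default_rng(1)
    z = rng.uniform(-1, 1, size=(500, 1))
    x = z[:, 0] + 0.3 * rng.uniform(-1, 1, 500)
    y = 2 * z[:, 0] + 0.3 * rng.uniform(-1, 1, 500)
    print(permutation_ci_test(x, y, z))
    ```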
  3. Chaudhuri, Kamalika ; Jegelka, Stefanie ; Song, Le ; Szepesvari, Csaba ; Niu, Gang ; Sabato, Sivan (Eds.)
    Traditional causal discovery methods mainly focus on estimating causal relations among measured variables, but in many real-world problems, such as questionnaire-based psychometric studies, measured variables are generated by latent variables that are causally related. Accordingly, this paper investigates the problem of discovering the hidden causal variables and estimating the causal structure, including both the causal relations among latent variables and those between latent and measured variables. We relax the frequently used measurement assumption and allow the children of latent variables to be latent as well, and hence deal with a specific type of latent hierarchical causal structure. In particular, we define a minimal latent hierarchical structure and show that for linear non-Gaussian models with the minimal latent hierarchical structure, the whole structure is identifiable from the measured variables alone. Moreover, we develop a principled method to identify the structure by testing for Generalized Independent Noise (GIN) conditions in specific ways. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach. (A background statement of the GIN condition follows this entry.)
    Free, publicly-accessible full text available July 1, 2023
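    For reference, the Generalized Independent Noise (GIN) condition invoked above can be stated as follows. This formalization is background from the earlier GIN literature, not text from the paper, and the notation is mine.

    ```latex
    % GIN condition (background): for random vectors Z and Y, (Z, Y) satisfies
    % GIN when the surrogate variable defined below is independent of Z.
    \omega \neq 0, \qquad
    \omega^{\top}\,\mathbb{E}\!\left[\mathbf{Y}\mathbf{Z}^{\top}\right] = 0, \qquad
    E_{\mathbf{Y}\mid\mathbf{Z}} := \omega^{\top}\mathbf{Y}, \qquad
    (\mathbf{Z},\mathbf{Y}) \text{ satisfies GIN} \iff
    E_{\mathbf{Y}\mid\mathbf{Z}} \perp\!\!\!\perp \mathbf{Z}.
    ```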
  4. An important problem across multiple disciplines is to infer and understand meaningful latent variables. One commonly used strategy is to model the measured variables in terms of the latent variables under suitable assumptions on the connectivity from the latents to the measured variables (known as the measurement model). Furthermore, it may be even more interesting to discover the causal relations among the latent variables (known as the structural model). Recently, some methods have been proposed to estimate the structural model by assuming that the noise terms in the measured and latent variables are non-Gaussian. However, they are not suitable when some of the noise terms become Gaussian. To bridge this gap, we investigate the problem of identifying the structural model with arbitrary noise distributions. We provide a necessary and sufficient condition under which the structural model is identifiable: it is identifiable if and only if, for each pair of adjacent latent variables Lx, Ly, (1) at least one of Lx and Ly has non-Gaussian noise, or (2) at least one of them has a non-Gaussian ancestor and is not d-separated from the non-Gaussian component of this ancestor by the common causes of Lx and Ly. This identifiability result relaxes the non-Gaussianity requirement to only a (hopefully small) subset of variables, and accordingly elegantly extends the application scope of the structural model. Based on this identifiability result, we further propose a practical algorithm to learn the structural model. We verify the correctness of the identifiability result and the effectiveness of the proposed method through empirical studies. (The condition is restated compactly after this entry.)
    Free, publicly-accessible full text available June 30, 2023
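    A compact restatement of the identifiability condition above, in notation of my choosing: nG(A) denotes the non-Gaussian component of ancestor A, and CC(Lx, Ly) the set of common causes of Lx and Ly.

    ```latex
    \text{identifiable} \iff \forall \text{ adjacent } (L_x, L_y):\;
    \underbrace{\varepsilon_{L_x} \text{ or } \varepsilon_{L_y} \text{ is non-Gaussian}}_{(1)}
    \;\lor\;
    \underbrace{\exists\, L \in \{L_x, L_y\},\ \exists \text{ non-Gaussian ancestor } A \text{ of } L:\;
    L \not\perp_{d} \mathrm{nG}(A) \mid \mathrm{CC}(L_x, L_y)}_{(2)}
    ```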
  5. Schölkopf, Bernhard ; Uhler, Caroline ; Zhang, Kun (Eds.)
    Fairness of machine learning algorithms has attracted increasing interest. In order to suppress or eliminate discrimination in prediction, various notions as well as approaches have been proposed to impose fairness. Given a notion of fairness, an essential problem is then whether or not it can always be attained, even with an unlimited amount of data. This issue, however, has not yet been well addressed. In this paper, focusing on the Equalized Odds notion of fairness, we consider the attainability of this criterion and, furthermore, if it is attainable, the optimality of the prediction performance under various settings. In particular, for prediction performed by a deterministic function of the input features, we give conditions under which Equalized Odds can hold true; if stochastic prediction is acceptable, we show that under mild assumptions, fair predictors can always be derived. For classification, we further prove that, compared to enforcing fairness by post-processing, one can always benefit from exploiting all available features during training and obtain potentially better prediction performance while remaining fair. Moreover, while stochastic prediction can attain Equalized Odds with theoretical guarantees, we also discuss its limitations and potential negative social impacts. (A formal statement of Equalized Odds follows this entry.)
    Free, publicly-accessible full text available April 1, 2023
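    For reference, the standard definition of Equalized Odds from the fairness literature (background, not quoted from the paper): A is the protected attribute, Y the outcome, and Ŷ the prediction.

    ```latex
    \hat{Y} \text{ satisfies Equalized Odds} \iff \hat{Y} \perp\!\!\!\perp A \mid Y,
    \quad \text{i.e.,}\quad
    P(\hat{Y}=\hat{y} \mid A=a,\, Y=y) \text{ is equal across all } a
    \text{ for each } (\hat{y}, y).
    ```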
  6. Chaudhuri, Kamalika ; Jegelka, Stefanie ; Song, Le ; Szepesvari, Csaba ; Niu, Gang ; Sabato, Sivan (Eds.)
    Unsupervised domain adaptation is critical to many real-world applications where label information is unavailable in the target domain. In general, without further assumptions, the joint distribution of the features and the label is not identifiable in the target domain. To address this issue, we rely on a property of minimal changes of causal mechanisms across domains to minimize unnecessary influences of domain shift. To encode this property, we first formulate the data-generating process using a latent variable model with two partitioned latent subspaces: invariant components whose distributions stay the same across domains, and sparse changing components that vary across domains. We further constrain the domain shift to have a restrictive influence on the changing components. Under mild conditions, we show that the latent variables are partially identifiable, from which it follows that the joint distribution of data and labels in the target domain is also identifiable. Given these theoretical insights, we propose a practical domain adaptation framework called iMSDA. Extensive experimental results reveal that iMSDA outperforms state-of-the-art domain adaptation algorithms on benchmark datasets, demonstrating the effectiveness of our framework. (The generating process is sketched after this entry.)
    Free, publicly-accessible full text available July 1, 2023
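    A schematic of the partitioned latent-variable generating process described above; the notation is mine and the exact parameterization in the paper may differ.

    ```latex
    % Domain index u, invariant latents z_c, sparse changing latents z_s,
    % mixing function g producing the observation x.
    z_c \sim p(z_c) \;\;(\text{shared across domains}), \qquad
    z_s \sim p(z_s \mid u) \;\;(\text{varies sparsely with the domain } u), \qquad
    x = g(z_c, z_s).
    ```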
  7. Camps-Valls, Gustau ; Ruiz, Francisco J. ; Valera, Isabel (Eds.)
    Traditionally, Bayesian network structure learning is often carried out at a central site where all the data is gathered. In practice, however, data may be distributed across different parties (e.g., companies, devices) that intend to collectively learn a Bayesian network but are unwilling to disclose information related to their data owing to privacy or security concerns. In this work, we present a federated learning approach to estimating the structure of a Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adapting it to both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves improved performance over the other methods, especially when there is a relatively large number of clients, each with a limited sample size. (A schematic of the ADMM exchange pattern follows this entry.)
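    Below is a minimal, hypothetical sketch of the consensus-ADMM exchange pattern for a linear SEM: clients exchange only parameter matrices, never raw data. The acyclicity constraint that is central to the actual structure learning method is deliberately omitted here, so the result is a regression matrix rather than a DAG, and all names and update rules are illustrative.

    ```python
    import numpy as np

    def local_update(X, Z, U, rho):
        """Client step: argmin_W 0.5*||X - X W||_F^2 + (rho/2)*||W - Z + U||_F^2,
        which has the closed form (X'X + rho*I) W = X'X + rho*(Z - U)."""
        G = X.T @ X
        return np.linalg.solve(G + rho * np.eye(X.shape[1]), G + rho * (Z - U))

    def federated_admm(datasets, rho=1.0, n_iter=100):
        """Consensus ADMM: only the W, Z, U matrices cross the network."""
        d = datasets[0].shape[1]
        Z = np.zeros((d, d))
        U = [np.zeros((d, d)) for _ in datasets]
        for _ in range(n_iter):
            W = [local_update(X, Z, U_k, rho) for X, U_k in zip(datasets, U)]
            Z = np.mean([W_k + U_k for W_k, U_k in zip(W, U)], axis=0)  # server averaging
            np.fill_diagonal(Z, 0.0)  # heuristic projection: forbid self-loops
            U = [U_k + W_k - Z for W_k, U_k in zip(W, U)]  # dual update
        return Z

    # Toy run: four clients, each with samples from the linear SEM X = X W_true + E.
    rng = np.random.default_rng(0)
    W_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]])
    datasets = [rng.normal(size=(200, 3)) @ np.linalg.inv(np.eye(3) - W_true)
                for _ in range(4)]
    print(np.round(federated_admm(datasets), 2))
    ```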
  8. This paper investigates the problem of selecting instrumental variables relative to a target causal influence X→Y from observational data generated by linear non-Gaussian acyclic causal models in the presence of unmeasured confounders. We propose a necessary condition for detecting variables that cannot serve as instrumental variables. Unlike many existing conditions for continuous variables, which require that at least two valid instrumental variables be present in the system, our condition is designed for the case of a single instrumental variable. We then characterize the graphical implications of our condition in linear non-Gaussian acyclic causal models. Given that the existing graphical criteria for instrument validity are not directly testable from observational data, we further show whether and how such graphical criteria can be checked by exploiting our condition. Finally, we develop a method to select the set of candidate instrumental variables from observational data. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method. (The classical instrument-validity criteria are restated after this entry.)
    Free, publicly-accessible full text available April 1, 2023
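    For background, the classical criteria for a valid instrument Z relative to X→Y in the linear setting, restated in my own notation; the paper's necessary condition for detecting invalid instruments is distinct and is developed in the paper itself.

    ```latex
    % Relevance and exogeneity; \varepsilon_Y is the error term of Y.
    \operatorname{Cov}(Z, X) \neq 0 \quad (\text{relevance}), \qquad
    Z \perp\!\!\!\perp \varepsilon_Y \quad (\text{exogeneity: } Z \text{ affects } Y
    \text{ only through } X \text{ and shares no confounder with } Y).
    % A valid instrument then identifies the causal coefficient:
    \beta_{X \to Y} = \operatorname{Cov}(Z, Y) / \operatorname{Cov}(Z, X).
    ```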
  9. Camps-Valls, Gustau ; Ruiz, Francisco J. ; Valera, Isabel (Eds.)
    Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM), which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result, which does not require the penalty coefficient to go to infinity and hence avoids ill-conditioning. However, the convergence properties of these methods for structure learning, including whether they are guaranteed to return a DAG solution, remain unclear, which might limit their practical applications. In this work, we examine the convergence of ALM and QPM for structure learning in the linear, nonlinear, and confounded cases. We show that the standard convergence result of ALM does not hold in these settings, and demonstrate empirically that its behavior is akin to that of the QPM, which is prone to ill-conditioning. We further establish a convergence guarantee of QPM to a DAG solution under mild conditions. Lastly, we connect our theoretical results with existing approaches to help resolve the convergence issue, and verify our findings through an empirical comparison of these approaches. (An illustrative penalty-method loop follows this entry.)
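    Below is a minimal, illustrative quadratic-penalty loop for continuous DAG structure learning in the linear case, using the well-known trace-exponential acyclicity characterization h(W) = tr(exp(W ∘ W)) − d. The inner optimizer, step sizes, and schedule are hypothetical choices, not the paper's experimental setup.

    ```python
    import numpy as np
    from scipy.linalg import expm

    def h(W):
        """Acyclicity measure: h(W) = tr(exp(W ∘ W)) - d, zero iff W is a DAG."""
        return np.trace(expm(W * W)) - W.shape[0]

    def grad_h(W):
        """Gradient of h: (exp(W ∘ W))^T ∘ 2W."""
        return expm(W * W).T * (2 * W)

    def qpm_dag_learning(X, beta=10.0, gamma=2.0, n_outer=20, n_inner=300, lr=1e-3):
        """Quadratic penalty method: minimize 0.5/n*||X - X W||_F^2 + (beta/2)*h(W)^2,
        multiplying beta by gamma each outer round until h(W) is (near) zero.
        Note: QPM drives beta toward infinity, which is the source of ill-conditioning."""
        n, d = X.shape
        W = np.zeros((d, d))
        for _ in range(n_outer):
            for _ in range(n_inner):  # inner loop: plain gradient descent (illustrative)
                grad = -X.T @ (X - X @ W) / n + beta * h(W) * grad_h(W)
                W -= lr * grad
                np.fill_diagonal(W, 0.0)  # keep diag(W) = 0: no self-loops
            if abs(h(W)) < 1e-8:
                break
            beta *= gamma
        return W

    # Toy run: two variables with true edge 0 -> 1 (weight 1.5).
    rng = np.random.default_rng(0)
    W_true = np.array([[0.0, 1.5], [0.0, 0.0]])
    X = rng.normal(size=(500, 2)) @ np.linalg.inv(np.eye(2) - W_true)
    print(np.round(qpm_dag_learning(X), 2))
    ```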