Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying functional mechanism shifts in two or more related SCMs over the same set of variables -- without estimating the entire DAG structure of each SCM. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.
more »
« less
Causal Inference Despite Limited Global Confounding via Mixture Models
A Bayesian Network is a directed acyclic graph (DAG) on a set of n random variables (the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite k-mixture of such models is graphically represented by a larger graph which has an additional “hidden” (or “latent”) random variable U, ranging in {1,...,k}, and a directed edge from U to every other vertex. Models of this type are fundamental to causal inference, where U models an unobserved confounding effect of multiple populations, obscuring the causal relationships in the observable DAG. By solving the mixture problem and recovering the joint probability distribution with U, traditionally unidentifiable causal relationships become identifiable. Using a reduction to the more well-studied “product” case on empty graphs, we give the first algorithm to learn mixtures of non-empty DAGs.
more »
« less
- Award ID(s):
- 1909972
- PAR ID:
- 10475956
- Editor(s):
- van der Schaar, M.; Zhang, C.; Janzing, D.
- Publisher / Repository:
- PMLR
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- Volume:
- 213
- ISSN:
- 2640-3498
- Page Range / eLocation ID:
- 574-601
- Format(s):
- Medium: X
- Location:
- Tubingen, Germany
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Structural causal models (SCMs) are widely used in various disciplines to repre- sent causal relationships among variables in complex systems. Unfortunately, the underlying causal structure is often unknown, and estimating it from data remains a challenging task. In many situations, however, the end goal is to localize the changes (shifts) in the causal mechanisms between related datasets instead of learn- ing the full causal structure of the individual datasets. Some applications include root cause analysis, analyzing gene regulatory network structure changes between healthy and cancerous individuals, or explaining distribution shifts. This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables—without estimating the entire DAG structure of each SCM. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key technical contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for the identification of shifts under general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach. Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN.more » « less
-
Summary We consider graphical models based on a recursive system of linear structural equations. This implies that there is an ordering, $$\sigma$$, of the variables such that each observed variable $$Y_v$$ is a linear function of a variable-specific error term and the other observed variables $$Y_u$$ with $$\sigma(u) < \sigma (v)$$. The causal relationships, i.e., which other variables the linear functions depend on, can be described using a directed graph. It has previously been shown that when the variable-specific error terms are non-Gaussian, the exact causal graph, as opposed to a Markov equivalence class, can be consistently estimated from observational data. We propose an algorithm that yields consistent estimates of the graph also in high-dimensional settings in which the number of variables may grow at a faster rate than the number of observations, but in which the underlying causal structure features suitable sparsity; specifically, the maximum in-degree of the graph is controlled. Our theoretical analysis is couched in the setting of log-concave error distributions.more » « less
-
Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, the state-of-the-art optimization solvers are not able to obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear “big-M " constraints that represent the relationship between the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.more » « less
-
Abstract Bayesian networks are powerful statistical models to understand causal relationships in real-world probabilistic problems such as diagnosis, forecasting, computer vision, etc. For systems that involve complex causal dependencies among many variables, the complexity of the associated Bayesian networks become computationally intractable. As a result, direct hardware implementation of these networks is one promising approach to reducing power consumption and execution time. However, the few hardware implementations of Bayesian networks presented in literature rely on deterministic CMOS devices that are not efficient in representing the stochastic variables in a Bayesian network that encode the probability of occurrence of the associated event. This work presents an experimental demonstration of a Bayesian network building block implemented with inherently stochastic spintronic devices based on the natural physics of nanomagnets. These devices are based on nanomagnets with perpendicular magnetic anisotropy, initialized to their hard axes by the spin orbit torque from a heavy metal under-layer utilizing the giant spin Hall effect, enabling stochastic behavior. We construct an electrically interconnected network of two stochastic devices and manipulate the correlations between their states by changing connection weights and biases. By mapping given conditional probability tables to the circuit hardware, we demonstrate that any two node Bayesian networks can be implemented by our stochastic network. We then present the stochastic simulation of an example case of a four node Bayesian network using our proposed device, with parameters taken from the experiment. We view this work as a first step towards the large scale hardware implementation of Bayesian networks.more » « less