Title: Condition number bounds for causal inference
An important achievement in the field of causal inference was a complete characterization of when a causal effect, in a system modeled by a causal graph, can be determined uniquely from purely observational data. The identification algorithms resulting from this work produce exact symbolic expressions for causal effects, in terms of the observational probabilities. More recent work has looked at the numerical properties of these expressions, in particular using the classical notion of the condition number. In its classical interpretation, the condition number quantifies the sensitivity of the output values of the expressions to small numerical perturbations in the input observational probabilities. In the context of causal identification, the condition number has also been shown to be related to the effect of certain kinds of uncertainties in the structure of the causal graphical model. In this paper, we first give an upper bound on the condition number for the interesting case of causal graphical models with small “confounded components”. We then develop a tight characterization of the condition number of any given causal identification problem. Finally, we use our tight characterization to give a specific example where the condition number can be much lower than that obtained via generic bounds on the condition number, and to show that even “equivalent” expressions for causal identification can behave very differently with respect to their numerical stability properties.
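To make the notion of sensitivity to perturbations in the observational probabilities concrete, the following is a minimal sketch, not the paper's method. It assumes the simplest identification formula, back-door adjustment over a single binary covariate Z, and estimates one common componentwise relative condition number, sum_i |df/dp_i| p_i / |f(p)|, by central finite differences. The paper's exact definition may differ, and all names (`backdoor_effect`, `relative_condition_number`, `make_joint`) and probability values are hypothetical.

```python
from itertools import product


def backdoor_effect(p):
    """P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, Z=z) * P(Z=z).

    `p` maps (x, y, z) to a joint probability; entries are treated as
    independent inputs, so perturbed tables need not sum to one.
    """
    total = 0.0
    for z in (0, 1):
        p_z = sum(p[(x, y, z)] for x in (0, 1) for y in (0, 1))
        p_x1_z = p[(1, 0, z)] + p[(1, 1, z)]
        total += p[(1, 1, z)] / p_x1_z * p_z
    return total


def relative_condition_number(f, p, h=1e-6):
    """Estimate sum_i |df/dp_i| * p_i / |f(p)| with central differences."""
    kappa = 0.0
    for key, value in p.items():
        step = h * value  # relative step, so tiny entries are handled sensibly
        up = dict(p)
        up[key] = value + step
        down = dict(p)
        down[key] = value - step
        deriv = (f(up) - f(down)) / (2 * step)
        kappa += abs(deriv) * value
    return kappa / abs(f(p))


def make_joint(p_x1_given_z):
    """Hypothetical model: Z ~ Bernoulli(0.5), then X | Z, then Y | X, Z."""
    p_y1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.7}  # keyed by (x, z)
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
        py = p_y1[(x, z)] if y == 1 else 1 - p_y1[(x, z)]
        joint[(x, y, z)] = 0.5 * px * py
    return joint


balanced = make_joint({0: 0.5, 1: 0.5})   # treatment equally likely in both strata
skewed = make_joint({0: 0.01, 1: 0.5})    # treatment is rare when Z=0

for name, joint in (("balanced", balanced), ("skewed", skewed)):
    effect = backdoor_effect(joint)
    kappa = relative_condition_number(backdoor_effect, joint)
    print(f"{name}: effect={effect:.4f}, estimated condition number={kappa:.4f}")
```

Both tables encode the same causal effect, so any difference in the printed estimates comes from how the observational probabilities are distributed rather than from the effect itself. Because this formula is homogeneous of degree one in the table entries, the estimate cannot fall below 1.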
Award ID(s):
1909972
NSF-PAR ID:
10374292
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
161
ISSN:
2640-3498
Page Range / eLocation ID:
1948-1957
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. de Campos, C. ; Maathuis, M. H. (Ed.)
  2. Ruiz, Francisco ; Dy, Jennifer ; van de Meent, Jan-Willem (Ed.)
    We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y, we consider the classical scenario where either X causes Y, Y causes X, or there is an unmeasured confounder between X and Y. We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum number of interventional samples needed is sublinear in k. We give a tight characterization of the tradeoff between observational and interventional data when the number of observational samples is sufficiently large. We build upon techniques for closeness testing and for non-parametric density estimation in different regimes of observational data. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing.
  3. This paper investigates the problem of selecting instrumental variables relative to a target causal influence X→Y from observational data generated by linear non-Gaussian acyclic causal models in the presence of unmeasured confounders. We propose a necessary condition for detecting variables that cannot serve as instrumental variables. Unlike many existing conditions for continuous variables, which require that two or more valid instrumental variables be present in the system, our condition is designed with a single instrumental variable. We then characterize the graphical implications of our condition in linear non-Gaussian acyclic causal models. Given that the existing graphical criteria for instrument validity are not directly testable given observational data, we further show whether and how such graphical criteria can be checked by exploiting our condition. Finally, we develop a method to select the set of candidate instrumental variables given observational data. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method.
  4. Causal effect identification is concerned with determining whether a causal effect is computable from a combination of qualitative assumptions about the underlying system (e.g., a causal graph) and distributions collected from this system. Many identification algorithms exclusively rely on graphical criteria made of a non-trivial combination of probability axioms, do-calculus, and refined c-factorization (e.g., Lee & Bareinboim, 2020). In a sequence of increasingly sophisticated results, it has been shown how proxy variables can be used to identify certain effects that would not be otherwise recoverable in challenging scenarios through solving matrix equations (e.g., Kuroki & Pearl, 2014; Miao et al., 2018). In this paper, we develop a new causal identification algorithm which utilizes both graphical criteria and matrix equations. Specifically, we first characterize the relationships between certain graphically-driven formulae and matrix multiplications. With such characterizations, we broaden the spectrum of proxy variable based identification conditions and further propose novel intermediary criteria based on the pseudoinverse of a matrix. Finally, we devise a causal effect identification algorithm, which accepts as input a collection of marginal, conditional, and interventional distributions, integrating enriched matrix-based criteria into a graphical identification approach. 
  5. Causal effect identification is one of the most prominent and well-understood problems in causal inference. Despite the generality and power of the results developed so far, there are still challenges in their applicability to practical settings, arguably due to the finitude of the samples. Simply put, there is a gap between causal effect identification and estimation. One popular setting in which sample-efficient estimators from finite samples exist is when the celebrated back-door condition holds. In this paper, we extend weighting-based methods developed for the back-door case to more general settings, and develop novel machinery for estimating causal effects using the weighting-based method as a building block. We derive graphical criteria under which causal effects can be estimated using this new machinery and demonstrate the effectiveness of the proposed method through simulation studies. 