Title: Identifiability of causal effects with multiple causes and a binary outcome
Summary Unobserved confounding presents a major threat to causal inference in observational studies. Recently, several authors have suggested that this problem could be overcome in a shared confounding setting where multiple treatments are independent given a common latent confounder. It has been shown that under a linear Gaussian model for the treatments, the causal effect is not identifiable without parametric assumptions on the outcome model. In this note, we show that the causal effect is indeed identifiable if we assume a general binary choice model for the outcome with a non-probit link. Our identification approach is based on the incongruence between Gaussianity of the treatments and latent confounder and non-Gaussianity of a latent outcome variable. We further develop a two-step likelihood-based estimation procedure.
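The shared confounding setting described in the summary can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the parameter values, and the choice of a logistic latent error (one possible non-probit link) are assumptions made for illustration. It draws a Gaussian latent confounder, generates several treatments that are independent given that confounder, and produces a binary outcome from a threshold model with non-Gaussian latent noise.

```python
import math
import random


def simulate_shared_confounding(n=2000, theta=(1.0, 0.5, -0.8),
                                beta=(0.4, 0.0, -0.3), gamma=1.0,
                                alpha=0.0, seed=0):
    """Simulate treatments and a binary outcome under shared confounding.

    All parameter names and default values are hypothetical.
    """
    rng = random.Random(seed)
    treatments, outcomes = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent Gaussian confounder
        # Linear Gaussian treatments, independent of each other given u.
        a = [th * u + rng.gauss(0.0, 1.0) for th in theta]
        # Logistic latent error via the inverse CDF (assumed non-probit link).
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        t = math.log(p / (1.0 - p))
        # Binary choice model: outcome is 1 when the latent error falls
        # below the linear index in treatments and confounder.
        index = alpha + sum(b * x for b, x in zip(beta, a)) + gamma * u
        outcomes.append(1 if t <= index else 0)
        treatments.append(a)
    return treatments, outcomes


def sample_corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    A, Y = simulate_shared_confounding()
    a1 = [row[0] for row in A]
    a2 = [row[1] for row in A]
    # The treatments are marginally correlated only through the confounder.
    print("corr(A1, A2):", round(sample_corr(a1, a2), 3))
    print("mean outcome:", sum(Y) / len(Y))
```

The simulation only generates data from the assumed model; it does not implement the two-step likelihood-based estimator mentioned in the summary.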
Award ID(s):
1811245
NSF-PAR ID:
10337450
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Biometrika
Volume:
109
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
265 to 272
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Structural failure time models are causal models for estimating the effect of time-varying treatments on a survival outcome. G-estimation and artificial censoring have been proposed for estimating the model parameters in the presence of time-dependent confounding and administrative censoring. However, most existing methods require manually pre-processing data into regularly spaced data, which may invalidate the subsequent causal analysis. Moreover, the computation and inference are challenging due to the nonsmoothness of artificial censoring. We propose a class of continuous-time structural failure time models that respects the continuous-time nature of the underlying data processes. Under a martingale condition of no unmeasured confounding, we show that the model parameters are identifiable from a potentially infinite number of estimating equations. Using the semiparametric efficiency theory, we derive the first semiparametric doubly robust estimators, which are consistent if the model for the treatment process or the failure time model, but not necessarily both, is correctly specified. Moreover, we propose using inverse probability of censoring weighting to deal with dependent censoring. In contrast to artificial censoring, our weighting strategy does not introduce nonsmoothness in estimation and ensures that resampling methods can be used for inference. 
  2. An important problem across multiple disciplines is to infer and understand meaningful latent variables. One commonly used strategy is to model the measured variables in terms of the latent variables under suitable assumptions on the connectivity from the latents to the measured variables (known as the measurement model). Furthermore, it might be even more interesting to discover the causal relations among the latent variables (known as the structural model). Recently, some methods have been proposed to estimate the structural model by assuming that the noise terms in the measured and latent variables are non-Gaussian. However, they are not suitable when some of the noise terms become Gaussian. To bridge this gap, we investigate the problem of identification of the structural model with arbitrary noise distributions. We provide a necessary and sufficient condition under which the structural model is identifiable: it is identifiable iff for each pair of adjacent latent variables Lx, Ly, (1) at least one of Lx and Ly has non-Gaussian noise, or (2) at least one of them has a non-Gaussian ancestor and is not d-separated from the non-Gaussian component of this ancestor by the common causes of Lx and Ly. This identifiability result relaxes the non-Gaussianity requirements to only a (hopefully small) subset of variables, and accordingly extends the application scope of the structural model. Based on the above identifiability result, we further propose a practical algorithm to learn the structural model. We verify the correctness of the identifiability result and the effectiveness of the proposed method through empirical studies.
  3. Learning causal structure from observational data has attracted much attention, and it is notoriously challenging to find the underlying structure in the presence of confounders (hidden direct common causes of two variables). In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of "pseudo-residual" from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship. In other words, the Triad constraints help us to locate latent confounders and determine the causal direction between them. This goes far beyond the Tetrad constraints and reveals more information about the underlying structure from non-Gaussian data. Finally, based on the Triad constraints, we develop a two-step algorithm to learn the causal structure corresponding to measurement models. Experimental results on both synthetic and real data demonstrate the effectiveness and reliability of our method.
  4. Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have sought to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role as it determines the patterns of missing counterfactuals, the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are assigned pairwise to units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few explorations of entangled treatments, this problem remains challenging for the following reasons: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method into a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.
  5. Recommender systems may be confounded by various types of confounding factors (also called confounders) that may lead to inaccurate recommendations and sacrificed recommendation performance. Current approaches to solving the problem usually design each specific model for each specific confounder. However, real-world systems may include a huge number of confounders and thus designing each specific model for each specific confounder could be unrealistic. More importantly, except for those “explicit confounders” that experts can manually identify and process such as item’s position in the ranking list, there are also many “latent confounders” that are beyond the imagination of experts. For example, users’ rating on a song may depend on their current mood or the current weather, and users’ preference on ice creams may depend on the air temperature. Such latent confounders may be unobservable in the recorded training data. To solve the problem, we propose Deconfounded Causal Collaborative Filtering (DCCF). We first frame user behaviors with unobserved confounders into a causal graph, and then we design a front-door adjustment model carefully fused with machine learning to deconfound the influence of unobserved confounders. Experiments on real-world datasets show that our method is able to deconfound unobserved confounders to achieve better recommendation performance. 