Recent years have witnessed rapid growth in machine learning methods on graph data, especially those powered by effective neural networks. Despite their success in different real-world scenarios, most of these methods focus on predictive or descriptive tasks and lack consideration of causality. Causal inference can reveal the causality inside data, promote human understanding of the learning process and model predictions, and serve as a significant component of artificial intelligence (AI). An important problem in causal inference is causal effect estimation, which aims to estimate the causal effect of a certain treatment (e.g., prescription of a medicine) on an outcome (e.g., cure of a disease) at an individual level (e.g., each patient) or a population level (e.g., a group of patients). In this paper, we introduce the background of causal effect estimation from observational data, envision the challenges of causal effect estimation with graphs, and then summarize representative approaches to causal effect estimation with graphs in recent years. Furthermore, we provide some insights for future research directions in related areas. Link to video abstract: https://youtu.be/BpDPOOqw-ns
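To make the two estimands the abstract names concrete, here is a minimal synthetic sketch of individual treatment effects (ITE) and the population average treatment effect (ATE) under the potential outcome framework. The data and variable names are illustrative assumptions; in real observational data only one potential outcome per unit is observed, and the counterfactual must be estimated (e.g., with the graph-based methods the survey covers).

```python
# Hypothetical sketch: ITE and ATE when both potential outcomes are known.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                     # a patient covariate
y0 = x + rng.normal(scale=0.5, size=n)     # potential outcome without treatment
y1 = y0 + 2.0 + 0.5 * x                    # potential outcome with treatment

ite = y1 - y0          # individual-level causal effect (one per patient)
ate = ite.mean()       # population-level causal effect
print(f"ATE = {ate:.2f}")   # close to 2.0, since E[0.5 * x] = 0
```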
- Award ID(s): 1747614
- Publication Date:
- NSF-PAR ID: 10304124
- Journal Name: ACM Transactions on Knowledge Discovery from Data
- Volume: 15
- Issue: 5
- ISSN: 1556-4681
- Sponsoring Org: National Science Foundation
More Like this
- Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data-generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real data sets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage.
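A minimal sketch of the conformal idea this abstract builds on, in the simplest setting it mentions (a completely randomized experiment, where finite-sample coverage is guaranteed): split conformal prediction for the treated potential outcome Y(1). This is not the paper's exact algorithm; the function, model choice, and variable names are illustrative assumptions.

```python
# Split-conformal interval for Y(1), assuming treated units are exchangeable.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def conformal_y1_interval(X_treated, y_treated, X_test, alpha=0.1):
    """Interval estimate for the treated potential outcome at X_test."""
    half = len(y_treated) // 2
    model = GradientBoostingRegressor().fit(X_treated[:half], y_treated[:half])
    # Absolute residuals on the held-out calibration half.
    resid = np.abs(y_treated[half:] - model.predict(X_treated[half:]))
    m = len(resid)
    # Finite-sample-valid quantile level: ceil((1 - alpha)(m + 1)) / m.
    level = min(1.0, np.ceil((1 - alpha) * (m + 1)) / m)
    q = np.quantile(resid, level)
    pred = model.predict(X_test)
    # Marginal coverage >= 1 - alpha under exchangeability.
    return pred - q, pred + q
```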
- Identifying cause-effect relations among variables is a key step in the decision-making process. Whereas causal inference requires randomized experiments, researchers and policy makers are increasingly using observational studies to test causal hypotheses due to the wide availability of data and the infeasibility of experiments. The matching method is the most widely used technique for causal inference from observational data. However, the pair assignment process in one-to-one matching creates uncertainty in the inference because of the different choices made by the experimenter. Recently, discrete optimization models have been proposed to tackle this uncertainty; however, they produce 0-1 nonlinear problems and lack scalability. In this work, we investigate this emerging data science problem and develop a unique computational framework to solve robust causal inference test instances from observational data with continuous outcomes. In the proposed framework, we first reformulate the nonlinear binary optimization problems as feasibility problems. By leveraging the structure of the feasibility formulation, we develop greedy schemes that are efficient in solving robust test problems. In many cases, the proposed algorithms achieve a globally optimal solution. We perform experiments on real-world data sets to demonstrate the effectiveness of the proposed algorithms and compare our results with the state-of-the-art solver. […]
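A hedged, simplified sketch of the intuition behind such robust matching tests: because the one-to-one pair assignment is not unique, one can probe how far a greedy choice of matches can push the test statistic (here, the mean treated-minus-control outcome difference). This is only an illustration of the greedy flavor, not the paper's feasibility-based framework; all names and data structures are invented.

```python
# Greedy approximation to the extreme value of the matched-pairs statistic.
def greedy_extreme_statistic(candidates, y_treated, y_control, maximize=True):
    """candidates[i]: admissible control indices for treated unit i.
    Builds a one-to-one matching that pushes the mean difference toward
    its extreme and returns the resulting statistic."""
    used, diffs = set(), []
    # Visit treated units with the fewest options first (a common greedy order).
    order = sorted(range(len(candidates)), key=lambda i: len(candidates[i]))
    for i in order:
        free = [j for j in candidates[i] if j not in used]
        if not free:
            continue  # unit stays unmatched in this sketch
        pick = (max(free, key=lambda j: y_treated[i] - y_control[j]) if maximize
                else min(free, key=lambda j: y_treated[i] - y_control[j]))
        used.add(pick)
        diffs.append(y_treated[i] - y_control[pick])
    return sum(diffs) / len(diffs) if diffs else 0.0

# A robust test would compare the interval between the minimized and
# maximized statistics against the null rejection threshold.
```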
- Structural nested mean models (SNMMs) are useful for causal inference of treatment effects in longitudinal observational studies. Most existing works assume that the data are collected at prefixed time points for all subjects, which, however, may be restrictive in practice. To deal with irregularly spaced observations, we assume a class of continuous-time SNMMs and a martingale condition of no unmeasured confounding (NUC) to identify the causal parameters. We develop the semiparametric efficiency theory and locally efficient estimators for continuous-time SNMMs. This task is nontrivial due to the restrictions that the NUC assumption imposes on the SNMM parameters. In the presence of ignorable censoring, we show that the complete-case estimator is optimal among a class of weighting estimators, including the inverse probability of censoring weighting estimator, and that it achieves a double robustness feature: it is consistent if at least one of the models for the potential outcome mean function and the treatment process is correctly specified. The new framework allows us to conduct causal analysis that respects the underlying continuous-time nature of the data processes. A simulation study shows that the proposed estimator outperforms existing approaches. We estimate the effect of time to initiate highly active antiretroviral therapy on the […]
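A minimal sketch of the two weighting estimators the abstract compares, stripped down to the simplest setting: estimating a mean outcome when some subjects are censored. The censoring model and all names are illustrative assumptions, not the paper's continuous-time SNMM machinery.

```python
# Complete-case vs. inverse-probability-of-censoring-weighted (IPCW) means.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipcw_mean(y, observed, X):
    """Reweight complete cases by 1 / P(observed | X), with the
    observation probability fit here by logistic regression."""
    p_obs = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]
    w = observed / np.clip(p_obs, 1e-3, None)   # zero weight if censored
    return np.sum(w * np.nan_to_num(y)) / np.sum(w)

def complete_case_mean(y, observed):
    """Unweighted mean over complete cases; the paper shows a complete-case
    estimator can be optimal within a class of weighting estimators."""
    return y[observed == 1].mean()
```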
- Two important considerations in clinical research studies are proper evaluation of internal and external validity. While randomized clinical trials can overcome several threats to internal validity, they may be prone to poor external validity. Conversely, large prospective observational studies sampled from a broadly generalizable population may be externally valid, yet susceptible to threats to internal validity, particularly confounding. Thus, methods that address confounding and enhance the transportability of study results across populations are essential for internally and externally valid causal inference, respectively. These issues persist for another problem closely related to transportability, known as data fusion. We develop a calibration method to generate balancing weights that address confounding and sampling bias, thereby enabling valid estimation of the target population average treatment effect. We compare the calibration approach to two additional doubly robust methods that estimate the effect of an intervention on an outcome within a second, possibly unrelated target population. The proposed methodologies can be extended to resolve data-fusion problems that seek to evaluate the effects of an intervention using data from two related studies sampled from different populations. A simulation study is conducted to demonstrate the advantages and similarities of the different techniques. We also test the performance of the calibration […]
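A hedged sketch of the calibration idea: choose weights for the study sample so that its weighted covariate means match those of the target population, here via entropy balancing (exponential tilting). The function and names are illustrative assumptions; the paper's estimator additionally handles confounding and the data-fusion setting.

```python
# Calibration weights via the entropy-balancing dual problem.
import numpy as np
from scipy.optimize import minimize

def calibration_weights(X_sample, target_means):
    """Solve min over lam of log(sum_i exp(lam' (x_i - m))); at the optimum
    the normalized weights satisfy sum_i w_i * x_i = target_means."""
    Z = X_sample - target_means            # covariates centered at the target
    dual = lambda lam: np.log(np.exp(Z @ lam).sum())
    lam = minimize(dual, np.zeros(Z.shape[1]), method="BFGS").x
    w = np.exp(Z @ lam)
    return w / w.sum()                     # balancing weights, summing to 1
```

The dual's gradient is the weighted mean of the centered covariates, so driving it to zero is exactly the moment-matching (calibration) constraint; the exponential form keeps all weights positive.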