Title: Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation
Discovering causal relations from observational data is essential for many scientific disciplines and real-world applications. However, unlike other machine learning algorithms, whose development has been greatly fostered by the large number of available benchmark datasets, causal discovery algorithms are notoriously difficult to evaluate systematically because few datasets with known ground-truth causal relations are available. In this work, we address the problem of evaluating causal discovery algorithms by building a flexible simulator in the medical setting. We develop a neuropathic pain diagnosis simulator, motivated by the fact that the biological processes of neuropathic pathophysiology are well studied and their causal influences well understood. Our simulator exploits the causal graph of neuropathic pain pathology, and the parameters of its generator are estimated from real-life patient cases. We show that data generated by our simulator have statistics similar to those of real-world data. As a clear advantage, the simulator can produce an unlimited number of samples without jeopardizing the privacy of real-world patients. Our simulator provides a natural tool for evaluating various types of causal discovery algorithms, including those designed to deal with practical issues in causal discovery, such as unknown confounders, selection bias, and missing data. Using our simulator, we have extensively evaluated causal discovery algorithms under various settings.
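As a rough illustration of how a simulator of this kind can draw records from a known causal graph, the sketch below performs ancestral sampling over a tiny binary network. The node names, graph, and probabilities are hypothetical placeholders chosen for illustration only; they are not the simulator's actual graph or its estimated parameters.

```python
import random

# Hypothetical toy graph (NOT the simulator's real graph): node -> list of parents.
PARENTS = {
    "cause": [],
    "mediator": ["cause"],
    "symptom": ["mediator"],
}

# Hypothetical conditional probabilities P(node = 1 | parent values).
CPT = {
    "cause": {(): 0.3},
    "mediator": {(0,): 0.05, (1,): 0.8},
    "symptom": {(0,): 0.1, (1,): 0.7},
}

# A topological order of the graph: parents always come before children.
ORDER = ["cause", "mediator", "symptom"]


def sample_record(rng):
    """Draw one synthetic record by sampling each node given its parents."""
    record = {}
    for node in ORDER:
        key = tuple(record[p] for p in PARENTS[node])
        record[node] = 1 if rng.random() < CPT[node][key] else 0
    return record


def sample_dataset(n, seed=0):
    """Draw n independent records; the seed makes the run reproducible."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


data = sample_dataset(20000)
freq_cause = sum(r["cause"] for r in data) / len(data)
freq_symptom = sum(r["symptom"] for r in data) / len(data)
print(f"P(cause=1) ~ {freq_cause:.3f}, P(symptom=1) ~ {freq_symptom:.3f}")
```

Because the generating graph is known, the output of a causal discovery algorithm run on `data` can be compared directly against `PARENTS`, and the number of records is limited only by the value of `n`.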
Award ID(s):
1829681
NSF-PAR ID:
10125763
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chaudhuri, Kamalika ; Jegelka, Stefanie ; Song, Le ; Szepesvari, Csaba ; Niu, Gang ; Sabato, Sivan (Ed.)
    Traditional causal discovery methods mainly focus on estimating causal relations among measured variables, but in many real-world problems, such as questionnaire-based psychometric studies, measured variables are generated by latent variables that are causally related. Accordingly, this paper investigates the problem of discovering the hidden causal variables and estimating the causal structure, including both the causal relations among latent variables and those between latent and measured variables. We relax the frequently-used measurement assumption and allow the children of latent variables to be latent as well, and hence deal with a specific type of latent hierarchical causal structure. In particular, we define a minimal latent hierarchical structure and show that for linear non-Gaussian models with the minimal latent hierarchical structure, the whole structure is identifiable from only the measured variables. Moreover, we develop a principled method to identify the structure by testing for Generalized Independent Noise (GIN) conditions in specific ways. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach. 
  2. We present CausalSim, a causal framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) would not affect the validity of the traces. However, real-world traces are often biased by the choices algorithms make during trace collection, and hence replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics and latent factors capturing the underlying system conditions during trace collection. It learns these models using an initial randomized control trial (RCT) under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines. Moreover, CausalSim provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which we validate with a real deployment.
  3. Causal discovery is an important problem in many sciences that enables us to estimate causal relationships from observational data. Particularly, in the healthcare domain, it can guide practitioners in making informed clinical decisions. Several causal discovery approaches have been developed over the last few decades. The success of these approaches mostly relies on a large number of data samples. In practice, however, an infinite amount of data is never available. Fortunately, often we have some prior knowledge available from the problem domain. Particularly, in healthcare settings, we often have some prior knowledge such as expert opinions, prior RCTs, literature evidence, and systematic reviews about the clinical problem. This prior information can be utilized in a systematic way to address the data scarcity problem. However, most of the existing causal discovery approaches lack a systematic way to incorporate prior knowledge during the search process. Recent advances in reinforcement learning techniques can be explored to use prior knowledge as constraints by penalizing the agent for their violations. Therefore, in this work, we propose a framework KCRL that utilizes the existing knowledge as a constraint to penalize the search process during causal discovery. This utilization of existing information during causal discovery reduces the graph search space and enables a faster convergence to the optimal causal mechanism. We evaluated our framework on benchmark synthetic and real datasets as well as on a real-life healthcare application. We also compared its performance with several baseline causal discovery methods. The experimental findings show that penalizing the search process for constraint violation yields better performance compared to existing approaches that do not include prior knowledge. 
  4. In real-world phenomena that involve mutual influence or causal effects between interconnected units, equilibrium states are typically represented with cycles in graphical models. An expressive class of graphical models, relational causal models, can represent and reason about complex dynamic systems exhibiting such cycles or feedback loops. Existing cyclic causal discovery algorithms for learning causal models from observational data assume that the data instances are independent and identically distributed, which makes them unsuitable for relational causal models. At the same time, causal discovery algorithms for relational causal models assume acyclicity. In this work, we examine the necessary and sufficient conditions under which a constraint-based relational causal discovery algorithm is sound and complete for cyclic relational causal models. We introduce relational acyclification, an operation specifically designed for relational models that enables reasoning about the identifiability of cyclic relational causal models. We show that under the assumptions of relational acyclification and sigma-faithfulness, the relational causal discovery algorithm RCD is sound and complete for cyclic relational models. We present experimental results to support our claim.
  5. Motivation

    Understanding causal effects is a fundamental goal of science and underpins our ability to make accurate predictions in unseen settings and conditions. While direct experimentation is the gold standard for measuring and validating causal effects, the field of causal graph theory offers a tantalizing alternative: extracting causal insights from observational data. Theoretical analysis has shown that this is indeed possible, given a large dataset and provided certain conditions are met. However, biological datasets frequently do not meet such requirements, yet causal discovery algorithms are typically evaluated on synthetic datasets, which meet all of them. Thus, real-life datasets are needed in which the causal truth is reasonably well known. In this work, we first construct such a large-scale real-life dataset and then use it to comprehensively benchmark various causal discovery methods.

    Results

    We find that the PC algorithm is particularly accurate at estimating causal structure, including the causal direction, which is critical for biological applicability. However, PC produces only cause-effect directionality, not estimates of causal effects. We propose PC-NOTEARS (PCnt), a hybrid solution that includes the PC output as an additional constraint inside the NOTEARS optimization. This approach combines the PC algorithm’s strengths in graph structure prediction with the NOTEARS continuous optimization to estimate causal effects accurately. PCnt achieved the best aggregate performance across all structural and effect size metrics.

    Availability and implementation

    https://github.com/zhu-yh1/PC-NOTEARS.
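For readers unfamiliar with NOTEARS, the sketch below evaluates its standard acyclicity measure, h(W) = tr(exp(W∘W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The abstract does not state how the PC output enters the optimization, so the edge mask shown here is only one assumed possibility and is not taken from the PC-NOTEARS repository; `apply_mask` and `allowed` are hypothetical names, and the matrix exponential is approximated with a truncated power series to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """NOTEARS measure h(W) = tr(exp(W∘W)) - d via a truncated power series."""
    d = len(w)
    a = [[x * x for x in row] for row in w]  # elementwise square, W∘W
    # term holds A^k / k!, starting from the identity (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


def apply_mask(w, allowed):
    """Assumed constraint: zero out every edge the mask does not permit."""
    d = len(w)
    return [[w[i][j] if allowed[i][j] else 0.0 for j in range(d)]
            for i in range(d)]


# A two-node cycle, 0 -> 1 and 1 -> 0, so h is strictly positive.
w_cyclic = [[0.0, 1.0],
            [1.0, 0.0]]
# Hypothetical mask permitting only the edge 0 -> 1.
allowed = [[False, True],
           [False, False]]

h_cyclic = acyclicity(w_cyclic)
h_masked = acyclicity(apply_mask(w_cyclic, allowed))
print(h_cyclic, h_masked)
```

In a full optimizer, h(W) would be driven to zero alongside a data-fit loss; masking is simply one way to keep the search inside the set of edges a structure-learning step has already accepted.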

     