Causal knowledge is sought after throughout data-driven fields due to its explanatory power and potential value to inform decision-making. If the targeted system is well understood in terms of its causal components, one is able to design more precise and surgical interventions so as to bring about certain desired outcomes. The idea of leveraging the causal understanding of a system to improve decision-making has been studied in the literature under the rubric of structural causal bandits (Lee and Bareinboim, 2018). In this setting, (1) pulling an arm corresponds to performing a causal intervention on a set of variables, while (2) the associated rewards are governed by the underlying causal mechanisms. One key assumption of this work is that any observed variable (X) in the system is manipulable, which means that intervening and making do(X = x) is always realizable. In many real-world scenarios, however, this is too stringent a requirement. For instance, while scientific evidence may support that obesity shortens life, it is not feasible to manipulate obesity directly; it can only be influenced indirectly, for example by decreasing the amount of soda consumption (Pearl, 2018). In this paper, we study a relaxed version of the structural causal bandit problem in which not all variables are manipulable. Specifically, we develop a procedure that takes as argument partially specified causal knowledge and identifies the possibly-optimal arms in structural bandits with non-manipulable variables. We further introduce an algorithm that uncovers non-trivial dependence structure among the possibly-optimal arms. Finally, we corroborate our findings with simulations, which show that MAB solvers enhanced with causal knowledge and leveraging the newly discovered dependence structure among arms consistently outperform their causal-insensitive counterparts.
On the Interpretation of do(x)
Abstract This paper provides empirical interpretation of the do(x) operator when applied to non-manipulable variables such as race, obesity, or cholesterol level. We view do(x) as an ideal intervention that provides valuable information on the effects of manipulable variables and is thus empirically testable. We draw parallels between this interpretation and ways of enabling machines to learn effects of untried actions from those tried. We end with the conclusion that researchers need not distinguish manipulable from non-manipulable variables; both types are equally eligible to receive the do(x) operator and to produce useful information for decision makers.
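As an illustration of the distinction the abstract relies on, the sketch below contrasts conditioning on X with the ideal intervention do(X = x) in a toy linear model. The model itself (a hidden confounder `u`, the coefficients 2.0 and 3.0, and the helper names `sample`, `ols_slope`, and `mean_y`) is an assumption made for this example and is not taken from the paper.

```python
import random

def sample(n, do_x=None, seed=0):
    """Draw n (x, y) pairs from a toy linear SCM (hypothetical, for illustration).

    If do_x is given, the mechanism that normally generates X is replaced
    by the constant do_x, which is how do(X = x) is modeled here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder affecting both X and Y
        x = u + rng.gauss(0, 1) if do_x is None else do_x
        y = 2.0 * x + 3.0 * u + rng.gauss(0, 1)  # true causal effect of X on Y is 2
        rows.append((x, y))
    return rows

def ols_slope(rows):
    """Least-squares slope of y on x: the observational association."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cov = sum((x - mx) * (y - my) for x, y in rows)
    var = sum((x - mx) ** 2 for x, _ in rows)
    return cov / var

def mean_y(rows):
    return sum(y for _, y in rows) / len(rows)

# Observational slope mixes the causal effect with confounding through u.
obs_slope = ols_slope(sample(100_000))

# Interventional contrast: E[Y | do(X=1)] - E[Y | do(X=0)].
causal_effect = mean_y(sample(100_000, do_x=1.0)) - mean_y(sample(100_000, do_x=0.0))

print(f"observational slope: {obs_slope:.2f}")
print(f"interventional effect: {causal_effect:.2f}")
```

In this toy model the observational slope is inflated by the confounder, while the interventional contrast recovers the coefficient on X. Whether X could be physically manipulated plays no role in the computation, which is the sense in which do(x) is treated as an ideal intervention.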
- Award ID(s): 1704932
- PAR ID: 10097921
- Date Published:
- Journal Name: Journal of Causal Inference
- Volume: 7
- Issue: 1
- ISSN: 2193-3685
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Cryo-cooling has been nearly universally adopted to mitigate X-ray damage and facilitate crystal handling in protein X-ray crystallography. However, cryo X-ray crystallographic data provide an incomplete window into the ensemble of conformations that is at the heart of protein function and energetics. Room-temperature (RT) X-ray crystallography provides accurate ensemble information, and recent developments allow conformational heterogeneity (the experimental manifestation of ensembles) to be extracted from single-crystal data. Nevertheless, high sensitivity to X-ray damage at RT raises concerns about data reliability. To systematically address this critical issue, increasingly X-ray-damaged high-resolution data sets (1.02–1.52 Å resolution) were obtained from single proteinase K, thaumatin and lysozyme crystals at RT (277 K). In each case a modest increase in conformational heterogeneity with X-ray damage was observed. Merging data with different extents of damage (as is typically carried out) had negligible effects on conformational heterogeneity until the overall diffraction intensity decayed to ∼70% of its initial value. These effects were compared with X-ray damage effects in cryo-cooled crystals by carrying out an analogous analysis of increasingly damaged proteinase K cryo data sets (0.9–1.16 Å resolution). X-ray damage-associated heterogeneity changes were found that were not observed at RT. This property renders it difficult to distinguish real from artefactual conformations and to determine the conformational response to changes in temperature. The ability to acquire reliable heterogeneity information from single crystals at RT, together with recent advances in RT data collection at accessible synchrotron beamlines, provides a strong motivation for the widespread adoption of RT X-ray crystallography to obtain conformational ensemble information.
-
In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertainty constraints is within the prespecified risk level. As a CCP is often nonconvex and is difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for a CCP, among which the conditional value-at-risk (CVaR) has been known to be the best for more than a decade. This paper studies and generalizes ALSO-X, originally proposed by Ahmed, Luedtke, SOng, and Xie in 2017, for solving a CCP. We first show that ALSO-X resembles a bilevel optimization, where the upper-level problem is to find the best objective function value and enforce the feasibility of a CCP for a given decision from the lower-level problem, and the lower-level problem is to minimize the expectation of constraint violations subject to the upper bound of the objective function value provided by the upper-level problem. This interpretation motivates us to prove that when uncertain constraints are convex in the decision variables, ALSO-X always outperforms the CVaR approximation. We further show (i) sufficient conditions under which ALSO-X can recover an optimal solution to a CCP; (ii) an equivalent bilinear programming formulation of a CCP, inspiring us to enhance ALSO-X with a convergent alternating minimization method (ALSO-X+); and (iii) an extension of ALSO-X and ALSO-X+ to distributionally robust chance constrained programs (DRCCPs) under the ∞-Wasserstein ambiguity set. Our numerical study demonstrates the effectiveness of the proposed methods.
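To make the notion of a CVaR inner approximation concrete, the sketch below compares an empirical chance constraint with its CVaR counterpart for a single decision. It does not implement ALSO-X. The constraint function g(x, ξ) = ξx − 1, the uniform distribution for ξ, the risk level, and the helper name `var_cvar` are all assumptions chosen for illustration, and the tail-mean estimator used here is one common empirical form of CVaR.

```python
import math
import random

def var_cvar(losses, eps):
    """Empirical VaR and CVaR at level 1 - eps.

    VaR is the (1 - eps) sample quantile; CVaR is estimated as the mean of
    the samples at or above that quantile, so CVaR >= VaR by construction.
    """
    s = sorted(losses)
    k = max(math.ceil((1 - eps) * len(s)) - 1, 0)  # index of the quantile
    tail = s[k:]
    return s[k], sum(tail) / len(tail)

rng = random.Random(0)
eps = 0.1                                   # allowed violation probability
xi = [rng.uniform(0.0, 2.0) for _ in range(100_000)]

x = 0.54                                    # candidate decision (hypothetical)
losses = [z * x - 1.0 for z in xi]          # g(x, xi) <= 0 means "constraint satisfied"

var, cvar = var_cvar(losses, eps)
chance_feasible = var <= 0                  # P(g <= 0) >= 1 - eps on the sample
cvar_feasible = cvar <= 0                   # stricter, convex surrogate condition

print(f"VaR = {var:.3f}, CVaR = {cvar:.3f}")
print(f"chance-feasible: {chance_feasible}, CVaR-feasible: {cvar_feasible}")
```

For this decision the sample satisfies the chance constraint but fails the CVaR condition, which shows why the CVaR approximation can exclude decisions that are feasible for the original CCP.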
-
Abstract We study the free probabilistic analog of optimal couplings for the quadratic cost, where classical probability spaces are replaced by tracial von Neumann algebras, and probability measures on $\mathbb{R}^m$ are replaced by non-commutative laws of $m$-tuples. We prove an analog of the Monge–Kantorovich duality which characterizes optimal couplings of non-commutative laws with respect to Biane and Voiculescu’s non-commutative $L^2$-Wasserstein distance using a new type of convex functions. As a consequence, we show that if $(X, Y)$ is a pair of optimally coupled $m$-tuples of non-commutative random variables in a tracial $\mathrm{W}^*$-algebra $\mathcal{A}$, then $\mathrm{W}^*((1 - t)X + tY) = \mathrm{W}^*(X, Y)$ for all $t \in (0,1)$. Finally, we illustrate the subtleties of non-commutative optimal couplings through connections with results in quantum information theory and operator algebras. For instance, two non-commutative laws that can be realized in finite-dimensional algebras may still require an infinite-dimensional algebra to optimally couple. Moreover, the space of non-commutative laws of $m$-tuples is not separable with respect to the Wasserstein distance for $m > 1$.
-
Windecker, Saras (Ed.)
1. The ecological and environmental science communities have embraced machine learning (ML) for empirical modelling and prediction. However, going beyond prediction to draw insights into underlying functional relationships between response variables and environmental ‘drivers’ is less straightforward. Deriving ecological insights from fitted ML models requires techniques to extract the ‘learning’ hidden in the ML models.
2. We revisit the theoretical background and effectiveness of four approaches for deriving insights from ML: ranking independent variable importance (Gini importance, GI; permutation importance, PI; split importance, SI; and conditional permutation importance, CPI), and two approaches for inference of bivariate functional relationships (partial dependence plots, PDP; and accumulated local effect plots, ALE). We also explore the use of a surrogate model for visualization and interpretation of complex multi-variate relationships between response variables and environmental drivers. We examine the challenges and opportunities for extracting ecological insights with these interpretation approaches. Specifically, we aim to improve interpretation of ML models by investigating how effectiveness relates to (a) interpretation algorithm, (b) sample size and (c) the presence of spurious explanatory variables.
3. We base the analysis on simulations with known underlying functional relationships between response and predictor variables, with added white noise and the presence of correlated but non-influential variables. The results indicate that deriving ecological insight is strongly affected by interpretation algorithm and spurious variables, and moderately impacted by sample size. Removing spurious variables improves interpretation of ML models. Meanwhile, increasing sample size has limited value in the presence of spurious variables, but increasing sample size does improve performance once spurious variables are omitted. Among the four ranking methods, SI is slightly more effective than the other methods in the presence of spurious variables, while GI and SI yield higher accuracy when spurious variables are removed. PDP is more effective in retrieving underlying functional relationships than ALE, but its reliability declines sharply in the presence of spurious variables. Visualization and interpretation of the interactive effects of predictors and the response variable can be enhanced using surrogate models, including three-dimensional visualizations and use of loess planes to represent independent variable effects and interactions.
4. Machine learning analysts should be aware that including correlated independent variables in ML models with no clear causal relationship to response variables can interfere with ecological inference. When ecological inference is important, ML models should be constructed with independent variables that have clear causal effects on response variables. While interpreting ML models for ecological inference remains challenging, we show that careful choice of interpretation methods, exclusion of spurious variables and adequate sample size can provide more and better opportunities to ‘learn from machine learning’.