skip to main content

Title: Quasi-oracle estimation of heterogeneous treatment effects
Summary Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical applications, such as personalized medicine and optimal resource allocation. In this article we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. First, we estimate marginal effects and treatment propensities to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. The proposed approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: in both steps, any loss-minimization method can be employed, such as penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross-validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property. Even when the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle with prior knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation set-ups, and observe promising performance relative to existing baselines.
Authors:
;
Award ID(s):
1916163
Publication Date:
NSF-PAR ID:
10311702
Journal Name:
Biometrika
Volume:
108
Issue:
2
ISSN:
0006-3444
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi‐view data have been routinely collected in various fields of science and engineering. A general problem is to study the predictive association between multivariate responses and multi‐view predictor sets, all of which can be of high dimensionality. It is likely that only a few views are relevant to prediction, and the predictors within each relevant view contribute to the prediction collectively rather than sparsely. We cast this new problem under the familiar multivariate regression framework and propose an integrative reduced‐rank regression (iRRR), where each view has its own low‐rank coefficient matrix. As such, latent features are extracted from each view in a supervised fashion. For model estimation, we develop a convex composite nuclear norm penalization approach, which admits an efficient algorithm via alternating direction method of multipliers. Extensions to non‐Gaussian and incomplete data are discussed. Theoretically, we derive non‐asymptotic oracle bounds of iRRR under a restricted eigenvalue condition. Our results recover oracle bounds of several special cases of iRRR including Lasso, group Lasso, and nuclear norm penalized regression. Therefore, iRRR seamlessly bridges group‐sparse and low‐rank methods and can achieve substantially faster convergence rate under realistic settings of multi‐view learning. Simulation studies and an application in the Longitudinal Studies of Agingmore »further showcase the efficacy of the proposed methods.« less
  2. SUMMARY We introduce a new finite-element (FE) based computational framework to solve forward and inverse elastic deformation problems for earthquake faulting via the adjoint method. Based on two advanced computational libraries, FEniCS and hIPPYlib for the forward and inverse problems, respectively, this framework is flexible, transparent and easily extensible. We represent a fault discontinuity through a mixed FE elasticity formulation, which approximates the stress with higher order accuracy and exposes the prescribed slip explicitly in the variational form without using conventional split node and decomposition discrete approaches. This also allows the first order optimality condition, that is the vanishing of the gradient, to be expressed in continuous form, which leads to consistent discretizations of all field variables, including the slip. We show comparisons with the standard, pure displacement formulation and a model containing an in-plane mode II crack, whose slip is prescribed via the split node technique. We demonstrate the potential of this new computational framework by performing a linear coseismic slip inversion through adjoint-based optimization methods, without requiring computation of elastic Green’s functions. Specifically, we consider a penalized least squares formulation, which in a Bayesian setting—under the assumption of Gaussian noise and prior—reflects the negative log of the posteriormore »distribution. The comparison of the inversion results with a standard, linear inverse theory approach based on Okada’s solutions shows analogous results. Preliminary uncertainties are estimated via eigenvalue analysis of the Hessian of the penalized least squares objective function. Our implementation is fully open-source and Jupyter notebooks to reproduce our results are provided. The extension to a fully Bayesian framework for detailed uncertainty quantification and non-linear inversions, including for heterogeneous media earthquake problems, will be analysed in a forthcoming paper.« less
  3. In this thesis we propose novel estimation techniques for localization and planning problems, which are key challenges in long-term autonomy. We take inspiration in our methods from non-parametric estimation and use tools such as kernel density estimation, non-linear least-squares optimization, binary masking, and random sampling. We show that these methods, by avoiding explicit parametric models, outperform existing methods that use them. Despite the seeming differences between localization and planning, we demonstrate in this thesis that the problems share core structural similarities. When real or simulation-sampled measurements are expensive, noisy, or high variance, non-parametric estimation techniques give higher-quality results in less time. We first address two localization problems. In order to permit localization with a set of ad hoc-placed radios, we propose an ultra-wideband (UWB) graph realization system to localize the radios. Our system achieves high accuracy and robustness by using kernel density estimation for measurement probability densities, by explicitly modeling antenna delays, and by optimizing this combination with a non-linear least squares formulation. Next, in order to then support robotic navigation, we present a flexible system for simultaneous localization and mapping (SLAM) that combines elements from both traditional dense metric SLAM and topological SLAM, using a binary "masking function" tomore »focus attention. This masking function controls which lidar scans are available for loop closures. We provide several masking functions based on approximate topological class detectors. We then examine planning problems in the final chapter and in the appendix. In order to plan with uncertainty around multiple dynamic agents, we describe Monte-Carlo Policy-Tree Decision Making (MCPTDM), a framework for efficiently computing policies in partially-observable, stochastic, continuous problems. MCPTDM composes a sequence of simpler closed-loop policies and uses marginal action costs and particle repetition to improve cost estimates and sample efficiency by reducing variance. Finally, in the appendix we explore Learned Similarity Monte-Carlo Planning (LSMCP), where we seek to enhance the sample efficiency of partially observable Monte Carlo tree search-based planning by taking advantage of similarities in the final outcomes of similar states and actions. We train a multilayer perceptron to learn a similarity function which we then use to enhance value estimates in the planning. Collectively, we show in this thesis that non-parametric methods promote long-term autonomy by reducing error and increasing robustness across multiple domains.« less
  4. Abstract Understanding treatment heterogeneity is essential to the development of precision medicine, which seeks to tailor medical treatments to subgroups of patients with similar characteristics. One of the challenges of achieving this goal is that we usually do not have a priori knowledge of the grouping information of patients with respect to treatment effect. To address this problem, we consider a heterogeneous regression model which allows the coefficients for treatment variables to be subject-dependent with unknown grouping information. We develop a concave fusion penalized method for estimating the grouping structure and the subgroup-specific treatment effects, and derive an alternating direction method of multipliers algorithm for its implementation. We also study the theoretical properties of the proposed method and show that under suitable conditions there exists a local minimizer that equals the oracle least squares estimator based on a priori knowledge of the true grouping information with high probability. This provides theoretical support for making statistical inference about the subgroup-specific treatment effects using the proposed method. The proposed method is illustrated in simulation studies and illustrated with real data from an AIDS Clinical Trials Group Study.
  5. There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the conditional average treatment effect (CATE) function. Metaalgorithms build on base algorithms—such as random forests (RFs), Bayesian additive regression trees (BARTs), or neural networks—to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz-continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our X-learner can be used to target treatment regimes and to shed light onmore »underlying mechanisms. A software package is provided that implements our methods.

    « less