
Title: Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes

We propose a new procedure for inference on optimal treatment regimes in the model‐free setting, which does not require specifying an outcome regression model. Existing model‐free estimators of optimal treatment regimes are usually not suitable for inference, because they either have nonstandard asymptotic distributions or, owing to the use of a surrogate loss, do not guarantee consistent estimation of the parameter indexing the Bayes rule. We first study a smoothed robust estimator that directly targets the parameter corresponding to the Bayes decision rule for optimal treatment regime estimation. This estimator is shown to have an asymptotically normal distribution. Furthermore, we verify that a resampling procedure provides asymptotically accurate inference for both the parameter indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to compute the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.
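To make the construction concrete, here is a minimal sketch of a smoothed inverse‐probability‐weighted value criterion for a linear rule, in which the indicator 1{x'β > 0} is replaced by a normal CDF so the objective becomes differentiable. The function names, the bandwidth `h`, the optimizer, and the normalization of the first coefficient are all illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def smoothed_value(beta, X, A, Y, p1, h):
    # Smoothed IPW value estimate: the hard rule 1{x'beta > 0} is
    # replaced by the normal CDF Phi(x'beta / h), which restores
    # differentiability of the objective in beta.
    d = norm.cdf(X @ beta / h)            # smoothed rule, values in [0, 1]
    pa = A * p1 + (1 - A) * (1 - p1)      # prob. of the observed treatment
    match = A * d + (1 - A) * (1 - d)     # agreement of A with the rule
    return np.mean(Y * match / pa)

def fit_smoothed_regime(X, A, Y, p1, h=0.3):
    # Maximize the smoothed value over beta, fixing the first
    # coefficient at 1 (a common scale normalization, since the rule
    # depends only on the direction of beta).
    p = X.shape[1]
    obj = lambda b: -smoothed_value(np.r_[1.0, b], X, A, Y, p1, h)
    res = minimize(obj, np.zeros(p - 1), method="Nelder-Mead")
    return np.r_[1.0, res.x]
```

Per the abstract, smoothing of this kind is what allows an asymptotically normal estimator, and a separate resampling step (not sketched here) supplies confidence intervals.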

Publisher / Repository: Oxford University Press
Pages: 465-476
Sponsoring Org: National Science Foundation
More Like this
  1. Summary

    A salient feature of data from clinical trials and medical studies is inhomogeneity. Patients differ not only in baseline characteristics but also in how they respond to treatment. Optimal individualized treatment regimes are developed to select effective treatments based on patient heterogeneity. However, the optimal treatment regime may also vary for patients across different subgroups. We mainly consider patients' heterogeneity caused by groupwise individualized treatment effects, assuming the same marginal treatment effects for all groups. We propose a new maximin projection learning method for estimating a single treatment decision rule that works reliably for a group of future patients from a possibly new subpopulation. Based on estimated optimal treatment regimes for all subgroups, the proposed maximin treatment regime is obtained by solving a quadratically constrained linear programming problem, which can be computed efficiently by interior point methods. Consistency and asymptotic normality of the estimator are established. Numerical examples show the reliability of the proposed methodology.
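The maximin projection step can be phrased as the QCLP max over (β, c) of c subject to β'β_g ≥ c for every subgroup g and ‖β‖ ≤ 1. The abstract notes interior point methods apply; the sketch below instead uses SciPy's general-purpose SLSQP solver for simplicity, and the function name and starting point are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def maximin_regime(B):
    # B: (G, p) array whose rows are estimated subgroup regime
    # coefficients (assumed standardized to unit length).
    # Solve: max_{beta, c} c  s.t.  B @ beta >= c,  ||beta||^2 <= 1,
    # a quadratically constrained linear program.
    G, p = B.shape
    z0 = np.r_[B.mean(axis=0), 0.0]        # stack (beta, c) as one vector
    cons = [{"type": "ineq", "fun": lambda z, g=g: B[g] @ z[:p] - z[p]}
            for g in range(G)]
    cons.append({"type": "ineq", "fun": lambda z: 1.0 - z[:p] @ z[:p]})
    res = minimize(lambda z: -z[p], z0, constraints=cons, method="SLSQP")
    return res.x[:p]
```

With two orthogonal subgroup rules, the maximin rule splits the difference between them, which is the intended "works reliably for every subgroup" behavior.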

  2. Abstract

    Censored survival data are common in clinical trials. We propose a unified framework for sensitivity analysis to censoring at random in survival data using multiple imputation and martingales, termed SMIM. The proposed framework adopts δ‐adjusted and control‐based models, indexed by a sensitivity parameter, encompassing censoring at random and a wide collection of censoring‐not‐at‐random assumptions. It also targets a broad class of treatment effect estimands defined as functionals of treatment‐specific survival functions, taking into account missing data due to censoring. Multiple imputation facilitates the use of simple full‐sample estimation; however, Rubin's standard combining rule may overestimate the variance for inference in the sensitivity analysis framework. We decompose the multiple imputation estimator into a martingale series based on its sequential construction and propose wild bootstrap inference by resampling the martingale series. The new bootstrap inference has a theoretical guarantee of consistency and is computationally efficient compared with the nonparametric bootstrap counterpart. We evaluate the finite‐sample performance of the proposed SMIM through simulation and an application to an HIV clinical trial.
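A generic wild bootstrap of a martingale series, in the spirit described above, multiplies each increment by an i.i.d. mean‐zero, unit‐variance weight and recomputes the statistic. This is only a schematic stand-in (the paper's series comes from the sequential construction of the multiple imputation estimator); the function name and the choice of normal weights are assumptions:

```python
import numpy as np

def wild_bootstrap_se(increments, B=1000, rng=None):
    # increments: the martingale series whose normalized sum drives
    # the limiting distribution of the estimator; each bootstrap draw
    # multiplies the series by fresh i.i.d. N(0, 1) weights.
    rng = np.random.default_rng(rng)
    increments = np.asarray(increments, dtype=float)
    n = increments.size
    draws = np.empty(B)
    for b in range(B):
        w = rng.standard_normal(n)
        draws[b] = np.sum(w * increments) / np.sqrt(n)
    return draws.std(ddof=1)  # bootstrap standard error
```

Because only the weights are redrawn, each replicate costs a single weighted sum rather than a full re-estimation, which is the computational advantage over the nonparametric bootstrap that the abstract highlights.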

  3. In this article, we tackle the estimation and inference problem of analyzing distributed streaming data that is collected continuously over multiple data sites. We propose an online two‐way approach via linear mixed‐effects models. We explicitly model the site‐specific effects as random‐effect terms, and tackle both between‐site heterogeneity and within‐site correlation. We develop an online updating procedure that does not need to re‐access the previous data and can efficiently update the parameter estimate, when either new data sites, or new streams of sample observations of the existing data sites, become available. We derive the non‐asymptotic error bound for our proposed online estimator, and show that it is asymptotically equivalent to the offline counterpart based on all the raw data. We compare with some key alternative solutions both analytically and numerically, and demonstrate the advantages of our proposal. We further illustrate our method with two data applications.
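The backbone of such online updating is that the estimator depends on the data only through low-dimensional sufficient statistics that can be accumulated as batches arrive, so raw data never need to be revisited. The sketch below shows this for plain least squares; the paper's method extends the idea to linear mixed‐effects models with site‐specific random effects, which this toy class (a hypothetical `OnlineLS`) does not attempt:

```python
import numpy as np

class OnlineLS:
    # Accumulate X'X and X'y across data batches; the previous raw
    # data are never re-accessed, yet the final estimate equals the
    # offline least squares fit on all data pooled together.
    def __init__(self, p):
        self.XtX = np.zeros((p, p))
        self.Xty = np.zeros(p)

    def update(self, X, y):
        # fold in a new stream of observations (or a new data site)
        self.XtX += X.T @ X
        self.Xty += X.T @ y

    def estimate(self):
        return np.linalg.solve(self.XtX, self.Xty)
```

The exact agreement with the offline fit here mirrors the abstract's claim that the online estimator is asymptotically equivalent to the offline counterpart based on all the raw data.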

  4. Abstract

    To tackle massive data, subsampling is a practical approach for selecting the more informative data points. However, when responses are expensive to measure, developing efficient subsampling schemes is challenging, and an optimal sampling approach under measurement constraints was developed to meet this challenge. That method uses the inverses of optimal sampling probabilities to reweight the objective function, which assigns smaller weights to the more important data points; the estimation efficiency of the resulting estimator can therefore still be improved. In this paper, we propose an unweighted estimating procedure based on optimal subsamples to obtain a more efficient estimator. We obtain the unconditional asymptotic distribution of the estimator via martingale techniques, without conditioning on the pilot estimate, an aspect that has been less investigated in the existing subsampling literature. Both asymptotic and numerical results show that the unweighted estimator is more efficient in parameter estimation.
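To illustrate the weighted-versus-unweighted contrast in a toy linear regression, the sketch below draws an informative subsample and fits both an inverse‐probability‐weighted and a plain unweighted least squares estimator on it. The sampling scores are a crude stand-in for optimal subsampling probabilities, and the paper's unweighted procedure involves corrections that this sketch omits, so it is illustrative only:

```python
import numpy as np

def subsample_estimators(X, y, r, rng=None):
    # Draw r points with probability proportional to |pilot residual|
    # times ||x|| (a stand-in for optimal subsampling probabilities),
    # then fit (a) the classic inverse-probability-weighted estimator
    # and (b) an unweighted estimator on the same subsample.
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_pilot = np.linalg.lstsq(X, y, rcond=None)[0]
    score = np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1)
    p = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=p)
    Xs, ys, ps = X[idx], y[idx], p[idx]
    W = np.sqrt(1.0 / ps)                 # sqrt-weights for weighted LS
    beta_w = np.linalg.lstsq(Xs * W[:, None], ys * W, rcond=None)[0]
    beta_u = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    return beta_w, beta_u
```

The weighting by 1/p downweights exactly the points the sampler deemed most informative, which is the inefficiency the unweighted approach is designed to avoid.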

  5. Abstract

    We address the problem of adaptive minimax density estimation on $\mathbb{R}^{d}$ with $L_{p}$ loss functions under Huber’s contamination model. To investigate the contamination effect on the optimal estimation of the density, we first establish the minimax rate with the assumption that the density is in an anisotropic Nikol’skii class. We then develop a data-driven bandwidth selection procedure for kernel estimators, which can be viewed as a robust generalization of the Goldenshluger-Lepski method. We show that the proposed bandwidth selection rule can lead to the estimator being minimax adaptive to either the smoothness parameter or the contamination proportion. When both of them are unknown, we prove that finding any minimax-rate adaptive method is impossible. Extensions to smooth contamination cases are also discussed.
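As a much-simplified, non-robust caricature of Lepski-type bandwidth selection (not the Goldenshluger-Lepski generalization studied in the paper), one can pick the largest bandwidth whose kernel estimate stays within a noise-level threshold of the estimates at all smaller bandwidths; the threshold constant and its functional form below are illustrative assumptions:

```python
import numpy as np

def kde(x_grid, data, h):
    # Gaussian kernel density estimate evaluated on a grid
    u = (x_grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def lepski_bandwidth(data, x_grid, hs, kappa=0.5):
    # Return the largest h in hs whose estimate differs from every
    # smaller-bandwidth estimate by no more than a stochastic-error
    # threshold; fall back to the smallest bandwidth otherwise.
    ests = {h: kde(x_grid, data, h) for h in hs}
    n = len(data)
    for h in sorted(hs, reverse=True):
        ok = all(np.max(np.abs(ests[h] - ests[g]))
                 <= kappa * np.sqrt(np.log(n) / (n * g))
                 for g in hs if g < h)
        if ok:
            return h
    return min(hs)
```

The robust procedure in the abstract replaces these plug-in comparisons with ones that remain valid when a fraction of the sample is contaminated.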
