
Title: Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample

Biomedical researchers are often interested in estimating the effect of an environmental exposure on a chronic disease endpoint. However, the exposure variable of interest may be measured with error. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables available among the subjects in the calibration sample, we consider the situation in which an instrumental variable is available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via extensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative.
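As a rough illustration of the final combination step only, the Python sketch below forms a minimum-variance linear combination of a corrected estimate and an asymptotically zero-mean contrast between the two naive estimates. The function and argument names are hypothetical; the paper derives the actual estimators, weights, and covariance matrices, which are not reproduced here.

    import numpy as np

    def best_linear_combination(beta_corr, naive_diff, cov_cc, cov_cd, cov_dd):
        # beta_corr : corrected estimate from the calibration sample, shape (p,)
        # naive_diff: naive estimate (calibration) minus naive estimate (whole cohort), shape (p,)
        # cov_cc    : estimated asymptotic covariance of beta_corr, shape (p, p)
        # cov_cd    : estimated covariance between beta_corr and naive_diff, shape (p, p)
        # cov_dd    : estimated asymptotic covariance of naive_diff, shape (p, p)
        W = cov_cd @ np.linalg.inv(cov_dd)       # optimal linear weight matrix
        beta_ble = beta_corr - W @ naive_diff    # combined ("best linear") estimate
        cov_ble = cov_cc - W @ cov_cd.T          # its estimated asymptotic covariance
        return beta_ble, cov_ble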

Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: Biometrical Journal
Pages: 1465-1484
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    We consider logistic regression with covariate measurement error. Most existing approaches require replicates of the error-contaminated covariates, which may not be available in the data. We propose generalized method of moments (GMM) nonparametric correction approaches that use instrumental variables observed in a calibration subsample. The instrumental variable is related to the underlying true covariates through a general nonparametric model, and the probability of being in the calibration subsample may depend on the observed variables. We first take a simple approach that adopts inverse selection probability weighting using the calibration subsample. We then improve on this approach with a GMM estimator that uses the whole sample. The asymptotic properties are derived, and the finite sample performance is evaluated through simulation studies and an application to a real data set.
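    As an illustration of the simple weighting step only, the sketch below fits a logistic regression on the calibration subsample with each subject weighted by the inverse of its selection probability. It assumes statsmodels is available, the inputs are hypothetical, and the GMM correction that uses the instrumental variable and the whole sample is not shown; robust (sandwich) standard errors would be needed in practice.

        import numpy as np
        import statsmodels.api as sm

        def ipw_logistic(y_cal, X_cal, pi_cal):
            # y_cal  : binary outcomes in the calibration subsample
            # X_cal  : covariates (e.g., a corrected or instrumented exposure plus confounders)
            # pi_cal : selection probabilities into the calibration subsample
            w = 1.0 / np.asarray(pi_cal)
            design = sm.add_constant(X_cal)
            model = sm.GLM(y_cal, design,
                           family=sm.families.Binomial(),
                           freq_weights=w)
            # Point estimates solve the IPW estimating equations.
            return model.fit()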

  2. Abstract

    Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. Regression calibration (RC) is a common approach to correct for bias in regression analysis with covariate measurement error. In survival analysis with covariate measurement error, it is well known that the RC estimator may be biased when the hazard is an exponential function of the covariates. In this paper, we investigate the RC estimator under general hazard functions, including exponential and linear functions of the covariates. When the hazard is a linear function of the covariates, we show that a risk set regression calibration (RRC) estimator is consistent and robust to the choice of working model for the calibration function. Under exponential hazard models, there is a trade-off between bias and efficiency when comparing RC and RRC. One surprising finding, however, is that this bias-efficiency trade-off does not arise under the linear hazard when the unobserved covariate follows a uniform or normal distribution; in that situation the RRC estimator is in general slightly better than the RC estimator in terms of both bias and efficiency. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative.
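    For orientation, the sketch below shows ordinary regression calibration with a working linear calibration model, followed by a Cox fit on the imputed exposure. It assumes pandas, statsmodels, and lifelines are available and uses hypothetical column names; the risk set version (RRC) discussed above would instead refit the calibration model within each risk set.

        import statsmodels.api as sm
        from lifelines import CoxPHFitter

        def regression_calibration_cox(calib, cohort, covariates=("z",)):
            # calib : pandas DataFrame (calibration sample) with columns
            #         "x" (reference/biomarker exposure), "w" (self-report), covariates
            # cohort: pandas DataFrame (whole cohort) with "w", covariates, "time", "event"
            cols = ["w"] + list(covariates)
            # Step 1: working calibration model E[X | W, Z] in the calibration sample.
            calib_fit = sm.OLS(calib["x"], sm.add_constant(calib[cols])).fit()
            # Step 2: impute the exposure for every cohort member.
            cohort = cohort.copy()
            cohort["x_hat"] = calib_fit.predict(sm.add_constant(cohort[cols]))
            # Step 3: fit the Cox model with the calibrated exposure plugged in.
            cph = CoxPHFitter()
            cph.fit(cohort[["x_hat"] + list(covariates) + ["time", "event"]],
                    duration_col="time", event_col="event")
            return cph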

  3. Summary

    The statistical challenges in using big data to make valid statistical inference about a finite population have been well documented in the literature. These challenges arise primarily from statistical bias due to under-coverage of the population of interest by the big data source and from measurement errors in the variables available in the data set. By stratifying the population into a big data stratum and a missing data stratum, we can estimate the missing data stratum by using a fully responding probability sample, and hence the population as a whole, by using a data integration estimator. By expressing the data integration estimator as a regression estimator, we can handle measurement errors in the variables in the big data and also in the probability sample. We also propose a fully nonparametric classification method for identifying the overlapping units and develop a bias-corrected data integration estimator under misclassification errors. Finally, we develop a two-step regression data integration estimator to deal with measurement errors in the probability sample. An advantage of the approach advocated in this paper is that we do not have to make unrealistic missing-at-random assumptions for the methods to work. The proposed method is applied to a real data example using the 2015–2016 Australian Agricultural Census data.
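    A minimal sketch of the basic stratified combination is given below, ignoring the measurement-error, misclassification, and regression adjustments developed in the paper; the function and variable names are hypothetical and numpy is assumed.

        import numpy as np

        def data_integration_total(y_big, y_samp, w_samp, overlap_samp):
            # y_big        : outcome for every unit covered by the big data source
            # y_samp       : outcome for the probability-sample units
            # w_samp       : design weights (inverse inclusion probabilities)
            # overlap_samp : True if a sample unit is also covered by the big data source
            y_samp, w_samp = np.asarray(y_samp), np.asarray(w_samp)
            total_big = np.sum(y_big)                 # big data stratum is fully enumerated
            miss = ~np.asarray(overlap_samp)          # sample units in the missing data stratum
            total_missing = np.sum(w_samp[miss] * y_samp[miss])  # design-weighted estimate
            return total_big + total_missing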

    more » « less
  4. Conventional advice discourages controlling for postoutcome variables in regression analysis. By contrast, we show that controlling for commonly available postoutcome (i.e., future) values of the treatment variable can help detect, reduce, and even remove omitted variable bias (unobserved confounding). The premise is that the same unobserved confounder that affects treatment also affects the future value of the treatment. Future treatments thus proxy for the unmeasured confounder, and researchers can exploit these proxy measures productively. We establish several new results: Regarding a commonly assumed data-generating process involving future treatments, we (1) introduce a simple new approach and show that it strictly reduces bias, (2) elaborate on existing approaches and show that they can increase bias, (3) assess the relative merits of alternative approaches, and (4) analyze true state dependence and selection as key challenges. (5) Importantly, we also introduce a new nonparametric test that uses future treatments to detect hidden bias even when future-treatment estimation fails to reduce bias. We illustrate these results empirically with an analysis of the effect of parental income on children’s educational attainment. 
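    As a toy illustration of the premise only (not of the estimators or the nonparametric test developed in the paper), the simulation below assumes a simple linear data-generating process in which the same unobserved confounder drives both the current and the future treatment; adding the future treatment as a control shrinks, but does not eliminate, the omitted variable bias.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n, true_effect = 100_000, 1.0
        u = rng.normal(size=n)                      # unobserved confounder
        d_now = 0.8 * u + rng.normal(size=n)        # treatment of interest
        d_future = 0.8 * u + rng.normal(size=n)     # future treatment, also driven by u
        y = true_effect * d_now + u + rng.normal(size=n)

        naive = sm.OLS(y, sm.add_constant(d_now)).fit()
        proxy = sm.OLS(y, sm.add_constant(np.column_stack([d_now, d_future]))).fit()
        print("naive estimate:", naive.params[1])                   # biased away from 1.0
        print("with future treatment as proxy:", proxy.params[1])   # bias reduced toward 1.0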
  5. Background

    Properly adjusting for unmeasured confounders is critical for health studies in order to achieve valid testing and estimation of an exposure's causal effect on outcomes. The instrumental variable (IV) method has long been used in econometrics to estimate causal effects while accommodating the effect of unmeasured confounders. Mendelian randomization (MR), which uses genetic variants as the instrumental variables, is an application of the instrumental variable method to biomedical research and has become popular in recent years. One often-used estimator of causal effects for instrumental variables and Mendelian randomization is the two-stage least squares (TSLS) estimator. The validity of TSLS relies on accurate prediction of the exposure from the IVs in its first stage.


    In this note, we propose to model the link between the exposure and the genetic IVs using the least-squares kernel machine (LSKM). Simulation studies are used to evaluate the feasibility of the LSKM in the TSLS setting.


    Our results show that the LSKM based on a genotype score or on individual genotypes can be used effectively in TSLS. It may provide higher power when the association between the exposure and the genetic IVs is nonlinear.
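    A minimal two-stage sketch is given below, using RBF kernel ridge regression from scikit-learn as a stand-in for the least-squares kernel machine first stage; the names are illustrative, and the naive second-stage standard errors ignore the first-stage uncertainty.

        import statsmodels.api as sm
        from sklearn.kernel_ridge import KernelRidge

        def two_stage_kernel_iv(G, x, y, alpha=1.0):
            # G : genetic instruments, shape (n, q); x : exposure (n,); y : outcome (n,)
            # Stage 1: flexible prediction of the exposure from the instruments.
            stage1 = KernelRidge(kernel="rbf", alpha=alpha).fit(G, x)
            x_hat = stage1.predict(G)
            # Stage 2: regress the outcome on the fitted exposure.
            stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
            return stage2.params[1], stage2          # estimated causal effect, full fit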
