With advances in biomedical research, biomarkers are becoming increasingly important prognostic factors for predicting overall survival, while the measurement of biomarkers is often censored due to instruments' lower limits of detection. This leads to two types of censoring: random censoring in overall survival outcomes and fixed censoring in biomarker covariates, posing new challenges in statistical modeling and inference. Existing methods for analyzing such data focus primarily on linear regression ignoring censored responses or semiparametric accelerated failure time models with covariates under detection limits (DL). In this paper, we propose a quantile regression for survival data with covariates subject to DL. Compared with existing methods, the proposed approach provides a more versatile tool for modeling the distribution of survival outcomes by allowing covariate effects to vary across conditional quantiles of the survival time and requiring no parametric distribution assumptions for outcome data. To estimate the quantile process of regression coefficients, we develop a novel multiple imputation approach based on another quantile regression for covariates under DL, avoiding the stringent parametric restrictions on censored covariates that are often assumed in the literature. Under regularity conditions, we show that the estimation procedure yields uniformly consistent and asymptotically normal estimators. Simulation results demonstrate the satisfactory finite‐sample performance of the method. We also apply our method to the motivating data from a study of genetic and inflammatory markers of sepsis.
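One reason quantile-based modelling suits covariates under a detection limit is that quantiles commute with monotone transformations such as censoring from below. The sketch below checks this numerically. It is not the multiple imputation procedure described in the abstract; the lognormal biomarker, the 30% censoring fraction, and all variable names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical biomarker values (assumed lognormal for illustration only).
biomarker = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

# Assumed detection limit: the lowest 30% of values fall below it.
detection_limit = np.quantile(biomarker, 0.3)

# The instrument reports the limit itself whenever the true value is below it.
observed = np.maximum(biomarker, detection_limit)

taus = [0.1, 0.5, 0.9]
true_quantiles = [float(np.quantile(biomarker, t)) for t in taus]
observed_quantiles = [float(np.quantile(observed, t)) for t in taus]

for t, q_true, q_obs in zip(taus, true_quantiles, observed_quantiles):
    # Above the censored fraction the quantile is untouched; below it,
    # the observed quantile collapses to the detection limit.
    print(f"tau={t}: true={q_true:.3f}, observed={q_obs:.3f}")
```

In this setup the observed quantiles at levels above the censored fraction agree with the uncensored ones, while the quantile at level 0.1 equals the detection limit.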
Analysing secondary outcomes is a common practice for case–control studies. Traditional secondary analysis employs either completely parametric models or conditional mean regression models to link the secondary outcome to covariates. In many situations, quantile regression models complement mean-based analyses and provide new insights on the associations of interest. For example, biomedical outcomes are often highly asymmetric, and median regression is more useful in describing the ‘central’ behaviour than mean regression. There are also cases where the research interest is to study the high or low quantiles of a population, as they are more likely to be at risk. We approach the secondary quantile regression problem from a semiparametric perspective, allowing the covariate distribution to be completely unspecified. We derive a class of consistent semiparametric estimators and identify the efficient member. The asymptotic properties of the resulting estimators are established. Simulation results and a real data analysis are provided to demonstrate the superior performance of our approach in comparison with the only existing approach in the literature.
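The contrast between median and mean summaries for asymmetric outcomes can be seen in a minimal sketch of the check (pinball) loss that underlies quantile regression. This covers only the intercept-only case and is not the semiparametric estimator of the abstract; the simulated lognormal outcome and the helper names `check_loss` and `best_constant` are assumptions introduced here.

```python
import numpy as np

def check_loss(residuals, tau):
    """Quantile (pinball) loss: sum of u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(residuals * (tau - (residuals < 0))))

def best_constant(y, tau):
    """Minimise the check loss over constants, searching the observed values."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

rng = np.random.default_rng(0)
# Hypothetical right-skewed outcome (assumption for illustration).
y = rng.lognormal(mean=0.0, sigma=1.0, size=501)

median_fit = best_constant(y, 0.5)  # minimiser of the tau = 0.5 loss
upper_fit = best_constant(y, 0.9)   # minimiser of the tau = 0.9 loss

print("check-loss median:", median_fit)
print("sample median:    ", float(np.median(y)))
print("sample mean:      ", float(y.mean()))
print("tau = 0.9 fit:    ", upper_fit)
```

Minimising the loss at level 0.5 recovers the sample median, which for a right-skewed outcome sits below the mean; changing the level to 0.9 targets the upper tail instead.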
- PAR ID: 10397805
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume: 80
- Issue: 4
- ISSN: 1369-7412
- Page Range / eLocation ID: p. 625-648
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary We study the regression relationship between covariates in case–control data: an area known as the secondary analysis of case–control studies. The context is such that only the form of the regression mean is specified, so that we allow an arbitrary regression error distribution, which can depend on the covariates and thus can be heteroscedastic. Under mild regularity conditions we establish the theoretical identifiability of such models. Previous work in this context has either specified a fully parametric distribution for the regression errors, specified a homoscedastic distribution for the regression errors, specified the rate of disease in the population (we refer to this as the true population) or made a rare disease approximation. We construct a class of semiparametric estimation procedures that rely on none of these. The estimators differ from the usual semiparametric estimators in that they draw conclusions about the true population, while technically operating in a hypothetical superpopulation. We also construct estimators with a unique feature, in that they are robust against the misspecification of the regression error distribution in terms of variance structure, whereas all other non-parametric effects are estimated despite the biased samples. We establish the asymptotic properties of the estimators and illustrate their finite sample performance through simulation studies, as well as through an empirical example on the relationship between red meat consumption and heterocyclic amines. Our analysis verified the positive relationship between red meat consumption and two forms of heterocyclic amines, indicating that increased red meat consumption leads to increased levels of MeIQx and PhIP, both being risk factors for colorectal cancer. Computer software as well as data to illustrate the methodology are available from http://www.stat.tamu.edu/~carroll/matlab__programs/software.php .
-
Triangular systems with nonadditively separable unobserved heterogeneity provide a theoretically appealing framework for the modeling of complex structural relationships. However, they are not commonly used in practice due to the need for exogenous variables with large support for identification, the curse of dimensionality in estimation, and the lack of inferential tools. This paper introduces two classes of semiparametric nonseparable triangular models that address these limitations. They are based on distribution and quantile regression modeling of the reduced form conditional distributions of the endogenous variables. We show that average, distribution, and quantile structural functions are identified in these systems through a control function approach that does not require a large support condition. We propose a computationally attractive three‐stage procedure to estimate the structural functions where the first two stages consist of quantile or distribution regressions. We provide asymptotic theory and uniform inference methods for each stage. In particular, we derive functional central limit theorems and bootstrap functional central limit theorems for the distribution regression estimators of the structural functions. These results establish the validity of the bootstrap for three‐stage estimators of structural functions, and lead to simple inference algorithms. We illustrate the implementation and applicability of all our methods with numerical simulations and an empirical application to demand analysis.
-
Summary Primary analysis of case–control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case–control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case–control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case–control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
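To see why the regression of Y on X in a case–control sample differs from the population regression, and how a known disease rate can help, consider the simulation sketch below. It uses simple inverse-probability weighting, a standard device swapped in here for illustration and not the model-robust estimator of the abstract. The population model, the sample sizes, and the helper `wls_slope` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: Y is linear in X and disease risk depends on Y.
N = 400_000
x = rng.normal(size=N)
y = 1.0 + 0.5 * x + rng.normal(size=N)          # true slope is 0.5
p_disease = 1.0 / (1.0 + np.exp(-(-5.0 + 1.0 * y)))
d = rng.random(N) < p_disease
disease_rate = d.mean()                          # treated as known

# Case-control sampling: equal numbers of cases and controls.
n_each = 3000
cases = rng.choice(np.flatnonzero(d), size=n_each, replace=False)
controls = rng.choice(np.flatnonzero(~d), size=n_each, replace=False)
idx = np.concatenate([cases, controls])
xs, ys, ds = x[idx], y[idx], d[idx]

def wls_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on (1, x)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return float(beta[1])

# Naive fit ignores the sampling design.
naive_slope = wls_slope(xs, ys, np.ones_like(xs))

# Reweight each group by (population share) / (sample share).
weights = np.where(ds, disease_rate / 0.5, (1.0 - disease_rate) / 0.5)
ipw_slope = wls_slope(xs, ys, weights)

print("naive slope:", naive_slope)
print("IPW slope:  ", ipw_slope)
```

Because cases are heavily over-represented relative to the population, the unweighted fit mixes two groups with different means of X and Y, whereas the weighted fit restores the population proportions.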
-
Abstract Quantile regression for right‐ or left‐censored outcomes has attracted attention due to its ability to accommodate heterogeneity in regression analysis of survival times. Rank‐based inferential methods have desirable properties for quantile regression analysis, but censored data pose challenges to the general concept of ranking. In this article, we propose a notion of censored quantile regression rank scores, which enables us to construct rank‐based tests for quantile regression coefficients at a single quantile or over a quantile region. A model‐based bootstrap algorithm is proposed to implement the tests. We also illustrate the advantage of focusing on a quantile region instead of a single quantile level when testing the effect of certain covariates in a quantile regression framework.