The Cox proportional hazards model is typically used to analyze time‐to‐event data. If the event of interest is rare and covariates are difficult or expensive to collect, the nested case‐control (NCC) design provides consistent estimates at reduced costs with minimal impact on precision if the model is specified correctly. If our scientific goal is to conduct inference regarding an association of interest, it is essential that we specify the model a priori to avoid multiple testing bias. We cannot, however, be certain that all assumptions will be satisfied so it is important to consider robustness of the NCC design under model misspecification. In this manuscript, we show that in finite sample settings where the functional form of a covariate of interest is misspecified, the estimates resulting from the partial likelihood estimator under the NCC design depend on the number of controls sampled at each event time. To account for this dependency, we propose an estimator that recovers the results obtained using using the full cohort, where full covariate information is available for all study participants. We present the utility of our estimator using simulation studies and show the theoretical properties. We end by applying our estimator to motivating data from the Alzheimer's Disease Neuroimaging Initiative.
We consider a functional linear Cox regression model for characterizing the association between time-to-event data and a set of functional and scalar predictors. The functional linear Cox regression model incorporates a functional principal component analysis for modeling the functional predictors and a high-dimensional Cox regression model to characterize the joint effects of both functional and scalar predictors on the time-to-event data. We develop an algorithm to calculate the maximum approximate partial likelihood estimates of unknown finite and infinite dimensional parameters. We also systematically investigate the rate of convergence of the maximum approximate partial likelihood estimates and a score test statistic for testing the nullity of the slope function associated with the functional predictors. We demonstrate our estimation and testing procedures by using simulations and the analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data. Our real data analyses show that high-dimensional hippocampus surface data may be an important marker for predicting time to conversion to Alzheimer's disease. Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu).
more » « less- NSF-PAR ID:
- 10485958
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 74
- Issue:
- 1
- ISSN:
- 0006-341X
- Format(s):
- Medium: X Size: p. 109-117
- Size(s):
- p. 109-117
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Summary Motivated by searching for associations between genetic variants and brain imaging phenotypes, the aim of this article is to develop a groupwise envelope model for multivariate linear regression in order to establish the association between both multivariate responses and covariates. The groupwise envelope model allows for both distinct regression coefficients and distinct error structures for different groups. Statistically, the proposed envelope model can dramatically improve efficiency of tests and of estimation. Theoretical properties of the proposed model are established. Numerical experiments as well as the analysis of an imaging genetic data set obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study show the effectiveness of the model in efficient estimation. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
-
Abstract We propose a constrained maximum partial likelihood estimator for dimension reduction in integrative (e.g., pan-cancer) survival analysis with high-dimensional predictors. We assume that for each population in the study, the hazard function follows a distinct Cox proportional hazards model. To borrow information across populations, we assume that each of the hazard functions depend only on a small number of linear combinations of the predictors (i.e., “factors”). We estimate these linear combinations using an algorithm based on “distance-to-set” penalties. This allows us to impose both low-rankness and sparsity on the regression coefficient matrix estimator. We derive asymptotic results that reveal that our estimator is more efficient than fitting a separate proportional hazards model for each population. Numerical experiments suggest that our method outperforms competitors under various data generating models. We use our method to perform a pan-cancer survival analysis relating protein expression to survival across 18 distinct cancer types. Our approach identifies six linear combinations, depending on only 20 proteins, which explain survival across the cancer types. Finally, to validate our fitted model, we show that our estimated factors can lead to better prediction than competitors on four external datasets.
-
Abstract Many biomedical studies have identified important imaging biomarkers that are associated with both repeated clinical measures and a survival outcome. The functional joint model (FJM) framework, proposed by Li and Luo in 2017, investigates the association between repeated clinical measures and survival data, while adjusting for both high‐dimensional images and low‐dimensional covariates based on the functional principal component analysis (FPCA). In this paper, we propose a novel algorithm for the estimation of FJM based on the functional partial least squares (FPLS). Our numerical studies demonstrate that, compared to FPCA, the proposed FPLS algorithm can yield more accurate and robust estimation and prediction performance in many important scenarios. We apply the proposed FPLS algorithm to a neuroimaging study. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
-
Motivated by mobile devices that record data at a high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allow us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine method (LSKM), we are able to substantially extend the framework of semi-parametric regression models of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method performs simultaneously model fitting and variable selection on functional features. For the implementation, we propose an effective algorithm to solve related optimization problems in that iterations take place between both linear mixed-effects models and a variable selection method (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology. We illustrate its performance through simulation experiments and an analysis of accelerometer data.more » « less