

Title: Fitting additive risk models using auxiliary information

There has been growing interest in incorporating auxiliary summary information from external studies into the analysis of internal individual‐level data. In this paper, we propose an adaptive estimation procedure for an additive risk model that integrates auxiliary subgroup survival information via a penalized method of moments. Our approach can accommodate information from heterogeneous data sources. We introduce parameters that quantify the magnitude of potential incomparability between the internal data and the external auxiliary information; nonzero components of these parameters indicate a violation of the homogeneity assumption. We further develop an efficient computational algorithm that solves the resulting numerical optimization problem by profiling out the nuisance parameters. Asymptotically, our method can be as efficient as if all incomparable auxiliary information were correctly identified in advance and excluded from consideration. We establish the asymptotic normality of the proposed estimator of the regression coefficients and give an explicit formula for its asymptotic variance‐covariance matrix, which can be consistently estimated from the data. Simulation studies show that the proposed method yields a substantial gain in statistical efficiency over the conventional method that uses the internal data only, and reduces estimation bias when the given auxiliary survival information is incomparable. We illustrate the proposed method with a lung cancer survival study.
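For reference, the conventional internal-data-only analysis can be sketched with the closed-form estimating-equation estimator of Lin and Ying (1994) for the additive risk model λ(t | Z) = λ0(t) + β'Z. This is a minimal sketch under stated assumptions: covariates are time-independent, observed times are strictly positive, and the abstract does not confirm that this is the exact comparator the authors use. It does not implement the proposed penalized method of moments and uses no auxiliary information; the function name `lin_ying_additive_hazards` is hypothetical.

```python
import numpy as np


def lin_ying_additive_hazards(time, event, Z):
    """Estimate beta in lambda(t | Z) = lambda0(t) + beta'Z from right-censored data.

    Hypothetical helper (assumes time-independent covariates, times > 0).
    Solves sum_i int (Z_i - Zbar(t)) {dN_i(t) - Y_i(t) beta'Z_i dt} = 0, i.e. beta = A^{-1} b with
      A = sum_i int Y_i(t) (Z_i - Zbar(t))(Z_i - Zbar(t))' dt,
      b = sum_i int (Z_i - Zbar(t)) dN_i(t).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n = Z.shape[0]

    # Sort by observed time; tied times are handled through tie groups below.
    order = np.argsort(time, kind="stable")
    t_s, d_s, Z_s = time[order], event[order], Z[order]

    # Reverse cumulative sums: row k aggregates sorted subjects k..n-1.
    S0 = np.arange(n, 0, -1, dtype=float)
    S1 = np.cumsum(Z_s[::-1], axis=0)[::-1]
    S2 = np.cumsum(np.einsum("ij,ik->ijk", Z_s, Z_s)[::-1], axis=0)[::-1]

    # At the first index of each tie group, rows k..n-1 are exactly the risk set {T_j >= t}.
    uniq, first = np.unique(t_s, return_index=True)
    size = S0[first]
    zbar = S1[first] / size[:, None]
    # Centered scatter of the risk set: sum (Z - zbar)(Z - zbar)' = S2 - S0 * zbar zbar'.
    scatter = S2[first] - size[:, None, None] * np.einsum("kj,kl->kjl", zbar, zbar)

    # Y_i(t) is constant on (previous distinct time, current distinct time], so the
    # dt-integral reduces to a sum weighted by interval widths.
    widths = np.diff(np.concatenate(([0.0], uniq)))
    A = np.einsum("k,kjl->jl", widths, scatter)

    # Event contributions use Zbar evaluated at each subject's own observed time.
    group = np.searchsorted(uniq, t_s)
    b = ((Z_s - zbar[group]) * d_s[:, None]).sum(axis=0)
    return np.linalg.solve(A, b)
```

The baseline hazard λ0(t) drops out because the centered covariates sum to zero over each risk set, which is what makes the estimator closed-form; the reverse cumulative sums give every risk-set average after a single sort.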

Award ID(s):
2112938
NSF-PAR ID:
10396065
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
42
Issue:
6
ISSN:
0277-6715
Page Range / eLocation ID:
p. 894-916
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    The multispecies coalescent model is now widely accepted as an effective model for incorporating variation in the evolutionary histories of individual genes into methods for phylogenetic inference from genome-scale data. However, because model-based analysis under the coalescent can be computationally expensive for large datasets, a variety of inferential frameworks and corresponding algorithms have been proposed for estimation of species-level phylogenies and associated parameters, including speciation times and effective population sizes.

    Results

    We consider the problem of estimating the timing of speciation events along a phylogeny in a coalescent framework. We propose a maximum a posteriori estimator based on composite likelihood (MAPCL) for inferring these speciation times under a model of DNA sequence evolution for which exact site-pattern probabilities can be computed under the assumption of a constant θ throughout the species tree. We demonstrate that the MAPCL estimates are statistically consistent and asymptotically normally distributed, and we show how this result can be used to estimate their asymptotic variance. We also provide a more computationally efficient estimator of the asymptotic variance based on the non-parametric bootstrap. We evaluate the performance of our method using simulation and by application to an empirical dataset for gibbons.

    Availability and implementation

    The method has been implemented in the PAUP* program, freely available at https://paup.phylosolutions.com for Macintosh, Windows and Linux operating systems.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  2. Abstract

    Quantiles and expected shortfalls are commonly used risk measures in financial risk management. The two measures are correlated but have distinct features. In this project, our primary goal is to develop a stable and practical inference method for the conditional expected shortfall. We consider the joint modelling of the conditional quantile and expected shortfall to facilitate the statistical inference procedure. While the regression coefficients can be estimated jointly by minimizing a class of strictly consistent joint loss functions, the computation is challenging, especially when the parameter dimension is large, since the loss functions are neither differentiable nor convex. We propose a two‐step estimation procedure that reduces the computational effort by first estimating the quantile regression parameters with standard quantile regression. We show that the two‐step estimator has the same asymptotic properties as the joint estimator, but the former is numerically more efficient. We develop a score‐type inference method for hypothesis testing and confidence interval construction. Compared with the Wald‐type method, the score method is robust against heterogeneity and is superior in finite samples, especially for cases with many confounding factors. The advantages of our proposed method over existing approaches are demonstrated by simulations and empirical studies based on income and college education data.

  3. Summary

    The paper studies estimation of partially linear hazard regression models with varying coefficients for multivariate survival data. A profile pseudo-partial-likelihood estimation method is proposed. The estimation of the parameters of the linear part is accomplished via maximization of the profile pseudo-partial-likelihood, whereas the varying-coefficient functions are considered as nuisance parameters that are profiled out of the likelihood. It is shown that the estimators of the parameters are root-n consistent and the estimators of the non-parametric coefficient functions achieve optimal convergence rates. Asymptotic normality is obtained for the estimators of the finite parameters and varying-coefficient functions. Consistent estimators of the asymptotic variances are derived and empirically tested, which facilitate inference for the model. We prove that the varying-coefficient functions can be estimated as well as if the parametric components were known and the failure times within each subject were independent. Simulations are conducted to demonstrate the performance of the proposed estimators. A real data set is analysed to illustrate the proposed methodology.

  4. Abstract

    Multi-view data have been routinely collected in various fields of science and engineering. A general problem is to study the predictive association between multivariate responses and multi-view predictor sets, all of which can be of high dimensionality. It is likely that only a few views are relevant to prediction, and the predictors within each relevant view contribute to the prediction collectively rather than sparsely. We cast this new problem under the familiar multivariate regression framework and propose an integrative reduced-rank regression (iRRR), where each view has its own low-rank coefficient matrix. As such, latent features are extracted from each view in a supervised fashion. For model estimation, we develop a convex composite nuclear norm penalization approach, which admits an efficient algorithm via the alternating direction method of multipliers. Extensions to non-Gaussian and incomplete data are discussed. Theoretically, we derive non-asymptotic oracle bounds of iRRR under a restricted eigenvalue condition. Our results recover oracle bounds of several special cases of iRRR, including Lasso, group Lasso, and nuclear norm penalized regression. Therefore, iRRR seamlessly bridges group-sparse and low-rank methods and can achieve a substantially faster convergence rate under realistic settings of multi-view learning. Simulation studies and an application in the Longitudinal Studies of Aging further showcase the efficacy of the proposed methods.

  5. Abstract

    Atomic systems, ranging from trapped ions to ultracold and Rydberg atoms, offer unprecedented control over both internal and external degrees of freedom at the single‐particle level. They are considered among the foremost candidates for realizing quantum simulation and computation platforms that can outperform classical computers at specific tasks. In this work, a realistic experimental toolbox for quantum information processing with neutral alkaline‐earth‐like atoms in optical tweezer arrays is described. In particular, a comprehensive and scalable architecture based on a programmable array of alkaline‐earth‐like atoms is proposed, exploiting their electronic clock states as a precise and robust auxiliary degree of freedom, and thus allowing for efficient all‐optical one‐ and two‐qubit operations between nuclear spin qubits. The proposed platform promises excellent performance thanks to high‐fidelity register initialization, rapid spin‐exchange gates, and error detection in read‐out. As a benchmark and application example, the expected fidelity of an increasing number of subsequent SWAP gates for optimal parameters is computed, which can be used to distribute entanglement between remote atoms within the array.

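The nonparametric bootstrap variance estimate mentioned in the speciation-time abstract (item 1 above) can be illustrated by resampling alignment columns (sites) with replacement and recomputing the estimator on each replicate. This is a minimal sketch: resampling sites is an assumption about the scheme, and `p_distance` is a hypothetical stand-in for the MAPCL estimator, which is implemented in PAUP* and not reproduced here.

```python
import numpy as np


def bootstrap_variance(alignment, estimator, n_boot=1000, seed=0):
    """Nonparametric bootstrap (co)variance of `estimator` over alignment sites.

    `alignment` has shape (n_taxa, n_sites); each replicate resamples columns
    with replacement. Returns a 0-d array for a scalar estimator, else k x k.
    """
    rng = np.random.default_rng(seed)
    n_sites = alignment.shape[1]
    reps = np.array([
        np.atleast_1d(estimator(alignment[:, rng.integers(0, n_sites, size=n_sites)]))
        for _ in range(n_boot)
    ])
    return np.cov(reps, rowvar=False, ddof=1)


def p_distance(aln):
    """Hypothetical stand-in estimator: share of sites where rows 0 and 1 differ."""
    return np.mean(aln[0] != aln[1])
```

Any function of the site-pattern frequencies can be passed as `estimator`; the cost is `n_boot` re-estimations, so this trades computation for avoiding an analytic variance formula.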
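The two-step idea in the expected-shortfall abstract (item 2 above) can be sketched under stated assumptions: the conditional α-quantile and lower-tail expected shortfall (ES) are both linear in the covariates; step one is standard linear-programming quantile regression solved with SciPy's `linprog`; step two regresses the surrogate response q̂ + (y - q̂)·1{y ≤ q̂}/α on the covariates by ordinary least squares. This surrogate-OLS second step is one common two-step variant from the ES-regression literature and is not necessarily the joint loss the authors minimize; the function names are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def quantile_regression(X, y, alpha):
    """Linear quantile regression as an LP: min sum alpha*u+ + (1-alpha)*u- s.t. X b + u+ - u- = y."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, alpha), np.full(n, 1.0 - alpha)])
    eye = sparse.identity(n, format="csr")
    A_eq = sparse.hstack([sparse.csr_matrix(X), eye, -eye], format="csr")
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # b free, residual parts >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def two_step_es(X, y, alpha):
    """Step 1: quantile regression. Step 2: OLS on a surrogate whose mean is the lower-tail ES."""
    beta_q = quantile_regression(X, y, alpha)
    q_hat = X @ beta_q
    tail = (y <= q_hat).astype(float)
    surrogate = q_hat + (y - q_hat) * tail / alpha
    beta_es, *_ = np.linalg.lstsq(X, surrogate, rcond=None)
    return beta_q, beta_es
```

When q̂ equals the true quantile, the surrogate has conditional mean E[y | y ≤ q(x)], so the second step is an ordinary least-squares fit rather than a non-convex, non-differentiable joint optimization.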
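The composite nuclear-norm penalization in the iRRR abstract (item 4 above) can be illustrated with a small sketch that minimizes (1/2n)‖Y - Σ_i X_i B_i‖_F² + λ Σ_i w_i ‖B_i‖_*. Because the penalty separates across views, its proximal operator is singular value thresholding applied to each block B_i. For brevity this uses plain proximal gradient descent rather than the ADMM algorithm the abstract describes; the objective scaling, the unit default weights w_i, and the function names are assumptions.

```python
import numpy as np


def svt(M, thresh):
    """Singular value thresholding: the proximal operator of thresh * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def irrr_prox_grad(Y, Xs, lam, weights=None, n_iter=500):
    """Proximal gradient for a composite nuclear-norm penalized multi-view regression (hypothetical helper)."""
    n = Y.shape[0]
    weights = np.ones(len(Xs)) if weights is None else np.asarray(weights, dtype=float)
    # Lipschitz constant of the smooth part's gradient: ||[X_1, ..., X_K]||_2^2 / n.
    L = np.linalg.norm(np.hstack(Xs), 2) ** 2 / n
    Bs = [np.zeros((X.shape[1], Y.shape[1])) for X in Xs]
    for _ in range(n_iter):
        resid = Y - sum(X @ B for X, B in zip(Xs, Bs))
        # Gradient step on each block (grad_i = -X_i' resid / n), then blockwise SVT.
        Bs = [svt(B + X.T @ resid / (n * L), lam * w / L) for X, B, w in zip(Xs, Bs, weights)]
    return Bs
```

A view whose block is thresholded to zero is effectively dropped, while surviving blocks stay low-rank, which is the group-sparse and low-rank behavior the abstract attributes to iRRR.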