Quantile regression has become a widely used tool for analysing competing risk data. However, quantile regression for competing risk data with a continuous mark is still scarce. The mark variable is an extension of cause of failure in a classical competing risk model where cause of failure is replaced by a continuous mark only observed at uncensored failure times. An example of the continuous mark variable is the genetic distance that measures dissimilarity between the infecting virus and the virus contained in the vaccine construct. In this article, we propose a novel mark-specific quantile regression model. The proposed estimation method borrows strength from data in a neighbourhood of a mark and is based on an induced smoothed estimation equation, which is very different from the existing methods for competing risk data with discrete causes. The asymptotic properties of the resulting estimators are established across mark and quantile continuums. In addition, a mark-specific quantile-type vaccine efficacy is proposed and its statistical inference procedures are developed. Simulation studies are conducted to evaluate the finite sample performances of the proposed estimation and hypothesis testing procedures. An application to the first HIV vaccine efficacy trial is provided.
A hybrid approach for the stratified mark‐specific proportional hazards model with missing covariates and missing marks, with application to vaccine efficacy trials
Deployment of the recently licensed tetravalent dengue vaccine based on a chimeric yellow fever virus, CYD-TDV, requires understanding of how the risk of dengue disease in vaccine recipients depends jointly on a host biomarker measured after vaccination (neutralization titre, i.e. neutralizing antibodies) and on a ‘mark’ feature of the dengue disease failure event (the amino acid sequence distance of the dengue virus to the dengue sequence represented in the vaccine). The CYD14 phase 3 trial of CYD-TDV measured neutralizing antibodies via case–cohort sampling and the mark in dengue disease failure events, with about a third missing marks. We addressed the question of interest by developing inferential procedures for the stratified mark-specific proportional hazards model with missing covariates and missing marks. Two hybrid approaches are investigated that leverage both augmented inverse probability weighting and nearest neighbourhood hot deck multiple imputation. The two approaches differ in how the imputed marks are pooled in estimation. Our investigation shows that nearest neighbourhood hot deck imputation can lead to biased estimation without properly selected neighbourhoods. Simulations show that the hybrid methods developed perform well with unbiased nearest neighbourhood hot deck imputations from proper neighbourhood selection. The new methods applied to CYD14 show that neutralizing antibody level is strongly inversely associated with the risk of dengue disease in vaccine recipients, more strongly against dengue viruses with shorter distances.
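As a rough illustration of the nearest neighbourhood hot deck idea mentioned in the abstract, the sketch below fills each missing mark with a mark drawn at random from the k observed cases closest in a single covariate, and repeats this to produce several completed data sets. It is not the authors' hybrid procedure (which also involves augmented inverse probability weighting); the function name `nn_hot_deck_impute`, the one-dimensional distance, and the toy data are all assumptions made for the example.

```python
import random


def nn_hot_deck_impute(records, k=3, n_imputations=5, seed=0):
    """Hypothetical sketch of nearest-neighbour hot deck multiple imputation.

    records: list of (covariate, mark) pairs; mark is None when missing.
    Returns n_imputations lists of completed marks, in record order.
    """
    rng = random.Random(seed)
    # Donors are the cases whose mark was actually observed.
    donors = [(x, m) for x, m in records if m is not None]
    if not donors:
        raise ValueError("at least one observed mark is needed as a donor")

    completed = []
    for _ in range(n_imputations):
        marks = []
        for x, m in records:
            if m is not None:
                marks.append(m)  # observed marks are never altered
                continue
            # Neighbourhood: the k donors closest in the covariate (assumed metric).
            neighbourhood = sorted(donors, key=lambda d: abs(d[0] - x))[:k]
            marks.append(rng.choice(neighbourhood)[1])
        completed.append(marks)
    return completed


# Toy data (invented): two clusters of cases, one missing mark in each.
records = [(0.1, 2.0), (0.2, 2.5), (0.9, 7.0), (1.0, 7.5), (0.15, None), (0.95, None)]
imputed = nn_hot_deck_impute(records, k=2, n_imputations=4)
```

With `k=2` each missing mark can only be filled from its own cluster. Raising `k` to 4 would let donors from the distant cluster in, which is one simple way to see why the abstract stresses that poorly chosen neighbourhoods can bias estimation.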
- PAR ID: 10169493
- Date Published:
- Journal Name: Journal of the Royal Statistical Society: Series C (Applied Statistics)
- ISSN: 0035-9254
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In the CYD14 trial of the CYD-TDV dengue vaccine in 2–14 year-olds, neutralizing antibody (nAb) titers to the vaccine-insert dengue strains correlated inversely with symptomatic, virologically-confirmed dengue (VCD). Also, vaccine efficacy against VCD was higher against dengue prM/E amino acid sequences closer to the vaccine inserts. We integrated the nAb and sequence data types by assessing nAb titers as a correlate of sequence-specific VCD separately in the vaccine arm and in the placebo arm. In both vaccine and placebo recipients the correlation of nAb titer with sequence-specific VCD was stronger for dengue nAb contact site sequences closer to the vaccine (p = 0.005 and p = 0.012, respectively). The risk of VCD in vaccine (placebo) recipients was 6.7- (1.80)-fold lower at the 90th vs 10th percentile of nAb for viruses perfectly matched to CYD-TDV, compared to 2.1- (0.78)-fold lower at the 90th vs 10th percentile for viruses with five amino acid mismatches. The evidence for a stronger sequence-distance dependent correlate of risk for the vaccine arm indicates departure from the Prentice criteria for a valid sequence-distance specific surrogate endpoint and suggests that the nAb marker may affect dengue risk differently depending on whether nAbs arise from infection or also by vaccination. However, when restricting to baseline-seropositive 9–14 year-olds, the correlation pattern became more similar between the vaccine and placebo arms, supporting nAb titers as an approximate surrogate endpoint in this population. No sequence-specific nAb titer correlates of VCD were seen in baseline-seronegative participants. Integrated immune response/pathogen sequence data correlates analyses could help increase knowledge of correlates of risk and surrogate endpoints for other vaccines against genetically diverse pathogens.
Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of the simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faced with operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between the household net worth and sociodemographic factors, and facilitates completed data analyses with general purposes. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
The emergence of new virus variants, including the Omicron variant (B.1.1.529) of SARS-CoV-2, can lead to reduced vaccine effectiveness (VE) and the need for new vaccines or vaccine doses if the extent of immune evasion is severe. Neutralizing antibody titers have been shown to be a correlate of protection for SARS-CoV-2 and other pathogens, and could be used to quickly estimate vaccine effectiveness for new variants. However, no model currently exists to provide precise VE estimates for a new variant against severe disease for SARS-CoV-2 using robust datasets from several populations. We developed predictive models for VE against COVID-19 symptomatic disease and hospitalization across a 54-fold range of mean neutralizing antibody titers. For two mRNA vaccines (mRNA-1273, BNT162b2), models fit without Omicron data predicted that infection with the BA.1 Omicron variant increased the risk of hospitalization 2.8–4.4-fold and increased the risk of symptomatic disease 1.7–4.2-fold compared to the Delta variant. Out-of-sample validation showed that model predictions were accurate; all predictions were within 10% of observed VE estimates and fell within the model prediction intervals. Predictive models using neutralizing antibody titers can provide rapid VE estimates, which can inform vaccine booster timing, vaccine design, and vaccine selection for new virus variants.
Machine learning (ML) advancements hinge upon data, the vital ingredient for training. Statistically curing missing data is called imputation, and there are many imputation theories and tools. But they often require difficult statistical and/or discipline-specific assumptions, and general tools capable of curing large data are lacking. Fractional hot deck imputation (FHDI) can cure data by filling nonresponses with observed values (thus, hot-deck) without resorting to assumptions. This review paper summarizes how FHDI evolves to an ultra data-oriented parallel version (UP-FHDI). Here, ultra data have concurrently large instances (big-n) and high dimensionality (big-p). The evolution is made possible with specialized parallelism and a fast variance estimation technique. Validations with scientific and engineering data confirm that UP-FHDI can cure ultra data (p > 10,000 and n > 1M), and the cured data sets can improve the prediction accuracy of subsequent ML. The evolved FHDI will help promote reliable ML with cured big data.
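To make "filling nonresponses with observed values" concrete, here is a deliberately simplified sketch of a fractional hot deck: a missing value is replaced by every observed value in the same imputation cell, each carrying an equal fractional weight, so that each original row still has total weight one. The cell labels, the equal weights, and the names `fractional_hot_deck` and `weighted_mean` are assumptions for illustration only; they are not the FHDI or UP-FHDI implementation, and none of the parallelism or variance estimation described above is shown.

```python
from collections import defaultdict


def fractional_hot_deck(rows):
    """Hypothetical equal-weight fractional hot deck.

    rows: list of (cell, y) pairs; y is None when missing.
    Returns a list of (row_index, value, weight) triples.
    """
    # Collect observed values (donors) within each imputation cell.
    donors = defaultdict(list)
    for cell, y in rows:
        if y is not None:
            donors[cell].append(y)

    imputed = []
    for i, (cell, y) in enumerate(rows):
        if y is not None:
            imputed.append((i, y, 1.0))  # observed rows keep full weight
            continue
        pool = donors.get(cell, [])
        if not pool:
            raise ValueError(f"no donor available in cell {cell!r}")
        weight = 1.0 / len(pool)  # equal fractional weights (an assumption)
        imputed.extend((i, d, weight) for d in pool)
    return imputed


def weighted_mean(imputed):
    """Mean of the imputed data, respecting the fractional weights."""
    total_weight = sum(w for _, _, w in imputed)
    return sum(v * w for _, v, w in imputed) / total_weight


# Toy data (invented): cells "a" and "b", one nonresponse in each.
rows = [("a", 1.0), ("a", 3.0), ("a", None), ("b", 10.0), ("b", None)]
imputed = fractional_hot_deck(rows)
mean_y = weighted_mean(imputed)
```

Because every filled-in value is an actually observed one, no distributional model for the variable is needed; the modelling choice moves into how cells and weights are defined, which this sketch fixes by hand.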