Title: Statistical inference for time‐to‐event data in non‐randomized cohorts with selective attrition
In multi‐season clinical trials with a randomize‐once strategy, patients enrolled from previous seasons who stay alive and remain in the study will be treated according to the initial randomization in subsequent seasons. To address the potentially selective attrition from earlier seasons for the non‐randomized cohorts, we develop an inverse probability of treatment weighting method using season‐specific propensity scores to produce unbiased estimates of survival functions or hazard ratios. Bootstrap variance estimators are used to account for the randomness in the estimated weights and the potential correlations in repeated events within each patient from season to season. Simulation studies show that the weighting procedure and bootstrap variance estimator provide unbiased estimates and valid inferences for Kaplan‐Meier estimates and Cox proportional hazards models. Finally, data from the INVESTED trial are analyzed to illustrate the proposed method.
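The core of the proposed procedure can be pictured with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the authors' implementation: it fits a season-specific propensity score by logistic regression, forms inverse probability of treatment weights, fits a weighted Cox model, and bootstraps patients (clusters) so that both the randomness in the estimated weights and the within-patient correlation across seasons feed into the interval. Column names (patient_id, season, treat, time, event) and the use of scikit-learn and lifelines are assumptions.

```python
# Illustrative IPTW-for-survival sketch; not the paper's implementation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def iptw_weights(df, covariates):
    """Season-specific propensity scores -> inverse probability of treatment weights."""
    w = pd.Series(1.0, index=df.index)
    for _, sub in df.groupby("season"):
        ps_model = LogisticRegression(max_iter=1000).fit(sub[covariates], sub["treat"])
        ps = ps_model.predict_proba(sub[covariates])[:, 1]
        w.loc[sub.index] = np.where(sub["treat"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    return w

def weighted_log_hr(df, covariates):
    """Weighted Cox model; returns the log hazard ratio for treatment."""
    df = df.assign(iptw=iptw_weights(df, covariates))
    cph = CoxPHFitter()
    cph.fit(df[["time", "event", "treat", "iptw"]],
            duration_col="time", event_col="event",
            weights_col="iptw", robust=True)
    return cph.params_["treat"]

def bootstrap_ci(df, covariates, n_boot=500, seed=0):
    """Resample patients (not rows) so repeated seasons stay within a patient."""
    rng = np.random.default_rng(seed)
    ids = df["patient_id"].unique()
    log_hrs = []
    for _ in range(n_boot):
        draw = rng.choice(ids, size=len(ids), replace=True)
        boot = pd.concat([df[df["patient_id"] == i] for i in draw], ignore_index=True)
        log_hrs.append(weighted_log_hr(boot, covariates))
    return np.exp(np.percentile(log_hrs, [2.5, 97.5]))  # hazard ratio CI
```

In the paper's setting the first, randomized season needs no reweighting; the sketch weights every season for simplicity.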
Award ID(s):
2015526
PAR ID:
10577213
Author(s) / Creator(s):
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Statistics in Medicine
Volume:
43
Issue:
2
ISSN:
0277-6715
Page Range / eLocation ID:
216 to 232
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score‐based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio‐of‐mediator‐probability weighting estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score‐based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step's regression coefficient estimates. Statistical inferences obtained from this 2‐step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to ratio‐of‐mediator‐probability weighting analysis a solution to the 2‐step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance‐covariance matrix for the indirect effect and direct effect 2‐step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score‐based weighting. 
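As a concrete picture of the two-step procedure described above, the sketch below computes ratio-of-mediator-probability weights and the resulting direct and indirect effect point estimates for a binary mediator, assuming the treatment itself is randomized so that only mediator weights are needed. The column names and the logistic propensity model are assumptions, and the stacked-score variance derived in the study is not reproduced here; the point of the sketch is to show where step 2 treats the step-1 weights as fixed.

```python
# Illustrative RMPW point estimation (binary mediator, randomized treatment).
import numpy as np
from sklearn.linear_model import LogisticRegression

def rmpw_effects(df, covariates):
    treated, control = df[df["treat"] == 1], df[df["treat"] == 0]

    # Step 1: mediator propensity models within each treatment arm.
    m1 = LogisticRegression(max_iter=1000).fit(treated[covariates], treated["mediator"])
    m0 = LogisticRegression(max_iter=1000).fit(control[covariates], control["mediator"])

    # Ratio-of-mediator-probability weight for each treated unit:
    #   P(M = m_i | T = 0, X_i) / P(M = m_i | T = 1, X_i)
    # (assumes the mediator is coded 0/1 so predict_proba columns line up).
    p1 = m1.predict_proba(treated[covariates])
    p0 = m0.predict_proba(treated[covariates])
    m_obs = treated["mediator"].astype(int).to_numpy()
    rows = np.arange(len(treated))
    w = p0[rows, m_obs] / p1[rows, m_obs]

    # Step 2: weighted means, with the step-1 weights treated as fixed.
    y11 = treated["outcome"].mean()                    # E[Y(1, M(1))]
    y10 = np.average(treated["outcome"], weights=w)    # E[Y(1, M(0))]
    y00 = control["outcome"].mean()                    # E[Y(0, M(0))]
    return {"indirect": y11 - y10, "direct": y10 - y00}
```

Because step 2 conditions on the estimated weights, naive standard errors from the weighted means understate the uncertainty; the stacked-score sandwich variance (or a bootstrap that re-runs both steps) corrects this.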
  2. Abstract Obtaining accurate estimates of machine learning model uncertainties on newly predicted data is essential for understanding the accuracy of the model and whether its predictions can be trusted. A common approach to such uncertainty quantification is to estimate the variance from an ensemble of models, which are often generated by the generally applicable bootstrap method. In this work, we demonstrate that the direct bootstrap ensemble standard deviation is not an accurate estimate of uncertainty but that it can be simply calibrated to dramatically improve its accuracy. We demonstrate the effectiveness of this calibration method for both synthetic data and numerous physical datasets from the field of Materials Science and Engineering. The approach is motivated by applications in physical and biological science but is quite general and should be applicable for uncertainty quantification in a wide range of machine learning regression models. 
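A minimal version of the idea, under assumed details (random-forest base models, synthetic data, and a single multiplicative calibration factor fit on held-out data), might look like the following; the paper's exact calibration procedure may differ.

```python
# Sketch: bootstrap-ensemble uncertainty plus a one-parameter recalibration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=500)           # synthetic data

def bootstrap_ensemble(X, y, n_models=30, seed=0):
    r = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = r.integers(0, len(X), size=len(X))              # bootstrap resample
        models.append(RandomForestRegressor(n_estimators=50).fit(X[idx], y[idx]))
    return models

def predict_with_std(models, X):
    preds = np.stack([m.predict(X) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)              # raw ensemble std

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.25, random_state=0)
models = bootstrap_ensemble(X_tr, y_tr)

# Calibrate: pick a scale so standardized errors on held-out data have unit variance.
mu_cal, sd_cal = predict_with_std(models, X_cal)
scale = np.sqrt(np.mean(((y_cal - mu_cal) / sd_cal) ** 2))

# Apply the calibrated uncertainty to new points.
X_new = rng.normal(size=(5, 5))
mu_new, sd_new = predict_with_std(models, X_new)
print(scale * sd_new)
```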
  3. Abstract The ideal spectral averaging method depends on one’s science goals and the available information about one’s data. Including low-quality data in the average can decrease the signal-to-noise ratio (S/N), which may necessitate an optimization method or a consideration of different weighting schemes. Here, we explore a variety of spectral averaging methods. We investigate the use of three weighting schemes during averaging: weighting by the signal divided by the variance (“intensity-noise weighting”), weighting by the inverse of the variance (“noise weighting”), and uniform weighting. Whereas for intensity-noise weighting the S/N is maximized when all spectra are averaged, for noise and uniform weighting we find that averaging the 35%–45% of spectra with the highest S/N results in the highest S/N average spectrum. With this intensity cutoff, the average spectrum with noise or uniform weighting has ∼95% of the intensity of the spectrum created from intensity-noise weighting. We apply our spectral averaging methods to GBT Diffuse Ionized Gas hydrogen radio recombination line data to determine the ionic abundance ratio, y+, and discuss future applications of the methodology.
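The three weighting schemes compared above are easy to express directly. The snippet below uses synthetic spectra (a Gaussian line plus per-spectrum noise) and a crude peak-based signal estimate, so the data model and the 40% cutoff are illustrative assumptions rather than the GBT analysis.

```python
# Sketch: intensity-noise, noise (inverse-variance), and uniform weighting.
import numpy as np

rng = np.random.default_rng(1)
n_spec, n_chan = 100, 256
x = np.linspace(-5.0, 5.0, n_chan)
sigma = rng.uniform(0.5, 3.0, size=n_spec)                 # per-spectrum noise level
line = np.exp(-0.5 * x**2)                                  # common line profile
spectra = line + rng.normal(scale=sigma[:, None], size=(n_spec, n_chan))
peak = spectra.max(axis=1)                                  # crude per-spectrum signal

def average(spectra, w):
    return np.average(spectra, axis=0, weights=w)

avg_intensity_noise = average(spectra, peak / sigma**2)     # signal / variance
avg_noise = average(spectra, 1.0 / sigma**2)                # inverse variance
avg_uniform = average(spectra, np.ones(n_spec))

# For noise or uniform weighting, averaging only the ~40% highest-S/N spectra
# (as the abstract finds) can beat averaging everything.
snr = peak / sigma
best = np.argsort(snr)[::-1][: int(0.4 * n_spec)]
avg_noise_cut = average(spectra[best], 1.0 / sigma[best] ** 2)
```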
  4. Abstract Linear mixed models are widely used for analyzing longitudinal datasets, and the inference for variance component parameters relies on the bootstrap method. However, health systems and technology companies routinely generate massive longitudinal datasets that make the traditional bootstrap method infeasible. To solve this problem, we extend the highly scalable bag of little bootstraps method for independent data to longitudinal data and develop a highly efficient Julia package MixedModelsBLB.jl. Simulation experiments and real data analysis demonstrate the favorable statistical performance and computational advantages of our method compared to the traditional bootstrap method. For the statistical inference of variance components, it achieves 200 times speedup on the scale of 1 million subjects (20 million total observations), and is the only currently available tool that can handle more than 10 million subjects (200 million total observations) using desktop computers.
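To keep all the sketches here in one language, the snippet below shows the generic bag-of-little-bootstraps recipe in Python for a simple estimator (a mean), not the Julia package's variance-component inference: draw small subsets of size b = n^0.6, reweight each subset to the full sample size with multinomial counts, and average the within-subset bootstrap standard errors.

```python
# Generic bag-of-little-bootstraps sketch (simple estimator; not MixedModelsBLB.jl).
import numpy as np

def blb_standard_error(data, estimator, n_subsets=20, subset_exp=0.6,
                       n_resamples=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** subset_exp)                          # little-bootstrap subset size
    subset_ses = []
    for _ in range(n_subsets):
        subset = data[rng.choice(n, size=b, replace=False)]
        stats = []
        for _ in range(n_resamples):
            # Reweight the subset to the full sample size n via multinomial counts.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(subset, counts))
        subset_ses.append(np.std(stats, ddof=1))      # bootstrap SE within this subset
    return np.mean(subset_ses)                        # average over subsets

def weighted_mean(x, counts):
    return np.average(x, weights=counts)

data = np.random.default_rng(1).normal(size=100_000)
print(blb_standard_error(data, weighted_mean))        # close to 1/sqrt(100_000)
```

The computational win is that each resample only ever touches b = n^0.6 distinct observations, which is what makes the approach feasible at the scale of millions of subjects.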
  5. In adaptive importance sampling and other contexts, we have K > 1 unbiased and uncorrelated estimates μ̂_k of a common quantity μ. The optimal unbiased linear combination weights them inversely to their variances, but those weights are unknown and hard to estimate. A simple deterministic square root rule, based on a working model that Var(μ̂_k) ∝ k^(−1/2), gives an unbiased estimate of μ that is nearly optimal under a wide range of alternative variance patterns. We show that if Var(μ̂_k) ∝ k^(−y) for an unknown rate parameter y ∈ [0, 1], then the square root rule yields the optimal variance rate with a constant that is too large by at most 9/8 for any 0 ⩽ y ⩽ 1 and any number K of estimates. Numerical work shows that the rule is similarly robust to some other patterns with mildly decreasing variance as k increases.
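A compact numerical illustration of the rule follows; the simulated decay Var ∝ k^(−y) and the particular values of K and y are assumptions chosen only for the example.

```python
# Sketch of the square-root rule: weight the k-th estimate by sqrt(k),
# i.e., inversely to the working-model variance Var(mu_hat_k) ∝ k^(-1/2).
import numpy as np

def sqrt_rule(estimates):
    k = np.arange(1, len(estimates) + 1)
    w = np.sqrt(k)
    return np.sum(w * estimates) / np.sum(w)          # weights normalized, so unbiased

# Example: the true variances decay as k^(-y) with y unknown to the rule.
rng = np.random.default_rng(0)
K, y, mu = 50, 0.8, 2.0
sd = np.arange(1, K + 1) ** (-y / 2)
estimates = mu + rng.normal(scale=sd)
print(sqrt_rule(estimates))                           # close to mu
```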