Panel count data arise when the number of recurrent events experienced by each subject is observed only intermittently, at discrete examination times. The examination time process can be informative about the underlying recurrent event process even after conditioning on covariates. We consider a semiparametric accelerated mean model for the recurrent event process and allow the two processes to be correlated through a shared frailty. The regression parameters have a simple marginal interpretation: they rescale the time axis of the cumulative mean function of the event process. A novel estimation procedure for the regression parameters and the baseline rate function is proposed based on a conditioning technique. In contrast to existing methods, the proposed method is robust in the sense that it requires neither the strong Poisson-type assumption for the underlying recurrent event process nor a parametric assumption on the distribution of the unobserved frailty. Moreover, the distribution of the examination time process is left unspecified, allowing for arbitrary dependence between the two processes. The estimator is shown to be asymptotically consistent, and its variance is estimated by a model-based smoothed bootstrap procedure. Numerical studies demonstrate that the proposed point estimator and variance estimator perform well with practical sample sizes. The methods are applied to data from a skin cancer chemoprevention trial.
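The sketch below simulates panel count data for one subject under an accelerated mean model with a shared frailty. It is meant only to illustrate the model structure. It assumes the form $E\{N(t) \mid Z, \gamma\} = \gamma\,\Lambda_0(t e^{Z\beta})$ with $E(\gamma) = 1$, so that the marginal mean is $\Lambda_0(t e^{Z\beta})$ and $\beta$ rescales the time axis. Several choices are simulation conveniences only and are not assumptions of the proposed estimator: the linear baseline $\Lambda_0(t) = \lambda t$, the gamma frailty, and the Poisson event and examination processes. All function names are hypothetical.

```python
import math
import random


def baseline_cum_mean(t, rate=1.0):
    """Hypothetical baseline cumulative mean Lambda_0(t) = rate * t."""
    return rate * t


def marginal_cum_mean(t, z, beta, rate=1.0):
    """Assumed accelerated mean model: E[N(t) | Z=z] = Lambda_0(t * exp(z * beta))."""
    return baseline_cum_mean(t * math.exp(z * beta), rate)


def poisson_times(rate, tau, rng):
    """Arrival times of a homogeneous Poisson process with the given rate on (0, tau]."""
    times, t = [], rng.expovariate(rate)
    while t <= tau:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def simulate_subject(z, beta, tau, rng, frailty_shape=2.0, rate=1.0, exam_rate=1.0):
    """Return (examination time, cumulative event count) pairs for one subject."""
    # Shared frailty: gamma with shape k and scale 1/k, so its mean is 1.
    gamma = rng.gammavariate(frailty_shape, 1.0 / frailty_shape)
    # Given gamma, events follow a Poisson process with cumulative intensity
    # gamma * Lambda_0(t * exp(z * beta)); with a linear Lambda_0 the rate is constant.
    events = poisson_times(gamma * rate * math.exp(z * beta), tau, rng)
    # Examination times also depend on gamma, so visits carry information about events.
    exams = poisson_times(gamma * exam_rate, tau, rng)
    # Panel counts: only N(s) at each examination time s is observed.
    return [(s, sum(e <= s for e in events)) for s in exams]


rng = random.Random(2024)
print(simulate_subject(z=1.0, beta=0.5, tau=5.0, rng=rng))
```

Under this assumed parametrization, a subject with $Z = 1$ and $\beta > 0$ accumulates events as if the clock ran $e^{\beta}$ times faster than for $Z = 0$.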
- Award ID(s): 2100729
- NSF-PAR ID: 10345673
- Journal Name: Journal of Machine Learning Research
- Volume: 22
- ISSN: 1532-4435
- Page Range / eLocation ID: 1-62
- Sponsoring Org: National Science Foundation
More Like this
Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains challenging. In this paper, we highlight the inherent limitations of cross-validation when it is used to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node in a Gaussian graphical model, tuned using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs and cover general, sparse covariance structures. To support our theoretical findings, we investigate this inconsistency empirically through an extensive simulation study that compares cross-validation with other commonly used information criteria. Because many algorithms for learning the structure of graphical models require selecting a hyperparameter, calibrating it precisely is essential for estimating the underlying structure accurately. Our observations therefore shed light on this widely recognized practical challenge.
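The sketch below makes this setting concrete. It performs Lasso neighborhood selection for a single node: it regresses that node on all the others and keeps the variables with nonzero coefficients. The penalty is tuned by K-fold cross-validated prediction error, which here stands in for the prediction oracle. The sketch assumes centered data and uses a plain coordinate-descent Lasso written in NumPy. The function names, penalty grid, and fold scheme are hypothetical, and the code does not reproduce the paper's bounds or experiments.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent, minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes centered columns (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # current residual
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            r += X[:, k] * b[k]  # partial residual without variable k
            rho = X[:, k] @ r / n
            b[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]  # soft-threshold
            r -= X[:, k] * b[k]
    return b


def neighborhood(X, j, lam):
    """Estimated neighborhood of node j: nonzero Lasso coefficients of X_j on the rest."""
    others = [k for k in range(X.shape[1]) if k != j]
    b = lasso_cd(X[:, others], X[:, j], lam)
    return {others[i] for i in np.flatnonzero(b)}


def cv_lambda(X, j, lams, n_folds=5, seed=0):
    """Pick the penalty with the smallest K-fold prediction error for node j."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    others = [k for k in range(X.shape[1]) if k != j]
    errors = []
    for lam in lams:
        err = 0.0
        for f in range(n_folds):
            train, test = folds != f, folds == f
            b = lasso_cd(X[train][:, others], X[train, j], lam)
            err += np.sum((X[test, j] - X[test][:, others] @ b) ** 2)
        errors.append(err / n)
    return lams[int(np.argmin(errors))]


# Chain graph 0 - 1 - 2: the true neighborhood of node 0 is {1}.
rng = np.random.default_rng(1)
n = 400
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
X = np.column_stack([x0, x1, x2])
X -= X.mean(axis=0)
lams = [0.01, 0.05, 0.1, 0.2, 0.5]
lam_cv = cv_lambda(X, 0, lams)
print(lam_cv, neighborhood(X, 0, lam_cv))
```

Cross-validation targets prediction error rather than support recovery. The penalty it selects can therefore be small enough to admit spurious neighbors, which is the kind of misidentification that the bounds above concern.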
Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program for estimating a sparse graph plus a low-rank term that adjusts for latent variables; however, this approach poses challenges from both computational and statistical perspectives. We propose an alternative, simple solution: apply a hard-thresholding operator to existing graph selection methods. Thresholding the graphical lasso is conceptually simple and computationally attractive, and it is shown to be graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. The results also extend to estimators based on thresholded neighbourhood selection and on constrained $\ell_{1}$-minimization for inverse matrix estimation. We show that our simple thresholded graph estimators yield stronger empirical results than existing methods for the latent variable graphical model problem, and we apply them to a neuroscience case study on estimating functional neural connections.
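The thresholding step itself is a simple operation on an estimated precision matrix. The sketch below assumes such an estimate is already available. In the paper it would come from the graphical lasso or another graph selection method; here the inverse sample covariance is used only as a stand-in. The sketch then zeroes out every off-diagonal entry whose magnitude does not exceed a threshold $\tau$. The function names and the value of $\tau$ are hypothetical.

```python
import numpy as np


def hard_threshold(theta, tau):
    """Set off-diagonal entries with |theta_ij| <= tau to zero; keep the diagonal."""
    theta = np.asarray(theta, dtype=float)
    out = np.where(np.abs(theta) > tau, theta, 0.0)
    np.fill_diagonal(out, np.diag(theta))
    return out


def edge_set(theta, tau):
    """Edges (i, j), i < j, that survive hard thresholding."""
    t = hard_threshold(theta, tau)
    p = t.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if t[i, j] != 0.0}


# Stand-in for a graphical lasso estimate: inverse sample covariance of simulated
# data in which only variables 0 and 1 are conditionally dependent.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 4))
x[:, 1] += 0.9 * x[:, 0]
theta_hat = np.linalg.inv(np.cov(x, rowvar=False))
print(edge_set(theta_hat, tau=0.3))
```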
This paper considers the latent Gaussian graphical model, which extends the Gaussian graphical model to discrete data, and to mixed data with both continuous and discrete variables, by assuming that each discrete variable is generated by discretizing a latent Gaussian variable. We propose a modified expectation-maximization (EM) algorithm to estimate the parameters of the latent Gaussian model for binary data, and we extend it to the latent Gaussian model for mixed data. The conditional dependence structure is then recovered from the sparsity pattern of the precision matrix of the latent variables. We illustrate the performance of the proposed estimator through comprehensive numerical studies and an application to voting data from the United Nations General Assembly.
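The data-generating assumption can be illustrated for binary data. Each observed $X_j = 1\{Z_j > C_j\}$ for a latent standard normal $Z_j$ and an unknown cutoff $C_j$. Then $P(X_j = 1) = 1 - \Phi(C_j)$, so the cutoff can be estimated by $\hat C_j = \Phi^{-1}(1 - \bar X_j)$. The sketch below covers only this generative step and the moment-based cutoff estimate. It is not the modified EM algorithm, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_binary_pair(n, cutoffs, rho, rng):
    """Discretize a bivariate standard normal (Z1, Z2) with correlation rho at the cutoffs."""
    c1, c2 = cutoffs
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        rows.append((int(z1 > c1), int(z2 > c2)))
    return rows


def estimate_cutoff(column):
    """Moment estimate C = Phi^{-1}(1 - mean(X)); requires 0 < mean(X) < 1."""
    p_hat = sum(column) / len(column)
    return NormalDist().inv_cdf(1.0 - p_hat)


rng = random.Random(7)
data = simulate_binary_pair(20000, cutoffs=(0.5, -0.3), rho=0.6, rng=rng)
print([estimate_cutoff([row[j] for row in data]) for j in range(2)])
```

The cutoffs depend only on the marginal proportions. Estimating the latent correlation $\rho$, and hence the latent precision matrix, requires the joint distribution of the binary variables and is beyond this sketch.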
A within-cluster resampling method is proposed for fitting a multilevel model in the presence of informative cluster size. Our method removes the information carried by the cluster sizes by drawing bootstrap samples that each contain a fixed number of observations from every cluster. We then estimate the parameters by maximizing an average, over the bootstrap samples, of a suitable composite log-likelihood. The proposed estimator is shown to be consistent, and consistency does not require a correctly specified model for cluster size. We give an estimator of the covariance matrix of the proposed estimator and a test for the noninformativeness of the cluster sizes. A simulation study shows, as in Neuhaus & McCulloch (2011), that the standard maximum likelihood estimator exhibits little bias for some regression coefficients. However, for those parameters that do exhibit nonnegligible bias, the proposed method successfully corrects it.
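The resampling idea can be sketched for the simplest case, in which the target is a mean. Each resample draws $m$ observations without replacement from every cluster and computes the estimate on the pooled resample; the final estimate averages over resamples. Because every cluster contributes the same number of observations, large clusters no longer dominate. The sample mean here is a hypothetical stand-in for maximizing the composite log-likelihood of a multilevel model, and the function name is illustrative.

```python
import random
from statistics import mean


def wcr_estimate(clusters, estimator, n_resamples, m, rng):
    """Within-cluster resampling: average `estimator` over resamples that each take
    m observations (without replacement) from every cluster."""
    if any(len(c) < m for c in clusters):
        raise ValueError("every cluster needs at least m observations")
    estimates = []
    for _ in range(n_resamples):
        resample = []
        for obs in clusters:
            resample.extend(rng.sample(obs, m))
        estimates.append(estimator(resample))
    return mean(estimates)


# Informative cluster size: the larger clusters have smaller outcomes.
clusters = [[1.0, 1.2, 0.8, 1.1, 0.9, 1.0], [3.0, 3.2], [5.0]]
naive = mean(x for c in clusters for x in c)
wcr = wcr_estimate(clusters, mean, n_resamples=500, m=1, rng=random.Random(1))
print(naive, wcr)
```

The pooled mean weights each observation equally and so is pulled toward the large cluster. The resampling estimate instead weights each cluster equally, which is the sense in which resampling removes the information carried by cluster size.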