This paper considers the analysis of panel count data for recurrent events. Such analysis is useful for studying tumor or infection recurrences in both clinical trials and observational studies. A bivariate Gaussian Cox process model is proposed to jointly model the observation process and the recurrent event process. Bayesian nonparametric inference is proposed for simultaneously estimating the regression parameters, bivariate frailty effects, and baseline intensity functions. Inference is carried out through Markov chain Monte Carlo, with fully developed computational techniques. Predictive inference is also discussed under the Bayesian setting. Simulation studies show that the proposed method is efficient. A clinical trial dataset on skin cancer patients is analyzed to illustrate the proposed approach.
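To make the data structure concrete, the following minimal sketch simulates panel count data in which the observation process and the recurrent event process are linked through correlated subject-level frailties. It is an illustration under simplifying assumptions, not the model of the paper: the baseline intensities are taken to be constant rather than Gaussian-process driven, the frailties are bivariate normal on the log scale, and every name and default value (`simulate_panel_counts`, `obs_rate`, `event_rate`, `beta`, `frailty_sd`, `frailty_corr`) is hypothetical.

```python
import numpy as np


def simulate_panel_counts(n_subjects=200, tau=10.0, obs_rate=0.5, event_rate=1.0,
                          beta=0.5, frailty_sd=0.5, frailty_corr=0.6, seed=0):
    """Simulate panel count data with correlated log-normal frailties.

    Assumptions (illustrative only): constant baseline intensities,
    one binary covariate, and a bivariate normal frailty on the log scale.
    """
    rng = np.random.default_rng(seed)
    # Covariance of the (observation, event) log-frailty pair.
    cov = frailty_sd ** 2 * np.array([[1.0, frailty_corr],
                                      [frailty_corr, 1.0]])
    data = []
    for _ in range(n_subjects):
        z = int(rng.integers(0, 2))  # binary covariate, e.g. treatment arm
        u, v = rng.multivariate_normal([0.0, 0.0], cov)

        # Observation process: homogeneous Poisson process on (0, tau]
        # with subject-specific rate obs_rate * exp(u).
        n_obs = rng.poisson(obs_rate * np.exp(u) * tau)
        times = np.sort(rng.uniform(0.0, tau, size=n_obs))

        # Event process: only the counts between consecutive examinations
        # are recorded, each Poisson with mean (rate * gap length).
        gaps = np.diff(np.concatenate(([0.0], times)))
        increments = rng.poisson(event_rate * np.exp(beta * z + v) * gaps)

        data.append({"z": z,
                     "times": times,                    # examination times
                     "counts": np.cumsum(increments)})  # cumulative counts
    return data


if __name__ == "__main__":
    sample = simulate_panel_counts(n_subjects=5, seed=1)
    for subject in sample:
        print(subject["z"], subject["times"].round(2), subject["counts"])
```

Each simulated subject contributes only the examination times and the cumulative event counts at those times; the exact event times are never recorded, which is what distinguishes panel count data from fully observed recurrent event data. A positive `frailty_corr` means that subjects who are examined more often also tend to have more events, so the observation process is informative.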
- Award ID(s): 1904165
- NSF-PAR ID: 10250789
- Date Published:
- Journal Name: Journal of Statistical Computation and Simulation
- ISSN: 0094-9655
- Page Range / eLocation ID: 1 to 19
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary: Panel count data arise when the number of recurrent events experienced by each subject is observed intermittently at discrete examination times. The examination time process can be informative about the underlying recurrent event process even after conditioning on covariates. We consider a semiparametric accelerated mean model for the recurrent event process and allow the two processes to be correlated through a shared frailty. The regression parameters have a simple marginal interpretation of modifying the time scale of the cumulative mean function of the event process. A novel estimation procedure for the regression parameters and the baseline rate function is proposed based on a conditioning technique. In contrast to existing methods, the proposed method is robust in the sense that it requires neither the strong Poisson-type assumption for the underlying recurrent event process nor a parametric assumption on the distribution of the unobserved frailty. Moreover, the distribution of the examination time process is left unspecified, allowing for arbitrary dependence between the two processes. Asymptotic consistency of the estimator is established, and the variance of the estimator is estimated by a model-based smoothed bootstrap procedure. Numerical studies demonstrate that the proposed point estimator and variance estimator perform well with practical sample sizes. The methods are applied to data from a skin cancer chemoprevention trial.
-
This paper investigates semiparametric statistical methods for recurrent events. The mean number of recurrent events is modeled with a generalized semiparametric varying‐coefficient model that can flexibly accommodate three types of covariate effects: time‐constant effects, time‐varying effects, and covariate‐varying effects. We assume that the time‐varying effects are unspecified functions of time and that the covariate‐varying effects are parametric functions of an exposure variable, specified up to a finite number of unknown parameters. Different link functions can be selected to provide a rich family of models for recurrent events data. Profile estimation methods are developed for the parametric and nonparametric components, and their asymptotic properties are established. We also develop hypothesis testing procedures to test the validity of the parametric forms of the covariate‐varying effects. The simulation study shows that both the estimation and the hypothesis testing procedures perform well. The proposed method is applied to a data set from an acyclovir study to investigate whether acyclovir treatment reduces the mean number of relapses.
-
Motivation: Detecting cancer gene expression and transcriptome changes with mRNA-sequencing (RNA-Seq) or array-based data is important for understanding the molecular mechanisms underlying carcinogenesis and cellular events during cancer progression. In previous studies, differentially expressed genes were detected across patients in one cancer type. These studies ignored the role of mRNA expression changes in driving tumorigenic mechanisms that are either universal or specific in different tumor types. To address the problem, we introduce two network-based multi-task learning frameworks, NetML and NetSML, to discover common differentially expressed genes shared across different cancer types as well as differentially expressed genes specific to each cancer type. The proposed frameworks consider the common latent gene co-expression modules and gene-sample biclusters underlying the multiple cancer datasets to learn knowledge shared across different tumor types. Results: Large-scale experiments on simulations and real cancer high-throughput datasets validate that the proposed network-based multi-task learning frameworks achieve better sample classification than models without knowledge sharing across different cancer types. The common and cancer-specific molecular signatures detected by the multi-task learning frameworks on TCGA ovarian cancer, breast cancer, and prostate cancer datasets are correlated with known marker genes and enriched in cancer-relevant KEGG pathways and Gene Ontology terms. Availability and Implementation: Source code is available at: https://github.com/compbiolabucf/NetML Supplementary information: Supplementary data are available at Bioinformatics
-
Continuous-time event data are common in applications such as individual behavior data, financial transactions, and medical health records. Modeling such data can be very challenging, in particular for applications with many different types of events, since it requires a model to predict the event types as well as the time of occurrence. Recurrent neural networks that parameterize time-varying intensity functions are the current state-of-the-art for predictive modeling with such data. These models typically assume that all event sequences come from the same data distribution. However, in many applications event sequences are generated by different sources, or users, and their characteristics can be very different. In this paper, we extend the broad class of neural marked point process models to mixtures of latent embeddings, where each mixture component models the characteristic traits of a given user. Our approach relies on augmenting these models with a latent variable that encodes user characteristics, represented by a mixture model over user behavior that is trained via amortized variational inference. We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification.