

Title: An Information Ratio-Based Goodness-of-Fit Test for Copula Models on Censored Data
Abstract

Copulas are a popular tool for modeling the dependence among marginal distributions in multivariate censored data. Because many copula models are available, it is essential to check whether the chosen copula model fits the data well. Existing approaches to testing the fit of copula models are mainly designed for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. To address this research gap, we develop a general GOF test for copula-based survival models using the information ratio (IR). It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fit.
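
As a rough illustration of the idea behind an information-ratio check, the sketch below works on complete (uncensored) data with a single-parameter Clayton copula. It is not the test statistic or procedure of the paper. It assumes one common form of the ratio, IR = V/S, where S is minus the average second derivative of the log-density and V is the average squared score, both evaluated at the maximum likelihood estimate. Under a correctly specified model the two quantities agree in large samples, so IR should be close to 1. All function names (`clayton_logpdf`, `sample_clayton`, `fit_theta`, `information_ratio`) are hypothetical, and derivatives are approximated by finite differences.

```python
import math
import random


def clayton_logpdf(u, v, theta):
    """Log-density of the bivariate Clayton copula, theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # lies in (0, 1], avoids log(0)
        w = 1.0 - rng.random()
        inner = u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
        pairs.append((u, inner ** (-1.0 / theta)))
    return pairs


def fit_theta(pairs, lo=0.1, hi=8.0, iters=60):
    """Maximum likelihood estimate of theta by ternary search.

    Assumption: the log-likelihood is unimodal on [lo, hi].
    """
    def loglik(t):
        return sum(clayton_logpdf(u, v, t) for u, v in pairs)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def information_ratio(pairs, theta, h=1e-3):
    """Assumed form IR = V / S for a scalar parameter."""
    n = len(pairs)
    sensitivity = 0.0  # S: minus the average second derivative
    variability = 0.0  # V: average squared score
    for u, v in pairs:
        f_plus = clayton_logpdf(u, v, theta + h)
        f_zero = clayton_logpdf(u, v, theta)
        f_minus = clayton_logpdf(u, v, theta - h)
        score = (f_plus - f_minus) / (2.0 * h)
        hessian = (f_plus - 2.0 * f_zero + f_minus) / (h * h)
        sensitivity -= hessian / n
        variability += score * score / n
    return variability / sensitivity


# Data simulated from the fitted family, so the model is correctly specified.
rng = random.Random(0)
pairs = sample_clayton(2000, 2.0, rng)
theta_hat = fit_theta(pairs)
ir = information_ratio(pairs, theta_hat)
print(f"theta_hat = {theta_hat:.3f}, IR = {ir:.3f}")
```

A value far from 1 would point to misspecification, but judging how far is too far requires the null distribution of the statistic, which this sketch does not attempt to derive. Handling censoring would also change the likelihood contributions used here.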

 
NSF-PAR ID:
10486022
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
79
Issue:
3
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 1713-1725
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The analysis of time series data with detection limits is challenging due to the high‐dimensional integral involved in the likelihood. Existing methods are either computationally demanding or rely on restrictive parametric distributional assumptions. We propose a semiparametric approach in which the temporal dependence is captured by a parametric copula, while the marginal distribution is estimated non‐parametrically. Utilizing the properties of copulas, we develop a new copula‐based sequential sampling algorithm, which provides a convenient way to calculate the censored likelihood. Even without full parametric distributional assumptions, the proposed method still allows us to efficiently compute the conditional quantiles of the censored response at a future time point, and thus construct both point and interval predictions. We establish the asymptotic properties of the proposed pseudo maximum likelihood estimator, and demonstrate through simulation and the analysis of a water quality dataset that the proposed method is more flexible and leads to more accurate predictions than Gaussian‐based methods for non‐normal data. The Canadian Journal of Statistics 47: 438–454; 2019 © 2019 Statistical Society of Canada

     
  2. Abstract

    Popular parametric and semiparametric hazards regression models for clustered survival data are inappropriate and inadequate when the unknown effects of different covariates and clustering are complex. This calls for a flexible modeling framework to yield efficient survival prediction. Moreover, for some survival studies involving time to occurrence of some asymptomatic events, survival times are typically interval censored between consecutive clinical inspections. In this article, we propose a robust semiparametric model for clustered interval‐censored survival data under a paradigm of Bayesian ensemble learning, called soft Bayesian additive regression trees or SBART (Linero and Yang, 2018), which combines multiple sparse (soft) decision trees to attain excellent predictive accuracy. We develop a novel semiparametric hazards regression model by modeling the hazard function as a product of a parametric baseline hazard function and a nonparametric component that uses SBART to incorporate clustering, unknown functional forms of the main effects, and interaction effects of various covariates. In addition to being applicable for left‐censored, right‐censored, and interval‐censored survival data, our methodology is implemented using a data augmentation scheme which allows for existing Bayesian backfitting algorithms to be used. We illustrate the practical implementation and advantages of our method via simulation studies and an analysis of a prostate cancer surgery study where dependence on the experience and skill level of the physicians leads to clustering of survival times. We conclude by discussing our method's applicability in studies involving high‐dimensional data with complex underlying associations.

     
  3. Abstract

    The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.

     
  4. Summary

    Motivated by the need to predict new human immunodeficiency virus (HIV) diagnosis rates from publicly available HIV data that are abundant in space but have few points in time, we propose a class of spatially varying auto-regressive models compounded with conditional auto-regressive spatial correlation structures. We then propose to use the copula approach and a flexible conditional auto-regressive formulation to model the dependence between adjacent counties. These models allow for spatial and temporal correlation as well as space–time interactions and are naturally suitable for predicting HIV cases and other spatiotemporal disease data that feature a similar data structure. We apply the proposed models to HIV data over Florida, California and New England states and compare them with a range of linear mixed models that have recently been popular for modelling spatiotemporal disease data. The results show that for such data our proposed models outperform the others in terms of prediction.

     
  5. Abstract

    A hurricane event can often produce both intense rainfall and a storm tide that can cause a major compound flooding threat to coastlines. This paper examined applications of multivariate copula‐based time series models using data observed during Hurricane Irma (2017) along the coastlines of Florida, Georgia, and South Carolina, United States. Multivariate time series models were developed using bivariate copulas wherein storm tide and rainfall data were modeled using LOWESS‐based autoregressive moving average (ARMA). n samples of observed data were then synthesized using a Monte Carlo approach in which the empirical copula and the parametric estimate of the copula were obtained to approximate two‐sided p‐values using the Rosenblatt probability integral transform method. Analysis suggested that proper selection of the underlying LOWESS‐based ARMA model was the crucial aspect for modeling compound flooding wherein Archimedean, Elliptical, and Extreme Value copulas all offered consistent flexibility in terms of dependence modeling. As a backdrop to compound flood probabilities, this research also outlined both theoretical and applied frameworks for the calculation of non‐exceedance probabilities in a multidimensional environment using classical isofrequency probability assumptions for the “AND” (a bivariate joint probability) and Survival Kendall definitions. Random realizations from storm copulas combined with multivariate non‐exceedance probability definitions ultimately showed there were periods of temporal yet cyclical high intensities that lasted 1–2 hr. Lastly, a discussion is presented on the broader application of the proposed methodology within the field of engineering design and risk management which may serve as a catalyst for the continued research in compound flooding.

     