NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Diagnostics for regression models with semicontinuous outcomes

https://doi.org/10.1093/biomtc/ujae007

Yang, Lu (March 2024, Biometrics)

ABSTRACT Semicontinuous outcomes commonly arise in a wide variety of fields, such as insurance claims, healthcare expenditures, rainfall amounts, and alcohol consumption. Regression models, including Tobit, Tweedie, and two-part models, are widely employed to understand the relationship between semicontinuous outcomes and covariates. Given the potential detrimental consequences of model misspecification, after fitting a regression model, it is of prime importance to check the adequacy of the model. However, due to the point mass at zero, standard diagnostic tools for regression models (eg, deviance and Pearson residuals) are not informative for semicontinuous data. To bridge this gap, we propose a new type of residuals for semicontinuous outcomes that is applicable to general regression models. Under the correctly specified model, the proposed residuals converge to being uniformly distributed, and when the model is misspecified, they significantly depart from this pattern. In addition to in-sample validation, the proposed methodology can also be employed to evaluate predictive distributions. We demonstrate the effectiveness of the proposed tool using health expenditure data from the US Medical Expenditure Panel Survey.
more » « less
A copula model for marked point process with a terminal event: An application in dynamic prediction of insurance claims

https://doi.org/10.1214/24-AOAS1902

Yang, Lu; Shi, Peng; Huang, Shimeng (December 2024, The Annals of Applied Statistics)

Accurate prediction of an insurer’s outstanding liabilities is crucial for maintaining the financial health of the insurance sector. We aim to develop a statistical model for insurers to dynamically forecast unpaid losses by leveraging the granular transaction data on individual claims. The liability cash flow from a single insurance claim is determined by an event process that describes the recurrences of payments, a payment process that generates a sequence of payment amounts, and a settlement process that terminates both the event and payment processes. More importantly, the three components are dependent on one another, which enables the dynamic prediction of an insurer’s outstanding liability. We introduce a copula-based point process framework to model the recurrent events of payment transactions from an insurance claim, where the longitudinal payment amounts and the time-to-settlement outcome are formulated as the marks and the terminal event of the counting process, respectively. The dependencies among the three components are characterized using the method of pair copula constructions. We further develop a stagewise strategy for parameter estimation and illustrate its desirable properties with numerical experiments. In the application we consider a portfolio of property insurance claims for building and contents coverage obtained from a commercial property insurance provider, where we find intriguing dependence patterns among the three components. The superior dynamic prediction performance of the proposed joint model enhances the insurer’s decision-making in claims reserving and risk financing operations.
more » « less
Full Text Available
Double Probability Integral Transform Residuals for Regression Models with Discrete Outcomes

https://doi.org/10.1080/10618600.2024.2303336

Yang, Lu (February 2024, Journal of Computational and Graphical Statistics)

The assessment of regression models with discrete outcomes is challenging and has many fundamental issues. With discrete outcomes, standard regression model assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) under the true model, calling into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residuals for regression models with general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under the correctly specified model. One can construct visualizations such as QQ plots to check the overall fit of a model straightforwardly, and the shape of QQ plots can further help identify possible causes of misspecification such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, in order to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which is based on the comparison between the partial sum of outcomes and of fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals for various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis. Supplementary materials for this article are available online.
more » « less
Full Text Available

Search for: All records