

Title: An Exploratory Statistical Cusp Catastrophe Model
The Cusp Catastrophe Model provides a promising approach for health and behavioral researchers to investigate both continuous and quantum changes within one modeling framework. However, application of the model has been hindered by unresolved issues in fitting the statistical model to data. This paper reports our exploratory work in developing a new approach to statistical cusp catastrophe modeling. In this approach, the Cusp Catastrophe Model is cast as a statistical nonlinear regression for parameter estimation. The delay convention and the Maxwell convention are applied to obtain parameter estimates via maximum likelihood estimation. Through a series of simulation studies, we demonstrate that (a) parameter estimation of this statistical cusp model is unbiased, and (b) a bootstrapping procedure enables efficient statistical inference. To test the utility of the new method, we analyze survey data collected for an NIH-funded project providing HIV-prevention education to adolescents in the Bahamas. We found that our approach explains the results more reasonably than existing methods. Additional research is needed before this new approach can be established as the most reliable method for fitting the cusp catastrophe model; further work should focus on additional theoretical analysis, extension of the model to categorical and count data, and applications to other data types.
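For orientation, the cusp model referenced above has a standard deterministic core; in regression-style statistical formulations of this kind (in the tradition of Cobb and of Grasman et al.), the two control parameters are typically expressed as linear functions of observed predictors, the delay convention selects the equilibrium branch continuous with the current state, and the Maxwell convention selects the global minimum of the potential. The exact parameterization used in this paper may differ; the block below shows only the generic form.

```latex
% Cusp potential for outcome y with asymmetry (\alpha) and bifurcation (\beta) control parameters
V(y;\alpha,\beta) \;=\; \tfrac{1}{4}y^{4} \;-\; \tfrac{1}{2}\beta y^{2} \;-\; \alpha y,
\qquad
\frac{\partial V}{\partial y} \;=\; y^{3} - \beta y - \alpha \;=\; 0 \quad \text{(equilibrium surface)}.

% Regression link: control parameters as linear functions of predictors x_{1},\dots,x_{p}
\alpha_i \;=\; a_0 + a_1 x_{i1} + \cdots + a_p x_{ip},
\qquad
\beta_i \;=\; b_0 + b_1 x_{i1} + \cdots + b_p x_{ip}.
```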
Award ID(s):
1633212
NSF-PAR ID:
10039228
Author(s) / Creator(s):
Date Published:
Journal Name:
2016 IEEE International Conference on Data Science and Advanced Analytics
Page Range / eLocation ID:
100 to 109
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Recent research applies soft computing techniques to fit software reliability growth models. However, runtime performance and the distribution of the distance from an optimal solution over multiple runs must be explicitly considered to justify the practical utility of these approaches, promote comparison, and support reproducible research. This paper presents a meta-optimization framework for designing stable and efficient multi-phase algorithms for fitting software reliability growth models. The approach combines initial parameter estimation techniques from statistical algorithms, the global search properties of soft computing, and the rapid convergence of numerical methods. Designs that exhibit the best balance between runtime performance and accuracy are identified. The approach is illustrated with nonhomogeneous Poisson process and covariate software reliability growth models, including a cross-validation step on data sets not used to identify the designs. The results indicate that the nonhomogeneous Poisson process model considered is too simple to benefit from soft computing, because it incurs additional runtime with no attendant increase in accuracy. However, a multi-phase design for the covariate software reliability growth model, consisting of the bat algorithm followed by a numerical method, achieves better performance and converges more consistently than a numerical method alone. The proposed approach supports fitting higher-dimensional covariate software reliability growth models, suitable for implementation in a tool.
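As a rough illustration of the multi-phase idea in item 1, the sketch below fits a Goel-Okumoto NHPP mean-value function to simulated cumulative failure counts: a global phase (SciPy's differential evolution, standing in here for the bat algorithm) seeds a local numerical refinement. The data, bounds, and the least-squares objective (used in place of the maximum-likelihood fits discussed in the paper) are all assumptions made for brevity.

```python
# Sketch: two-phase fit of a Goel-Okumoto NHPP mean-value function m(t) = a*(1 - exp(-b*t))
# to cumulative failure counts. Phase 1: global search (differential evolution, standing in
# for the bat algorithm). Phase 2: local numerical refinement started from the phase-1 solution.
import numpy as np
from scipy.optimize import differential_evolution, minimize

rng = np.random.default_rng(0)
t = np.arange(1.0, 21.0)                                   # test intervals (illustrative)
true_a, true_b = 120.0, 0.15
n = np.maximum.accumulate(true_a * (1 - np.exp(-true_b * t)) + rng.normal(0, 2, t.size))

def sse(params):
    """Least-squares objective (a stand-in for a maximum-likelihood fit)."""
    a, b = params
    return np.sum((n - a * (1.0 - np.exp(-b * t))) ** 2)

bounds = [(n[-1], 5 * n[-1]), (1e-4, 1.0)]                 # a >= failures seen so far, 0 < b <= 1
phase1 = differential_evolution(sse, bounds, seed=0)       # global phase
phase2 = minimize(sse, phase1.x, method="Nelder-Mead")     # local refinement phase
print("two-phase estimate (a, b):", phase2.x)
```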
  2. Abstract

    We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations, and then analyzing the combined dataset of size n + m to estimate the parameters of the Y|X,B model. This combined dataset of size n + m now has missing values of B for m of the observations, and is analyzed using methods that can handle missing data (e.g., multiple imputation). We present simulation studies and illustrate the method using data from the Prostate Cancer Prevention Trial. Though the synthetic data method is applicable to a general regression context, to provide some justification, we show in two special cases that the asymptotic variances of the parameter estimates in the Y|X,B model are identical to those from an alternative constrained maximum likelihood estimation approach. This correspondence in special cases and the method's broad applicability makes it appealing for use across diverse scenarios. The Canadian Journal of Statistics 47: 580–603; 2019 © 2019 Statistical Society of Canada

     
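To make the synthetic-data idea in item 2 concrete, the toy sketch below generates m synthetic (X, Y) rows from an assumed, previously published Y|X fit, leaves B missing on those rows, and estimates the Y|X,B regression with a deliberately crude regression-based multiple imputation; the data, coefficients, and imputation scheme are invented for illustration and are much simpler than the paper's procedure.

```python
# Sketch: combine n observed rows (Y, X, B) with m synthetic rows generated from a known Y|X model,
# then estimate the Y|X,B regression via a basic regression-based multiple imputation of missing B.
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 200

# Observed data (illustrative generating process, not from the paper)
X = rng.normal(size=n)
B = 0.5 * X + rng.normal(scale=0.8, size=n)
Y = 1.0 + 2.0 * X + 1.5 * B + rng.normal(size=n)

# Synthetic rows: X resampled from its empirical distribution, Y drawn from the *known* Y|X model;
# B is left missing for these rows.
known_b0, known_b1, known_sigma = 1.0, 2.75, 1.6           # assumed published Y|X fit
Xs = rng.choice(X, size=m, replace=True)
Ys = known_b0 + known_b1 * Xs + rng.normal(scale=known_sigma, size=m)

X_all = np.concatenate([X, Xs])
Y_all = np.concatenate([Y, Ys])

def ols(design, resp):
    coef, *_ = np.linalg.lstsq(design, resp, rcond=None)
    resid = resp - design @ coef
    return coef, resid.std(ddof=design.shape[1])

# Crude multiple imputation: impute B for synthetic rows from a B | X, Y fit on the observed rows
coef_b, sd_b = ols(np.column_stack([np.ones(n), X, Y]), B)
fits = []
for _ in range(20):                                        # 20 imputations
    B_imp = np.column_stack([np.ones(m), Xs, Ys]) @ coef_b + rng.normal(scale=sd_b, size=m)
    B_all = np.concatenate([B, B_imp])
    coef_y, _ = ols(np.column_stack([np.ones(n + m), X_all, B_all]), Y_all)
    fits.append(coef_y)

print(np.mean(fits, axis=0))                               # pooled Y ~ 1 + X + B point estimates
```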
  3. Abstract

    With advances in biomedical research, biomarkers are becoming increasingly important prognostic factors for predicting overall survival, while the measurement of biomarkers is often censored due to instruments' lower limits of detection. This leads to two types of censoring: random censoring in overall survival outcomes and fixed censoring in biomarker covariates, posing new challenges in statistical modeling and inference. Existing methods for analyzing such data focus primarily on linear regression ignoring censored responses or on semiparametric accelerated failure time models with covariates under detection limits (DL). In this paper, we propose a quantile regression for survival data with covariates subject to DL. Compared to existing methods, the proposed approach provides a more versatile tool for modeling the distribution of survival outcomes by allowing covariate effects to vary across conditional quantiles of the survival time and requiring no parametric distribution assumptions for outcome data. To estimate the quantile process of regression coefficients, we develop a novel multiple imputation approach based on another quantile regression for covariates under DL, avoiding stringent parametric restrictions on censored covariates as often assumed in the literature. Under regularity conditions, we show that the estimation procedure yields uniformly consistent and asymptotically normal estimators. Simulation results demonstrate the satisfactory finite-sample performance of the method. We also apply our method to the motivating data from a study of genetic and inflammatory markers of sepsis.

     
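The following toy sketch relates to the final estimation stage of item 3: given multiply imputed versions of a covariate that is subject to a detection limit, a quantile regression is fit at a grid of quantile levels in each imputed data set and the coefficients are averaged across imputations. The imputation step shown is only a placeholder, and the survival censoring and the paper's quantile-regression-based imputation of below-DL values are not reproduced here.

```python
# Sketch: pool quantile-regression coefficients across multiply imputed data sets.
# The imputed covariate values stand in for a biomarker measured subject to a detection limit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, n_imp = 300, 10
taus = [0.25, 0.5, 0.75]                      # quantile levels of interest

z = rng.normal(size=n)                         # fully observed covariate
x_true = 0.6 * z + rng.normal(size=n)          # biomarker, partly below a detection limit in reality
y = 1.0 + 0.8 * x_true + 0.5 * z + rng.standard_t(df=4, size=n)   # outcome (no censoring in this toy)

pooled = {}
for tau in taus:
    coefs = []
    for _ in range(n_imp):
        # Placeholder imputation: perturb the covariate; a real analysis would draw below-DL values
        # from a conditional-quantile model as described in the abstract above.
        x_imp = x_true + rng.normal(scale=0.2, size=n)
        design = sm.add_constant(np.column_stack([x_imp, z]))
        fit = sm.QuantReg(y, design).fit(q=tau)
        coefs.append(fit.params)
    pooled[tau] = np.mean(coefs, axis=0)       # average coefficients across imputations

for tau, beta in pooled.items():
    print(tau, np.round(beta, 3))
```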
  4. The graphon (W-graph), which includes the stochastic block model as a special case, has been widely used in modeling and analyzing network data. Estimation of the graphon function has attracted considerable recent research interest. Most existing works focus on inference in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified latent variables. In this work, we propose a hierarchical model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on our hierarchical model, we further introduce a new model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of parameter estimation and model selection.
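A simplified illustration related to item 4 (not the paper's hierarchical model): with community labels taken as given, block-level edge counts of a stochastic block model are shrunk toward a common Beta prior fit by moment matching, giving empirical-Bayes estimates of the connectivity matrix. The labels, prior family, and planted network are all assumptions made here.

```python
# Sketch: empirical-Bayes (Beta-Binomial) shrinkage of stochastic block model connectivity estimates,
# assuming community labels are already known or estimated.
import numpy as np

def eb_connectivity(A, labels):
    """A: symmetric 0/1 adjacency matrix; labels: community index per node."""
    K = labels.max() + 1
    edges = np.zeros((K, K))                   # observed edges per block
    pairs = np.zeros((K, K))                   # possible edges per block
    nodes = [np.where(labels == k)[0] for k in range(K)]
    for k in range(K):
        for l in range(k, K):
            sub = A[np.ix_(nodes[k], nodes[l])]
            if k == l:
                nk = len(nodes[k])
                edges[k, l] = sub[np.triu_indices(nk, 1)].sum()
                pairs[k, l] = nk * (nk - 1) / 2
            else:
                edges[k, l] = edges[l, k] = sub.sum()
                pairs[k, l] = pairs[l, k] = len(nodes[k]) * len(nodes[l])
    # Method-of-moments fit of a common Beta(a, b) prior across block densities
    p_hat = edges / np.maximum(pairs, 1)
    mu, var = p_hat.mean(), max(p_hat.var(), 1e-8)
    strength = mu * (1 - mu) / var - 1
    a, b = max(mu * strength, 1e-3), max((1 - mu) * strength, 1e-3)
    # Posterior-mean (shrunken) connectivity estimates
    return (edges + a) / (pairs + a + b)

# Tiny usage example with a planted two-block network
rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 30)
P = np.array([[0.30, 0.05], [0.05, 0.25]])
A = rng.binomial(1, P[labels][:, labels])
A = np.triu(A, 1); A = A + A.T
print(np.round(eb_connectivity(A, labels), 3))
```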
  5. Abstract

    Although the governing equations of many systems, when derived from first principles, may be viewed as known, it is often too expensive to numerically simulate all the interactions they describe. Therefore, researchers often seek simpler descriptions that capture complex phenomena without numerically resolving all the interacting components. Stochastic differential equations (SDEs) arise naturally as models in this context. The growth in data acquisition, both through experiment and through simulations, provides an opportunity for the systematic derivation of SDE models in many disciplines. However, inconsistencies between SDEs and real data at short time scales often cause problems when standard statistical methodology is applied to parameter estimation. The incompatibility between SDEs and real data can be addressed by deriving sufficient statistics from the time-series data and learning parameters of SDEs based on these. Here, we study sufficient statistics computed from time averages, an approach that we demonstrate to lead to sufficient statistics on a variety of problems and that has the secondary benefit of obviating the need to match trajectories. Following this approach, we formulate the fitting of SDEs to sufficient statistics from real data as an inverse problem and demonstrate that this inverse problem can be solved by using ensemble Kalman inversion. Furthermore, we create a framework for non-parametric learning of drift and diffusion terms by introducing hierarchical, refinable parameterizations of unknown functions, using Gaussian process regression. We demonstrate the proposed methodology for the fitting of SDE models, first in a simulation study with a noisy Lorenz ’63 model, and then in other applications, including dimension reduction in deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics, and simplified models for key observables arising in molecular dynamics. The results confirm that the proposed methodology provides a robust and systematic approach to fitting SDE models to real data.
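In the spirit of item 5, the toy sketch below estimates the parameters of an Ornstein-Uhlenbeck SDE (standing in for the Lorenz '63 example) with a basic ensemble Kalman inversion loop, matching time-averaged statistics of simulated paths to those of a "data" path; the model, statistics, and tuning values are chosen for brevity rather than taken from the paper.

```python
# Sketch: ensemble Kalman inversion (EKI) fitting SDE parameters to time-averaged statistics.
# Toy model: Ornstein-Uhlenbeck dX = -theta*X dt + sigma dW; statistics = (mean of X^2, lag-1 product).
import numpy as np

rng = np.random.default_rng(4)
dt, T = 0.01, 50.0
steps = int(T / dt)

def simulate(theta, sigma):
    x = np.zeros(steps)
    for i in range(1, steps):
        x[i] = x[i-1] - theta * x[i-1] * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

def stats(x):
    return np.array([np.mean(x * x), np.mean(x[:-1] * x[1:])])   # time-averaged statistics

def forward(params):
    theta, sigma = np.exp(params)              # work in log space to keep parameters positive
    return stats(simulate(theta, sigma))

y_obs = stats(simulate(1.0, 0.5))              # "data" generated with theta = 1.0, sigma = 0.5
Gamma = 1e-4 * np.eye(2)                       # assumed observation-noise covariance

J = 30                                         # ensemble size
ens = rng.normal(loc=np.log([0.5, 1.0]), scale=0.5, size=(J, 2))  # initial parameter ensemble
for _ in range(10):                            # EKI iterations
    G = np.array([forward(p) for p in ens])
    dp = ens - ens.mean(axis=0)
    dG = G - G.mean(axis=0)
    C_pG = dp.T @ dG / J
    C_GG = dG.T @ dG / J + Gamma
    noise = rng.multivariate_normal(np.zeros(2), Gamma, size=J)
    ens = ens + (C_pG @ np.linalg.solve(C_GG, (y_obs + noise - G).T)).T

print(np.exp(ens.mean(axis=0)))                # ensemble-mean estimate of (theta, sigma)
```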