

Title: A Selective Overview and Comparison of Robust Mixture Regression Estimators
Summary

Mixture regression models have been widely used in business, marketing and social sciences to model mixed regression relationships arising from a clustered and thus heterogeneous population. The unknown mixture regression parameters are usually estimated by maximum likelihood, using the expectation–maximisation algorithm under the assumption that the component error densities are normal. However, it is well known that normality‐based maximum likelihood estimation is very sensitive to outliers and heavy‐tailed error distributions. This paper gives a selective overview of recently proposed robust mixture regression methods and compares their performance using simulation studies.

 
NSF-PAR ID:
10457879
Publisher / Repository:
Wiley-Blackwell
Journal Name:
International Statistical Review
Volume:
88
Issue:
1
ISSN:
0306-7734
Page Range / eLocation ID:
p. 176-202
Sponsoring Org:
National Science Foundation
More Like this
  1. Growth curve models have been widely used to analyse longitudinal data in social and behavioural sciences. Although growth curve models with normality assumptions are relatively easy to estimate, practical data are rarely normal. Failing to account for non‐normal data may lead to unreliable model estimation and misleading statistical inference. In this work, we propose a robust approach for growth curve modelling using conditional medians that are less sensitive to outlying observations. Bayesian methods are applied for model estimation and inference. Based on the existing work on Bayesian quantile regression using asymmetric Laplace distributions, we use asymmetric Laplace distributions to convert the problem of estimating a median growth curve model into a problem of obtaining the maximum likelihood estimator for a transformed model. Monte Carlo simulation studies have been conducted to evaluate the numerical performance of the proposed approach with data containing outliers or leverage observations. The results show that the proposed approach yields more accurate and efficient parameter estimates than traditional growth curve modelling. We illustrate the application of our robust approach using conditional medians based on a real data set from the Virginia Cognitive Aging Project.

     
  2. Abstract

    Objectives

    Previously developed methods in subadult body mass estimation have not been tested in populations other than European–American or African–American. This study uses a contemporary Taiwanese sample to test these methods. Through evaluating their accuracy and bias, we addressed whether the allometric relationships between body mass and skeletal traits commonly used in subadult body mass estimation are conserved among different populations.

    Materials and Methods

    Computed tomography scans of lower limbs from individuals aged 0–17 years old of both sexes were collected from National Taiwan University Hospital along with documented body weight. Polar second moment of area, distal femoral metaphyseal breadth, and maximum superior/inferior femoral head diameter were collected either directly from the scans or from reconstructed 3D models. Estimated body mass was compared with documented body mass to assess the performance of the equations.

    Results

    Current methods provided good body mass estimates in Taiwanese individuals, with accuracy and bias similar to those reported in other validation studies. A tendency for increasing error with increasing age was observed for all methods. Reduced major axis regression showed the allometric relationships between different skeletal traits and body mass across different age categories can all be summarized using a common fitted line. A revised, maximum likelihood‐based approach was proposed for all skeletal traits.

    Discussion

    The results suggested that the allometric relationships between body mass and different skeletal traits are largely conserved among populations. The revised method provided improved applicability with strong underlying theoretical justifications, and potential for future improvements.

     
  3. Summary

    We consider the problem of selecting covariates in a spatial regression model when the response is binary. The penalized likelihood-based approach has proved effective for simultaneous variable selection and estimation. In the context of a spatially dependent binary variable, a uniquely interpretable likelihood is not available, so a quasi-likelihood may be more suitable. We develop a penalized quasi-likelihood with spatial dependence for simultaneous variable selection and parameter estimation, along with an efficient computational algorithm. The theoretical properties, including asymptotic normality and consistency, are studied under an increasing-domain asymptotics framework. An extensive simulation study is conducted to validate the methodology. Real data examples are provided to illustrate its applicability. Although a theoretical justification has not been established, we also investigate the empirical performance of the proposed penalized quasi-likelihood approach for spatial count data, to explore the suitability of this method for a general exponential family of distributions.

     
  4. Summary

    The paper is concerned with parameter estimation for inhomogeneous spatial point processes with a regression model for the intensity function and tractable second-order properties (K-function). Regression parameters are first estimated using a Poisson likelihood score estimating function; in a second step, minimum contrast estimation is applied to the residual clustering parameters. Asymptotic normality of the parameter estimates is established under certain mixing conditions, and we illustrate how the results may be applied in ecological studies of rainforests.

     
  5. Abstract

    We consider the proportional hazards model in which the covariates include the discretized categories of a continuous time‐dependent exposure variable measured with error. Naively ignoring the measurement error in the analysis may cause biased estimation and erroneous inference. Although various approaches have been proposed to deal with measurement error when the hazard depends linearly on the time‐dependent variable, it has not yet been investigated how to correct for it when the hazard depends on the discretized categories of the time‐dependent variable. To fill this gap in the literature, we propose a smoothed corrected score approach based on an approximation of the discretized categories obtained by smoothing the indicator function. The consistency and asymptotic normality of the proposed estimator are established. The observation times of the time‐dependent variable are allowed to be informative. For comparison, we also extend two approximate approaches to this setting: regression calibration and risk‐set regression calibration. The methods are assessed by simulation studies and by application to data from an HIV clinical trial.

     