Title: A Robust Approach for Estimating Field Reliability Using Aggregate Failure Time Data
Failure time data of fielded systems are usually obtained from the actual users of the systems. Due to various operational preferences and/or technical obstacles, a large proportion of field data are collected as aggregate data rather than as the exact failure times of individual units. The challenge of using such data is that the information obtained is more concise but less precise than individual failure times. The most significant needs in modeling aggregate failure time data are the selection of an appropriate probability distribution and the development of a statistical inference procedure capable of handling data aggregation. Although some probability distributions, such as the Gamma and Inverse Gaussian distributions, have well-known closed-form expressions for the probability density function of aggregate data, restricting attention to such distributions limits the applications in field reliability estimation. For reliability practitioners, a robust approach that handles aggregate failure time data without being limited to a small number of probability distributions would be invaluable. This paper studies the phase-type (PH) distribution as a candidate for modeling aggregate failure time data. An expectation-maximization algorithm is developed to obtain the maximum likelihood estimates of the model parameters, and a confidence interval for the reliability estimate is also obtained. Simulation and numerical studies show that the approach is quite powerful because of the high capability of the PH distribution to mimic a variety of probability distributions. In the area of reliability engineering, there is limited work on modeling aggregate data for field reliability estimation. The analytical and statistical inference methods described in this work provide, for the first time, a robust tool for analyzing aggregate failure time data.
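As a concrete illustration of the PH family, the sketch below evaluates the PH survival function R(t) = α·exp(Tt)·1 for a three-phase generator via a truncated power series for the matrix exponential, and checks it against the Erlang(3, λ) closed form, which is exactly the PH distribution with sequential identical phases. This is a minimal pure-Python sketch for exposition, not the paper's estimation code; the rate λ and the number of series terms are illustrative assumptions.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, terms=60):
    """Matrix exponential via the truncated power series sum_n A^n / n!."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]          # A^0 / 0! = I
    for m in range(1, terms):
        term = [[v / m for v in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# A 3-phase PH representation of Erlang(3, lam): start in phase 1 and
# pass through phases 1 -> 2 -> 3, each at rate lam (illustrative choice).
lam = 2.0
T = [[-lam, lam, 0.0],
     [0.0, -lam, lam],
     [0.0, 0.0, -lam]]
alpha = [1.0, 0.0, 0.0]

def ph_survival(t):
    """R(t) = alpha * exp(T t) * 1: probability the chain has not absorbed."""
    E = expm([[x * t for x in row] for row in T])
    return sum(alpha[i] * sum(E[i]) for i in range(len(alpha)))

def erlang_survival(t, k=3):
    """Closed-form Erlang(k, lam) survival, for cross-checking."""
    return math.exp(-lam * t) * sum((lam * t) ** n / math.factorial(n)
                                    for n in range(k))
```

Because the two functions agree to numerical precision, the same PH machinery extends, by changing α and T, to mixtures, Coxian structures, and other shapes that have no convenient closed form.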
Award ID(s):
1635379
NSF-PAR ID:
10112816
Journal Name:
Proceedings of 2019 Reliability and Maintainability Symposium (RAMS)
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The actual failure times of individual components are usually unavailable in many applications. Instead, only aggregate failure-time data are collected by the actual users, for technical and/or economic reasons. When dealing with such data for reliability estimation, practitioners face the challenges of selecting the underlying failure-time distribution and the corresponding statistical inference method. So far, only the exponential, normal, gamma, and inverse Gaussian distributions have been used in analyzing aggregate failure-time data, because these distributions have closed-form expressions for such data. However, this limited choice of probability distributions cannot satisfy the extensive needs of a variety of engineering applications. Phase-type (PH) distributions are robust and flexible in modeling failure-time data, as they can mimic a large collection of probability distributions of non-negative random variables arbitrarily closely by adjusting their model structures. In this article, PH distributions are utilized, for the first time, in reliability estimation based on aggregate failure-time data. A maximum likelihood estimation (MLE) method and a Bayesian alternative are developed. For the MLE method, an expectation-maximization algorithm is developed for parameter estimation, and the corresponding Fisher information is used to construct confidence intervals for the quantities of interest. For the Bayesian method, a procedure for performing point and interval estimation is also introduced. Numerical examples show that the proposed PH-based reliability estimation methods are quite flexible and alleviate the burden of selecting a probability distribution when the underlying failure-time distribution is general or even unknown.
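To see what "aggregate" means here, and why the exponential case stays tractable, the sketch below simulates records that each contain only the total time to m failures, then recovers the rate from its closed-form MLE. The rate, batch size, and sample size are arbitrary illustration choices; this is not the paper's PH-based EM procedure.

```python
import random

random.seed(7)
TRUE_RATE = 0.5    # failures per unit time (illustrative assumption)
M = 5              # failures pooled into each aggregate record
N_RECORDS = 2000

# Each record is the SUM of M individual exponential failure times;
# the individual times themselves are never observed.
records = [sum(random.expovariate(TRUE_RATE) for _ in range(M))
           for _ in range(N_RECORDS)]

# Each aggregate record follows Erlang(M, rate), so the aggregate
# likelihood remains closed-form and the MLE reduces to
# (total number of failures) / (total observed time).
rate_hat = (M * N_RECORDS) / sum(records)
```

For distributions without this additive closure property, the sum of failure times has no convenient density, which is precisely the gap the PH-based EM method is designed to fill.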
  2. Summary

    Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer.
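A toy sketch of the kernel-machine ingredient described above: build a Gaussian kernel matrix over hypothetical marker profiles, and average it with a linear kernel matrix in the spirit of pooling candidate kernels when the best one is unknown. The data, bandwidth, and equal weighting are made-up illustrations; the actual Omnibus Test statistic and AFT machinery are in the article.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: one non-linear choice in the KM framework."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

# Hypothetical marker profiles for four subjects.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
n = len(X)

K_rbf = [[rbf_kernel(xi, xj) for xj in X] for xi in X]
K_lin = [[linear_kernel(xi, xj) for xj in X] for xi in X]

# An equal-weight average of valid kernels is itself a valid kernel:
# one simple way to combine information across candidate kernels.
K_avg = [[(K_rbf[i][j] + K_lin[i][j]) / 2 for j in range(n)] for i in range(n)]
```

Note that subjects with similar profiles (the first two, and the last two) get large RBF entries while dissimilar pairs get entries near zero; it is this similarity structure, not any linearity assumption, that the KM regression exploits.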

     
  3. Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in the social and behavioral sciences. While GCMs are typically studied under a normality assumption, empirical data often violate that assumption in applications. Failure to account for the deviation from normality in the data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers are present, resulting in non-normality. However, this robust approach was shown to perform less satisfactorily when leverage observations exist. In this work, we propose a robust double-medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed, for the distribution of the within-subject measurement errors and for that of the random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study was conducted to evaluate the numerical performance of the proposed approach; it showed that the approach yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.
  4. Growth curve models have been widely used to analyse longitudinal data in social and behavioural sciences. Although growth curve models with normality assumptions are relatively easy to estimate, practical data are rarely normal. Failing to account for non‐normal data may lead to unreliable model estimation and misleading statistical inference. In this work, we propose a robust approach for growth curve modelling using conditional medians that are less sensitive to outlying observations. Bayesian methods are applied for model estimation and inference. Based on the existing work on Bayesian quantile regression using asymmetric Laplace distributions, we use asymmetric Laplace distributions to convert the problem of estimating a median growth curve model into a problem of obtaining the maximum likelihood estimator for a transformed model. Monte Carlo simulation studies have been conducted to evaluate the numerical performance of the proposed approach with data containing outliers or leverage observations. The results show that the proposed approach yields more accurate and efficient parameter estimates than traditional growth curve modelling. We illustrate the application of our robust approach using conditional medians based on a real data set from the Virginia Cognitive Aging Project.
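The robustness mechanism shared by these conditional-median approaches can be seen in a few lines: minimizing the quantile ("check") loss at τ = 0.5, which is equivalent to maximizing an asymmetric Laplace likelihood, recovers the median, and a single gross outlier barely moves the median while it drags the mean far away. The toy data and grid search below are illustrative only, not the papers' Bayesian estimation procedure.

```python
def check_loss(u, tau=0.5):
    """Quantile (check) loss; at tau = 0.5 it is half the absolute loss,
    so minimizing it yields the median."""
    return u * (tau - (1.0 if u < 0 else 0.0))

y = [1.0, 2.0, 2.5, 3.0, 100.0]   # toy data with one gross outlier

def objective(m, tau=0.5):
    return sum(check_loss(yi - m, tau) for yi in y)

# Crude grid search over candidate centers (illustration only).
m_hat = min((i / 100 for i in range(0, 10001)), key=objective)
mean = sum(y) / len(y)
# m_hat stays at the median 2.5, while the mean is dragged to 21.7.
```

Within the Bayesian model, the same check loss appears as the negative log-likelihood of an asymmetric Laplace distribution, which is what converts median estimation into a standard likelihood problem.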

     
  5.
    A normalizing flow is an invertible mapping between an arbitrary probability distribution and a standard normal distribution; it can be used for density estimation and statistical inference. Computing the flow follows the change-of-variables formula and thus requires invertibility of the mapping and an efficient way to compute the determinant of its Jacobian. To satisfy these requirements, normalizing flows typically consist of carefully chosen components. Continuous normalizing flows (CNFs) are mappings obtained by solving a neural ordinary differential equation (ODE). The neural ODE's dynamics can be chosen almost arbitrarily while ensuring invertibility. Moreover, the log-determinant of the flow's Jacobian can be obtained by integrating the trace of the dynamics' Jacobian along the flow. Our proposed OT-Flow approach tackles two critical computational challenges that limit a more widespread use of CNFs. First, OT-Flow leverages optimal transport (OT) theory to regularize the CNF and enforce straight trajectories that are easier to integrate. Second, OT-Flow features exact trace computation with time complexity equal to that of the trace estimators used in existing CNFs. On five high-dimensional density estimation and generative modeling tasks, OT-Flow performs competitively with state-of-the-art CNFs while on average requiring one-fourth as many weights, with an 8x speedup in training time and a 24x speedup in inference.
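The change-of-variables computation that every normalizing flow performs can be shown with the simplest possible flow, a one-dimensional affine map; the parameters below are arbitrary illustration choices. CNFs such as OT-Flow replace this fixed map with a neural ODE and obtain the log-determinant by integrating the trace of the dynamics' Jacobian, but the bookkeeping is the same.

```python
import math

MU, SIGMA = 3.0, 2.0   # parameters of the toy affine flow (assumed)

def flow(x):
    """Invertible map x -> z = (x - MU) / SIGMA; z should be standard normal."""
    return (x - MU) / SIGMA

def log_density(x):
    # Change of variables: log p(x) = log N(f(x); 0, 1) + log |df/dx|
    z = flow(x)
    log_std_normal = -0.5 * (z * z + math.log(2.0 * math.pi))
    log_det_jacobian = -math.log(SIGMA)    # df/dx = 1 / SIGMA
    return log_std_normal + log_det_jacobian
```

By construction this reproduces the N(MU, SIGMA) log-density, which is exactly what makes a flow usable for density estimation: push a point through the map, score it under the standard normal, and correct by the log-determinant of the Jacobian.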