
Title: Robust monitoring of multivariate processes with short‐ranged serial data correlation

Control charts are commonly used in practice to detect distributional shifts in sequential processes. Traditional statistical process control (SPC) charts assume that process observations are independent and identically distributed and follow a parametric distribution when the process is in control (IC). In practice, these assumptions are rarely valid, and it has been well demonstrated that traditional control charts are unreliable when their model assumptions are violated. To overcome this limitation, nonparametric SPC has become an active research area, and a number of nonparametric control charts have been developed. However, most existing nonparametric control charts are based on ordering and/or categorizing the original process observations, which loses information in the observed data and consequently reduces the effectiveness of the related control charts. In this paper, we suggest a new multivariate online monitoring scheme in which (i) process observations are first sequentially decorrelated; (ii) the decorrelated data of each quality variable are then transformed using their estimated IC distribution, so that the IC distribution of the transformed data is roughly N(0, 1); and (iii) the conventional multivariate exponentially weighted moving average (MEWMA) chart is applied to the transformed data of all quality variables for online process monitoring. The chart is self-starting in the sense that the estimates of all related IC quantities are updated recursively over time. It can accommodate stationary short-range serial data correlation, and its design is relatively simple because its control limit can be determined in advance by Monte Carlo simulation. This approach avoids the information loss caused by data ordering and/or data categorization, and numerical studies show that it is reliable and effective for process monitoring in the various cases considered.
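Steps (ii) and (iii) of the scheme above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: it skips the sequential decorrelation step (the in-control sample is assumed to be already decorrelated), uses a fixed in-control sample instead of the self-starting recursive updates, and the names `normal_scores`, `mewma_statistics`, and `estimate_arl0` are hypothetical. The empirical-CDF value (count + 0.5)/(n + 1) is one convenient choice that keeps the argument of the inverse normal CDF strictly inside (0, 1).

```python
from bisect import bisect_right
from statistics import NormalDist

import numpy as np


def normal_scores(x, ic_sample):
    """Map values to roughly N(0, 1) via z = Phi^{-1}(F_hat(x)), where F_hat is
    the empirical CDF of an in-control sample, shrunk away from 0 and 1."""
    ic_sorted = sorted(ic_sample)
    n = len(ic_sorted)
    std_normal = NormalDist()
    # (number of IC values <= v, plus 0.5) / (n + 1) always lies strictly in (0, 1)
    return np.array([std_normal.inv_cdf((bisect_right(ic_sorted, v) + 0.5) / (n + 1))
                     for v in x])


def mewma_statistics(Z, sigma, lam=0.1):
    """MEWMA charting statistics for the rows of Z (shape T x p).

    E_t = lam * Z_t + (1 - lam) * E_{t-1} with E_0 = 0, and
    T2_t = E_t' Sigma_E^{-1} E_t using the asymptotic covariance
    Sigma_E = lam / (2 - lam) * Sigma.
    """
    Z = np.asarray(Z, dtype=float)
    sigma_e_inv = np.linalg.inv(lam / (2.0 - lam) * np.asarray(sigma, dtype=float))
    e = np.zeros(Z.shape[1])
    values = []
    for z in Z:
        e = lam * z + (1.0 - lam) * e
        values.append(float(e @ sigma_e_inv @ e))
    return np.array(values)


def estimate_arl0(h, sigma, lam=0.1, n_reps=200, max_len=2000, seed=0):
    """Monte Carlo estimate of the in-control ARL for control limit h, treating
    the transformed IC data as N(0, sigma). Runs that never signal are
    censored at max_len, so the estimate is biased low when h is large."""
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    run_lengths = []
    for _ in range(n_reps):
        Z = rng.multivariate_normal(np.zeros(len(sigma)), sigma, size=max_len)
        signals = np.nonzero(mewma_statistics(Z, sigma, lam) > h)[0]
        run_lengths.append(signals[0] + 1 if signals.size else max_len)
    return float(np.mean(run_lengths))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ic = rng.standard_t(df=4, size=(500, 2))                # heavy-tailed IC sample
    new = rng.standard_t(df=4, size=(50, 2)) + [0.0, 1.0]   # mean shift in variable 2
    Z_ic = np.column_stack([normal_scores(ic[:, j], ic[:, j]) for j in range(2)])
    Z_new = np.column_stack([normal_scores(new[:, j], ic[:, j]) for j in range(2)])
    sigma = np.corrcoef(Z_ic, rowvar=False)  # cross-variable correlation of scores
    print(mewma_statistics(Z_new, sigma, lam=0.1).round(2))
```

In practice one would search over `h` (for example by bisection) until `estimate_arl0` reaches a target in-control ARL, with `max_len` set well above that target so that censoring has little effect.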

Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: Quality and Reliability Engineering International
Page Range / eLocation ID: pp. 4196-4209
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Profile monitoring is an active research area in statistical process control (SPC) because it has many important applications in manufacturing and other industries. Early profile monitoring methods often impose the model assumptions that the mean profile function has a parametric form (e.g., linear), that profile observations follow a parametric distribution (e.g., normal), and that within-profile observations are independent of each other. These assumptions have been lifted in some recent profile monitoring research, making the related methods more reliable in various applications. One notoriously challenging task in profile monitoring is to properly accommodate serial data correlation among profiles observed at different time points, and this task has not yet been properly addressed in the SPC literature. Serial data correlation is common in practice, and it has been well demonstrated in the literature that control charts are unreliable if serial data correlation is ignored. In this paper, we suggest a novel mixed-effects model for describing serially correlated univariate profile data. Based on this model, a Phase I profile monitoring chart is developed. This chart is flexible in the sense that it does not require parametric forms for either the mean profile function or the profile data distribution. It can accommodate both within-profile and between-profile data correlation. Numerical studies show that it works well in different cases.

  2. In many real-world applications of monitoring multivariate spatio-temporal data that are non-stationary over time, one is often interested in detecting hot-spots with spatial sparsity and temporal consistency, rather than system-wide changes as in the traditional statistical process control (SPC) literature. In this paper, we propose an efficient method to detect hot-spots through tensor decomposition; the method has three steps. First, we fit a Smooth Sparse Decomposition Tensor (SSD-Tensor) model to the observed data. This model serves as a dimension-reduction and denoising technique: it is an additive model that decomposes the original data into a smooth but non-stationary global mean, sparse local anomalies, and random noise. Next, we estimate the model parameters within a penalized framework that includes Least Absolute Shrinkage and Selection Operator (LASSO) and fused LASSO penalties; an efficient recursive optimization algorithm is developed based on the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). Finally, we apply a cumulative sum (CUSUM) control chart to the model residuals after removing the global mean, which helps to detect when and where hot-spots occur. To demonstrate the usefulness of the proposed SSD-Tensor method, we compare it with several other methods, including scan statistics and LASSO-based, PCA-based, and T²-based control charts, in extensive numerical simulation studies and on a real crime rate dataset.
  3. Abstract

    Run length distributions are generally used to characterize the performance of a control chart in signaling alarms when a process is out of control. Since it is usually difficult to compare distributions directly, statistics of the run length distribution are commonly adopted as performance criteria in practice. In particular, the average run length (ARL) and its extended versions play a dominant role. However, because the run length distribution is skewed, the ARL cannot accurately reflect its central tendency and may be misleading in some cases. To summarize the information in the run length distribution comprehensively, a novel criterion is proposed based on the continuous ranked probability score (CRPS). The CRPS-based criterion measures the difference between the run length distribution and the ideal constant run length of 0. It has the advantages of easy computation and good interpretability. Furthermore, its theoretical properties and geometric representation guarantee that the CRPS-based criterion is statistically consistent, informative about both the first and second moments of the run length distribution, and robust to extreme values. Results of numerical experiments show that the proposed criterion favors control charts with a higher probability of detecting outliers earlier, and that it is a superior metric for characterizing the run length distribution.

  4. Summary

    Decentralized waste water treatment facilities monitor many features that are related in complex ways. The ability to detect the onset of a fault and to accurately identify the variables that have shifted because of the fault is vital to maintaining proper system operation and high-quality produced water. Various multivariate methods have been proposed for fault detection and isolation, but these methods require the data to be independent and identically distributed when the process is in control, and most also require a distributional assumption. We propose a distribution-free retrospective change-point-detection method for auto-correlated and non-stationary multivariate processes. We first detrend the data by using observations from an in-control time period to account for expected changes due to external or user-controlled factors. Next, we apply the fused lasso, which penalizes differences between consecutive observations, to detect faults and to identify shifted variables. To account for auto-correlation, the regularization parameter is chosen by using an estimated effective sample size in the extended Bayesian information criterion. We demonstrate the performance of our method against a competitor in simulation. Finally, we apply our method to waste water treatment facility data with a known fault, and the variables identified by our method are consistent with the operators' diagnosis of the fault's cause.

  5. Abstract

    Particle filters avoid parametric estimates for Bayesian posterior densities, which alleviates Gaussian assumptions in nonlinear regimes. These methods, however, are more sensitive to sampling errors than Gaussian-based techniques such as ensemble Kalman filters. A recent study by the authors introduced an iterative strategy for particle filters that match posterior moments, in which iterations improve the filter's ability to draw samples from non-Gaussian posterior densities. The iterations follow from a factorization of particle weights, providing a natural framework for combining particle filters with alternative filters to mitigate the impact of sampling errors. The current study introduces a novel approach to forming an adaptive hybrid data assimilation methodology that exploits the theoretical strengths of both nonparametric and parametric filters. At each data assimilation cycle, the iterative particle filter performs a sequence of updates while the prior sample distribution is non-Gaussian; an ensemble Kalman filter then provides the final adjustment once Gaussian distributions for the marginal quantities are detected. The method employs the Shapiro–Wilk test, which has outstanding power for detecting departures from normality, to determine when to transition between filter algorithms. Experiments using low-dimensional models demonstrate that the approach is of significant value, especially for nonhomogeneous observation networks and unknown model process errors. Moreover, hybrid factors are extended to consider marginals of more than one collocated variable using a test for multivariate normality. Findings from this study motivate the use of the proposed method for geophysical problems characterized by diverse observation networks and various dynamic instabilities, such as numerical weather prediction models.
    Significance Statement

    Data assimilation statistically processes observation errors and model forecast errors to provide optimal initial conditions for the forecast, playing a critical role in numerical weather forecasting. The ensemble Kalman filter, which has been widely adopted and developed at many operational centers, assumes a Gaussian prior distribution and solves a linear system of equations, which leads to bias in strongly nonlinear regimes. Particle filters, on the other hand, avoid many of those assumptions but are sensitive to sampling errors and computationally expensive. We propose an adaptive hybrid strategy that combines the advantages and minimizes the disadvantages of the two methods. The hybrid particle filter–ensemble Kalman filter uses the Shapiro–Wilk test to detect Gaussianity of the ensemble members and to determine when to transition between these filter updates. Demonstrations in this study show that the proposed method is advantageous when observations are heterogeneous and when the model has an unknown bias. Furthermore, by extending the statistical hypothesis test to a test for multivariate normality, we consider marginals of more than one collocated variable. These results encourage further testing on real geophysical problems characterized by various dynamic instabilities, such as real numerical weather prediction models.
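The final step of the SSD-Tensor method (item 2 above) applies a CUSUM chart to residuals after the global mean is removed, so that both the time and the location of a hot-spot can be flagged. A minimal sketch, assuming the residuals are already standardized (mean 0, variance 1 when in control) at each location: it runs an ordinary one-sided upper CUSUM at every location in parallel and reports which locations cross the limit. This per-location form, the function name `spatial_cusum`, and the values of `k` and `h` are our own illustrative simplifications; the paper's aggregation across locations may differ, and the decomposition and FISTA steps are not shown.

```python
def spatial_cusum(residuals, k=0.5, h=5.0):
    """Run a one-sided upper CUSUM at every location in parallel.

    residuals: iterable of time steps, each a list of standardized residuals
    (one per location). For each location j:
        S_j(t) = max(0, S_j(t-1) + r_j(t) - k),  S_j(0) = 0
    Returns (first alarm time, locations whose S_j exceeds h at that time),
    or (None, []) if no statistic ever exceeds h.
    """
    s = None
    for t, row in enumerate(residuals, start=1):
        if s is None:
            s = [0.0] * len(row)
        # k is the allowance (reference value); drift below k is absorbed at 0
        s = [max(0.0, s_j + r_j - k) for s_j, r_j in zip(s, row)]
        hot = [j for j, s_j in enumerate(s) if s_j > h]
        if hot:
            return t, hot
    return None, []


if __name__ == "__main__":
    # Three quiet time steps, then a persistent upward shift at location 1.
    resid = [[0.0, 0.0, 0.0]] * 3 + [[0.0, 3.0, 0.0]] * 3
    print(spatial_cusum(resid))
```

The allowance `k` is conventionally set to about half the shift size one wants to detect quickly, and `h` is then tuned for an acceptable false-alarm rate.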
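For the CRPS-based run-length criterion (item 3 above), a small computation helps show what "distance from the ideal run length 0" means. Assuming the criterion is the standard CRPS, CRPS(F, y) = ∫ (F(x) − 1{x ≥ y})² dx, evaluated at y = 0 (the abstract does not state the formula, and the paper may normalize it differently), positive integer run lengths give CRPS(F, 0) = Σ_{k≥0} P(RL > k)². The sketch below estimates this from a sample of simulated run lengths and cross-checks it with the identity CRPS(F, y) = E|X − y| − ½E|X − X′|, which makes the dependence on both location and spread explicit. Function names are hypothetical.

```python
from collections import Counter


def crps_vs_zero(run_lengths):
    """Empirical CRPS of a run-length sample against the ideal value 0.

    For positive integer run lengths, F is a step function on the integers, so
    integral_0^inf (1 - F(x))^2 dx = sum_{k >= 0} P(RL > k)^2.
    """
    n = len(run_lengths)
    counts = Counter(run_lengths)
    survivors = n  # after the update below: number of runs with RL > k
    total = 0.0
    for k in range(max(run_lengths)):
        survivors -= counts.get(k, 0)
        total += (survivors / n) ** 2
    return total


def crps_vs_zero_energy(run_lengths):
    """Same quantity via CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| with y = 0."""
    n = len(run_lengths)
    mean_abs = sum(abs(x) for x in run_lengths) / n
    mean_pair = sum(abs(a - b) for a in run_lengths for b in run_lengths) / n ** 2
    return mean_abs - 0.5 * mean_pair


if __name__ == "__main__":
    sample = [2, 5, 5, 9, 1]  # illustrative run lengths, not data from the paper
    print(crps_vs_zero(sample), crps_vs_zero_energy(sample))
```

As a sanity check, for a geometric run length with per-step signal probability p the sum evaluates to 1/(p(2 − p)), whereas the ARL is 1/p; a constant run length c gives a CRPS of exactly c.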
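The waste water abstract (item 4 above) chooses the fused-lasso regularization parameter with an extended BIC that uses an estimated effective sample size. The abstract does not say how that size is estimated, so the sketch below uses one common textbook approximation, the AR(1) formula n_eff = n(1 − ρ)/(1 + ρ) with ρ the lag-1 autocorrelation, and plugs it into the log(n) penalty of a Gaussian EBIC. Both the estimator and where n_eff enters the criterion are hypothetical choices for illustration, not the authors' method; the fused-lasso fit itself is not shown.

```python
import math


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a univariate series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom


def effective_sample_size(x, clip=0.99):
    """AR(1) approximation n_eff = n * (1 - rho) / (1 + rho).

    rho is clipped to [-clip, clip] so the ratio stays finite. Positive
    autocorrelation shrinks n_eff below n; negative autocorrelation inflates it.
    """
    rho = max(-clip, min(clip, lag1_autocorrelation(x)))
    return len(x) * (1.0 - rho) / (1.0 + rho)


def ebic(rss, n, df, p, gamma=0.5, n_eff=None):
    """Gaussian extended BIC:
        n * log(rss / n) + df * log(m) + 2 * gamma * log(C(p, df)),
    where m = n_eff if given (hypothetical substitution), else m = n.
    """
    m = n if n_eff is None else n_eff
    return (n * math.log(rss / n) + df * math.log(m)
            + 2.0 * gamma * math.log(math.comb(p, df)))


if __name__ == "__main__":
    series = [0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.9, 0.8]  # toy, positively autocorrelated
    print(lag1_autocorrelation(series), effective_sample_size(series))
```

Replacing n with a smaller n_eff lowers the per-parameter penalty, which is one way to keep the criterion from being misled by the apparent precision of autocorrelated data.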
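The switching logic of the hybrid filter in item 5 above can be illustrated for a single, directly observed scalar state. This sketch uses NumPy and SciPy's `scipy.stats.shapiro` and makes several simplifying assumptions that are ours, not the authors': the observation likelihood is split into `n_steps` tempered Gaussian factors (each with variance `n_steps * obs_var`), each particle step is a plain bootstrap reweight-and-resample rather than the authors' moment-matching iterative particle filter, and the EnKF is the stochastic perturbed-observation variant. Once the ensemble passes the Shapiro-Wilk test, a single EnKF step absorbs all remaining factors, whose product is a Gaussian likelihood with variance `obs_var * n_steps / remaining`. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def looks_gaussian(ensemble, alpha=0.05):
    """True when the Shapiro-Wilk test does not reject normality at level alpha."""
    _, p_value = stats.shapiro(ensemble)
    return p_value > alpha


def enkf_update(ensemble, y_obs, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF update of a directly observed scalar."""
    x = np.asarray(ensemble, dtype=float)
    prior_var = np.var(x, ddof=1)
    gain = prior_var / (prior_var + obs_var)
    perturbed = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=x.size)
    return x + gain * (perturbed - x)


def particle_update(ensemble, y_obs, obs_var, rng):
    """Bootstrap particle step: weight by the Gaussian likelihood, then resample."""
    x = np.asarray(ensemble, dtype=float)
    log_w = -0.5 * (y_obs - x) ** 2 / obs_var
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    return x[rng.choice(x.size, size=x.size, p=w)]


def hybrid_update(ensemble, y_obs, obs_var, n_steps=4, alpha=0.05, seed=0):
    """Take tempered particle steps while the ensemble fails the Shapiro-Wilk
    test; once it passes, one EnKF step absorbs all remaining factors."""
    rng = np.random.default_rng(seed)
    x = np.asarray(ensemble, dtype=float)
    for step in range(n_steps):
        remaining = n_steps - step
        if looks_gaussian(x, alpha):
            return enkf_update(x, y_obs, obs_var * n_steps / remaining, rng), "enkf"
        x = particle_update(x, y_obs, obs_var * n_steps, rng)
    return x, "pf"


if __name__ == "__main__":
    bimodal = np.concatenate([np.linspace(-5.5, -4.5, 50), np.linspace(4.5, 5.5, 50)])
    posterior, last_filter = hybrid_update(bimodal, y_obs=5.0, obs_var=1.0)
    print(last_filter, posterior.mean().round(3))
```

A practical particle filter would also add jitter or regularization after resampling, because plain resampling only duplicates existing particles; that step is omitted here to keep the switching logic visible.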