skip to main content


This content will become publicly available on April 1, 2025

Title: e RPCA : Robust Principal Component Analysis for Exponential Family Distributions
Abstract

Robust principal component analysis (RPCA) is a widely used method for recovering low‐rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anomalies, and the joint identification of such corruptions with low‐rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution for the data matrices, which in many applications are known and can be highly non‐Gaussian. We thus propose a new method called RPCA for exponential family distributions (), which can perform the desired decomposition into low‐rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multiplier optimization algorithm for efficient decomposition, under either its natural or canonical parametrization. The effectiveness of is then demonstrated in two applications: the first for steel sheet defect detection and the second for crime activity monitoring in the Atlanta metropolitan area.

 
more » « less
Award ID(s):
2220496
NSF-PAR ID:
10518507
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Statistical Analysis and Data Mining: The ASA Data Science Journal
Volume:
17
Issue:
2
ISSN:
1932-1864
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    In many applications, non-Gaussian data such as binary or count are observed over a continuous domain and there exists a smooth underlying structure for describing such data. We develop a new functional data method to deal with this kind of data when the data are regularly spaced on the continuous domain. Our method, referred to as Exponential Family Functional Principal Component Analysis (EFPCA), assumes the data are generated from an exponential family distribution, and the matrix of the canonical parameters has a low-rank structure. The proposed method flexibly accommodates not only the standard one-way functional data, but also two-way (or bivariate) functional data. In addition, we introduce a new cross validation method for estimating the latent rank of a generalized data matrix. We demonstrate the efficacy of the proposed methods using a comprehensive simulation study. The proposed method is also applied to a real application of the UK mortality study, where data are binomially distributed and two-way functional across age groups and calendar years. The results offer novel insights into the underlying mortality pattern.

     
    more » « less
  2. This paper proposes a data-driven method to pinpoint the source of a new emerging dynamical phenomenon in the power grid, referred to “forced oscillations” in the difficult but highly risky case where there is a resonance phenomenon. By exploiting the low-rank and sparse properties of synchrophasor measurements, the localization problem is formulated as a matrix decomposition problem, which can be efficiently solved by the exact augmented Lagrange multiplier algorithm. An online detection scheme is developed based on the problem formulation. The data-driven nature of the proposed method allows for a very efficient implementation. The efficacy of the proposed method is illustrated in a 68-bus power system. The proposed method may possibly be more broadly useful in other situations for identifying the source of forced oscillations in resonant systems. Index Terms—Forced oscillations, resonant systems, phasor measurement unit (PMU), robust principal component analysis (RPCA), Big Data. 
    more » « less
  3. Abstract

    In many applications, we seek to recover signals from linear measurements far fewer than the ambient dimension, given the signals have exploitable structures such as sparse vectors or low rank matrices. In this paper, we work in a general setting where signals are approximately sparse in a so-called atomic set. We provide general recovery results stating that a convex programming can stably and robustly recover signals if the null space of the sensing map satisfies certain properties. Moreover, we argue that such null space property can be satisfied with high probability if each measurement is sub-Gaussian even when the number of measurements are very few. Some new results for recovering signals sparse in a frame, and recovering low rank matrices are also derived as a result.

     
    more » « less
  4. Summary

    This paper addresses matrix approximation problems for matrices that are large, sparse, and/or representations of large graphs. To tackle these problems, we consider algorithms that are based primarily on coarsening techniques, possibly combined with random sampling. A multilevel coarsening technique is proposed, which utilizes a hypergraph associated with the data matrix and a graph coarsening strategy based on column matching. We consider a number of standard applications of this technique as well as a few new ones. Among standard applications, we first consider the problem of computingpartial singular value decomposition, for which a combination of sampling and coarsening yields significantly improved singular value decomposition results relative to sampling alone. We also consider thecolumn subset selectionproblem, a popular low‐rank approximation method used in data‐related applications, and show how multilevel coarsening can be adapted for this problem. Similarly, we consider the problem ofgraph sparsificationand show how coarsening techniques can be employed to solve it. We also establish theoretical results that characterize the approximation error obtained and the quality of the dimension reduction achieved by a coarsening step, when a proper column matching strategy is employed. Numerical experiments illustrate the performances of the methods in a few applications.

     
    more » « less
  5. In this study, we explore the use of low rank and sparse constraints for the noninvasive estimation of epicardial and endocardial extracellular potentials from body-surface electrocardiographic data to locate the focus of premature ventricular contractions (PVCs). The proposed strategy formulates the dynamic spatiotemporal distribution of cardiac potentials by means of low rank and sparse decomposition, where the low rank term represents the smooth background and the anomalous potentials are extracted in the sparse matrix. Compared to the most previous potential-based approaches, the proposed low rank and sparse constraints are batch spatiotemporal constraints that capture the underlying relationship of dynamic potentials. The resulting optimization problem is solved using alternating direction method of multipliers . Three sets of simulation experiments with eight different ventricular pacing sites demonstrate that the proposed model outperforms the existing Tikhonov regularization (zero-order, second-order) and L1-norm based method at accurately reconstructing the potentials and locating the ventricular pacing sites. Experiments on a total of 39 cases of real PVC data also validate the ability of the proposed method to correctly locate ectopic pacing sites. 
    more » « less