Title: eRPCA: Robust Principal Component Analysis for Exponential Family Distributions
Abstract Robust principal component analysis (RPCA) is a widely used method for recovering low‐rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes of anomalies, and the joint identification of such corruptions with the low‐rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution of the data matrices, which in many applications is known and can be highly non‐Gaussian. We thus propose a new method called RPCA for exponential family distributions (eRPCA), which can perform the desired decomposition into low‐rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multipliers optimization algorithm for efficient decomposition, under either its natural or canonical parametrization. The effectiveness of eRPCA is then demonstrated in two applications: the first for steel sheet defect detection and the second for crime activity monitoring in the Atlanta metropolitan area.
Award ID(s):
2220496 2220495
PAR ID:
10518507
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Statistical Analysis and Data Mining: The ASA Data Science Journal
Volume:
17
Issue:
2
ISSN:
1932-1864
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In many applications, we seek to recover signals from linear measurements far fewer than the ambient dimension, given that the signals have exploitable structure, such as sparse vectors or low rank matrices. In this paper, we work in a general setting where signals are approximately sparse in a so-called atomic set. We provide general recovery results stating that a convex program can stably and robustly recover signals if the null space of the sensing map satisfies certain properties. Moreover, we argue that such a null space property can be satisfied with high probability if each measurement is sub-Gaussian, even when the number of measurements is very small. Some new results for recovering signals sparse in a frame, and for recovering low rank matrices, are also derived as a result.
  2. Abstract This paper introduces a general framework of Semi-parametric TEnsor Factor Analysis (STEFA) that focuses on the methodology and theory of low-rank tensor decomposition with auxiliary covariates. Semi-parametric TEnsor Factor Analysis models extend tensor factor models by incorporating auxiliary covariates in the loading matrices. We propose an algorithm of iteratively projected singular value decomposition (IP-SVD) for the semi-parametric estimation. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies singular value decomposition on matricized tensors over each mode. We establish the convergence rates of the loading matrices and the core tensor factor. The theoretical results only require a sub-exponential noise distribution, which is weaker than the assumption of sub-Gaussian tail of noise in the literature. Compared with the Tucker decomposition, IP-SVD yields more accurate estimators with a faster convergence rate. Besides estimation, we propose several prediction methods with new covariates based on the STEFA model. On both synthetic and real tensor data, we demonstrate the efficacy of the STEFA model and the IP-SVD algorithm on both the estimation and prediction tasks. 
  3. This paper proposes a data-driven method to pinpoint the source of a newly emerging dynamical phenomenon in the power grid, referred to as “forced oscillations,” in the difficult but highly risky case where there is a resonance phenomenon. By exploiting the low-rank and sparse properties of synchrophasor measurements, the localization problem is formulated as a matrix decomposition problem, which can be efficiently solved by the exact augmented Lagrange multiplier algorithm. An online detection scheme is developed based on the problem formulation. The data-driven nature of the proposed method allows for a very efficient implementation. The efficacy of the proposed method is illustrated in a 68-bus power system. The proposed method may be more broadly useful in other situations for identifying the source of forced oscillations in resonant systems. Index Terms—Forced oscillations, resonant systems, phasor measurement unit (PMU), robust principal component analysis (RPCA), Big Data.
  4. In this study, we explore the use of low rank and sparse constraints for the noninvasive estimation of epicardial and endocardial extracellular potentials from body-surface electrocardiographic data to locate the focus of premature ventricular contractions (PVCs). The proposed strategy formulates the dynamic spatiotemporal distribution of cardiac potentials by means of low rank and sparse decomposition, where the low rank term represents the smooth background and the anomalous potentials are extracted in the sparse matrix. Compared to most previous potential-based approaches, the proposed low rank and sparse constraints are batch spatiotemporal constraints that capture the underlying relationship of dynamic potentials. The resulting optimization problem is solved using the alternating direction method of multipliers. Three sets of simulation experiments with eight different ventricular pacing sites demonstrate that the proposed model outperforms the existing Tikhonov regularization (zero-order, second-order) and L1-norm based methods at accurately reconstructing the potentials and locating the ventricular pacing sites. Experiments on a total of 39 cases of real PVC data also validate the ability of the proposed method to correctly locate ectopic pacing sites.
  5. Due to their transient nature, clouds represent anomalies relative to the underlying landscape of interest. Hence, the challenge of cloud identification can be considered a specific case in the more general problem of anomaly detection. The confounding effects of transient anomalies are particularly troublesome for spatiotemporal analysis of land surface processes. While spatiotemporal characterization provides a statistical basis to quantify the most significant temporal patterns and their spatial distributions without the need for a priori assumptions about the observed changes, the presence of transient anomalies can obscure the statistical properties of the spatiotemporal processes of interest. The objective of this study is to implement and evaluate a robust approach to distinguish clouds and other transient anomalies from diurnal and annual thermal cycles observed with time-lapse thermography. The approach uses Robust Principal Component Analysis (RPCA) to statistically distinguish low-rank (L) and sparse (S) components of the land surface temperature image time series, followed by a spatiotemporal characterization of its low rank component to quantify the dominant diurnal and annual thermal cycles in the study area. RPCA effectively segregates clouds, sensor anomalies, swath gaps, geospatial displacements and transient thermal anomalies into the sparse component time series. Spatiotemporal characterization of the low-rank component time series clearly resolves a variety of diurnal and annual thermal cycles for different land covers and water bodies while segregating transient anomalies potentially of interest. 
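Several of the abstracts above rely on splitting a data matrix into a low-rank part L and a sparse part S. As a rough illustration only, the sketch below implements the classical convex formulation (principal component pursuit, minimizing the nuclear norm of L plus a weighted L1 norm of S subject to M = L + S) with a basic ADMM loop. It is not the eRPCA algorithm from the record above and does not model any exponential family likelihood. The function names, the default weight `lam = 1/sqrt(max(m, n))`, and the penalty `mu` are assumptions chosen for illustration, and the input is assumed to be a nonzero real matrix.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value shrinkage: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S with a basic ADMM loop.

    Illustrative sketch of classical principal component pursuit;
    the defaults for lam and mu are common heuristics, not tuned values.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # assumes M is not all zeros
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R                                # dual ascent step
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

Calling `rpca_pcp(M)` on a matrix built as a low-rank product plus a few large outliers returns estimates of the two parts. In the applications listed above, S would hold the anomalies (defects, clouds, ectopic potentials) and L the background.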