Robust principal component analysis (RPCA) is a widely used method for recovering low‐rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes of anomalies, and jointly identifying such corruptions together with the low‐rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution of the data matrices, which in many applications is known and can be highly non‐Gaussian. We thus propose a new method, RPCA for exponential family distributions, which can perform the desired decomposition into low‐rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multipliers (ADMM) optimization algorithm for efficient decomposition, under either its natural or canonical parametrization. The effectiveness of the proposed method is then demonstrated in two applications: the first for steel sheet defect detection and the second for crime activity monitoring in the Atlanta metropolitan area.
This content will become publicly available on January 1, 2025
Due to their transient nature, clouds represent anomalies relative to the underlying landscape of interest. Hence, the challenge of cloud identification can be considered a specific case of the more general problem of anomaly detection. The confounding effects of transient anomalies are particularly troublesome for spatiotemporal analysis of land surface processes. While spatiotemporal characterization provides a statistical basis to quantify the most significant temporal patterns and their spatial distributions without the need for a priori assumptions about the observed changes, the presence of transient anomalies can obscure the statistical properties of the spatiotemporal processes of interest. The objective of this study is to implement and evaluate a robust approach to distinguish clouds and other transient anomalies from diurnal and annual thermal cycles observed with time-lapse thermography. The approach uses Robust Principal Component Analysis (RPCA) to statistically distinguish low-rank (L) and sparse (S) components of the land surface temperature image time series, followed by a spatiotemporal characterization of its low-rank component to quantify the dominant diurnal and annual thermal cycles in the study area. RPCA effectively segregates clouds, sensor anomalies, swath gaps, geospatial displacements and transient thermal anomalies into the sparse component time series. Spatiotemporal characterization of the low-rank component time series clearly resolves a variety of diurnal and annual thermal cycles for different land covers and water bodies while segregating transient anomalies potentially of interest.
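The low-rank plus sparse split described in the abstract can be illustrated with a minimal sketch. It assumes the standard convex formulation of RPCA (principal component pursuit) solved with a basic augmented Lagrangian iteration; the abstract does not say which solver or parameters the study used, so the function names (`rpca_pcp`, `soft_threshold`, `svd_threshold`), the default `lam` and `mu`, and the synthetic pixels-by-time matrix are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S by minimizing
    ||L||_* + lam * ||S||_1 subject to L + S = M (hypothetical helper)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # common default weight (assumed)
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()  # common default penalty (assumed)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                     # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                        # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S

# Synthetic stand-in for an image time series (rows: pixels, columns: hours).
rng = np.random.default_rng(0)
n_pixels, n_times = 60, 96
hours = np.arange(n_times)
temporal = np.vstack([np.sin(2 * np.pi * hours / 24),
                      np.cos(2 * np.pi * hours / 24)])   # diurnal cycle
spatial = rng.standard_normal((n_pixels, 2))             # per-pixel weights
L0 = spatial @ temporal                                  # rank-2 background
S0 = np.zeros_like(L0)
mask = rng.random(L0.shape) < 0.05                       # 5% transient anomalies
S0[mask] = rng.uniform(-10, 10, size=mask.sum())
M = L0 + S0

L_hat, S_hat = rpca_pcp(M)
print("relative error of low-rank estimate:",
      np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```

In this toy setting each image would be flattened into one column of `M`, so the periodic background falls into `L_hat` and the isolated anomalies into `S_hat`. Real imagery would need its own choices of `lam`, stopping tolerance, and handling of missing pixels.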
- Award ID(s):
- 2226649
- PAR ID:
- 10496999
- Publisher / Repository:
- MDPI
- Date Published:
- Journal Name:
- Remote Sensing
- Volume:
- 16
- Issue:
- 2
- ISSN:
- 2072-4292
- Page Range / eLocation ID:
- 255
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low-dimensional manifold. Is it possible to separate them using ideas similar to those of RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near-optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.
-
Statistical methods are required to evaluate and quantify the uncertainty in environmental processes, such as land and sea surface temperature, in a changing climate. Typically, annual harmonics are used to characterize the variation in the seasonal temperature cycle. However, an often overlooked feature of the climate seasonal cycle is the semi‐annual harmonic, which can account for a significant portion of the variance of the seasonal cycle and varies in amplitude and phase across space. Together, the spatial variation in the annual and semi‐annual harmonics can play an important role in driving processes that are tied to seasonality (e.g., ecological and agricultural processes). We propose a multivariate spatiotemporal model to quantify the spatial and temporal change in minimum and maximum temperature seasonal cycles as a function of the annual and semi‐annual harmonics. Our approach captures spatial dependence, temporal dynamics, and multivariate dependence of these harmonics through spatially and temporally varying coefficients. We apply the model to minimum and maximum temperature over North America for the years 1979–2018. Formal model inference within the Bayesian paradigm enables the identification of regions experiencing significant changes in minimum and maximum temperature seasonal cycles due to the relative effects of changes in the two harmonics.
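To make the role of the annual and semi-annual harmonics concrete, here is a minimal sketch that fits both to a single temperature series by ordinary least squares. This is a deliberately simplified stand-in: the abstract describes a Bayesian multivariate spatiotemporal model with varying coefficients, which this does not attempt. The helper names (`harmonic_design`, `fit_harmonics`) and the synthetic monthly series are assumptions made for illustration.

```python
import numpy as np

def harmonic_design(t_years, n_harmonics=2):
    """Design matrix with an intercept plus cos/sin pairs at 1, 2, ... cycles per year."""
    cols = [np.ones_like(t_years)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t_years))
        cols.append(np.sin(2 * np.pi * k * t_years))
    return np.column_stack(cols)

def fit_harmonics(t_years, y, n_harmonics=2):
    """Least-squares fit of y = mean + sum_k A_k * cos(2*pi*k*t - phi_k)."""
    X = harmonic_design(t_years, n_harmonics)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b = beta[1::2], beta[2::2]      # cosine and sine coefficients
    amplitude = np.hypot(a, b)         # A_k
    phase = np.arctan2(b, a)           # phi_k, in radians
    return beta[0], amplitude, phase

# Synthetic monthly series over 40 years (values invented for the example).
rng = np.random.default_rng(1)
t = np.arange(480) / 12.0
y = (10.0
     + 8.0 * np.cos(2 * np.pi * t - 0.5)    # annual harmonic
     + 2.0 * np.cos(4 * np.pi * t - 1.2)    # semi-annual harmonic
     + 0.2 * rng.standard_normal(t.size))   # measurement noise

mean, amplitude, phase = fit_harmonics(t, y)
print("mean:", mean)
print("amplitudes (annual, semi-annual):", amplitude)
print("phases (annual, semi-annual):", phase)
```

Repeating the fit per grid cell would yield maps of amplitude and phase for each harmonic, which is the kind of spatial variation the abstract refers to, although without the spatial, temporal, and multivariate dependence that the proposed model captures.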
-
Spatiotemporal traffic data imputation is of great significance in intelligent transportation systems and data-driven decision-making processes. To perform efficient learning and accurate reconstruction from partially observed traffic data, we assert the importance of characterizing both global and local trends in time series. In the literature, a substantial body of work has demonstrated the effectiveness of exploiting the low-rank property of traffic data with matrix/tensor completion models. In this study, we first introduce a Laplacian kernel into temporal regularization for characterizing local trends in traffic time series, which can be formulated as a circular convolution. Then, we develop a low-rank Laplacian convolutional representation (LCR) model by combining the circulant matrix nuclear norm with the Laplacian kernelized temporal regularization, which is proved to fit a unified framework with a fast Fourier transform (FFT) solution in log-linear time complexity. Through extensive experiments on several traffic datasets, we demonstrate the superiority of LCR over several baseline models for imputing traffic time series with various behaviors (e.g., data noise and strong/weak periodicity) and reconstructing sparse speed fields of vehicular traffic flow. The proposed LCR model is also more efficient than existing imputation models for large-scale traffic data imputation.
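Two facts behind the FFT solution mentioned in the abstract can be checked numerically: circular convolution with a kernel is multiplication by a circulant matrix, and the nuclear norm of a circulant matrix equals the L1 norm of the discrete Fourier transform of its first column. The sketch below is only a check of those identities, not the LCR algorithm. The kernel layout in `laplacian_kernel` (weight `2 * tau` at lag zero and `-1` at the `tau` nearest lags on each side) is an assumption about how such a kernel might be defined, and all names are hypothetical.

```python
import numpy as np

def circulant(c):
    """Circulant matrix whose first column is c: C[i, j] = c[(i - j) % n]."""
    return np.column_stack([np.roll(c, j) for j in range(len(c))])

def laplacian_kernel(n, tau):
    """Assumed circular Laplacian kernel; requires 2 * tau < n."""
    ell = np.zeros(n)
    ell[0] = 2 * tau
    ell[1:tau + 1] = -1.0
    ell[-tau:] = -1.0
    return ell

def circular_convolve_fft(kernel, x):
    """Circular convolution in O(n log n) via the convolution theorem."""
    return np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(x)).real

rng = np.random.default_rng(2)
n, tau = 48, 3
x = np.sin(2 * np.pi * np.arange(n) / 24) + 0.1 * rng.standard_normal(n)
ell = laplacian_kernel(n, tau)

# Convolution with the kernel, computed two ways.
conv_fft = circular_convolve_fft(ell, x)
conv_matrix = circulant(ell) @ x

# Nuclear norm of the circulant matrix of x versus L1 norm of its spectrum.
nuclear_norm = np.linalg.norm(circulant(x), "nuc")
spectrum_l1 = np.abs(np.fft.fft(x)).sum()

print("max difference between FFT and matrix convolution:",
      np.max(np.abs(conv_fft - conv_matrix)))
print("nuclear norm:", nuclear_norm, "spectrum L1 norm:", spectrum_l1)
```

Because the kernel entries sum to zero, a constant series is mapped to zero, so the regularizer penalizes only local departures from the neighboring values.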
-
In this study, we explore the use of low-rank and sparse constraints for the noninvasive estimation of epicardial and endocardial extracellular potentials from body-surface electrocardiographic data to locate the focus of premature ventricular contractions (PVCs). The proposed strategy formulates the dynamic spatiotemporal distribution of cardiac potentials by means of low-rank and sparse decomposition, where the low-rank term represents the smooth background and the anomalous potentials are extracted in the sparse matrix. Compared to most previous potential-based approaches, the proposed low-rank and sparse constraints are batch spatiotemporal constraints that capture the underlying relationship of dynamic potentials. The resulting optimization problem is solved using the alternating direction method of multipliers. Three sets of simulation experiments with eight different ventricular pacing sites demonstrate that the proposed model outperforms the existing Tikhonov regularization (zero-order, second-order) and L1-norm-based methods at accurately reconstructing the potentials and locating the ventricular pacing sites. Experiments on a total of 39 cases of real PVC data also validate the ability of the proposed method to correctly locate ectopic pacing sites.