Title: Advances in neural information processing systems
This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low-dimensional manifold. Is it possible to separate them using ideas similar to RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near-optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.
Journal Name:
Advances in neural information processing systems
Sponsoring Org:
National Science Foundation
More Like this
  1. Sparse principal component analysis and sparse canonical correlation analysis are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Because nonsmoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations of them or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experimental results are reported to demonstrate the advantages of our algorithm. 
  2. This paper proposes a data-driven method to pinpoint the source of a newly emerging dynamical phenomenon in the power grid, referred to as “forced oscillations”, in the difficult but highly risky case where there is a resonance phenomenon. By exploiting the low-rank and sparse properties of synchrophasor measurements, the localization problem is formulated as a matrix decomposition problem, which can be efficiently solved by the exact augmented Lagrange multiplier algorithm. An online detection scheme is developed based on the problem formulation. The data-driven nature of the proposed method allows for a very efficient implementation. The efficacy of the proposed method is illustrated in a 68-bus power system. The proposed method may be more broadly useful in other situations for identifying the source of forced oscillations in resonant systems. Index Terms: Forced oscillations, resonant systems, phasor measurement unit (PMU), robust principal component analysis (RPCA), Big Data.
  3. Due to their transient nature, clouds represent anomalies relative to the underlying landscape of interest. Hence, the challenge of cloud identification can be considered a specific case in the more general problem of anomaly detection. The confounding effects of transient anomalies are particularly troublesome for spatiotemporal analysis of land surface processes. While spatiotemporal characterization provides a statistical basis to quantify the most significant temporal patterns and their spatial distributions without the need for a priori assumptions about the observed changes, the presence of transient anomalies can obscure the statistical properties of the spatiotemporal processes of interest. The objective of this study is to implement and evaluate a robust approach to distinguish clouds and other transient anomalies from diurnal and annual thermal cycles observed with time-lapse thermography. The approach uses Robust Principal Component Analysis (RPCA) to statistically distinguish low-rank (L) and sparse (S) components of the land surface temperature image time series, followed by a spatiotemporal characterization of its low rank component to quantify the dominant diurnal and annual thermal cycles in the study area. RPCA effectively segregates clouds, sensor anomalies, swath gaps, geospatial displacements and transient thermal anomalies into the sparse component time series. Spatiotemporal characterization of the low-rank component time series clearly resolves a variety of diurnal and annual thermal cycles for different land covers and water bodies while segregating transient anomalies potentially of interest.
  4. Measured intensity in high-energy monochromatic X-ray diffraction (HEXD) experiments provides information regarding the microstructure of the crystalline material under study. The location of intensity on an areal detector is determined by the lattice spacing and orientation of crystals so that changes in the heterogeneity of these quantities are reflected in the spreading of diffraction peaks over time. High temporal resolution of such dynamics can now be experimentally observed using technologies such as the mixed-mode pixel array detector (MM-PAD) which facilitates in situ dynamic HEXD experiments to study plasticity and its underlying mechanisms. In this paper, we define and demonstrate a feature computed directly from such diffraction time series data quantifying signal spread in a manner that is correlated with plastic deformation of the sample. A distinguishing characteristic of the analysis is the capability to describe the evolution from the distinct diffraction peaks of an undeformed alloy sample through to the non-uniform Debye–Scherrer rings developed upon significant plastic deformation. We build on our previous work modeling data using an overcomplete dictionary by treating temporal measurements jointly to improve signal spread recovery. We demonstrate our approach in simulations and on experimental HEXD measurements captured using the MM-PAD. Our method for characterizing the temporal evolution of signal spread is shown to provide an informative means of data analysis that adds to the capabilities of existing methods. Our work draws on ideas from convolutional sparse coding and requires solving a coupled convex optimization problem based on the alternating direction method of multipliers.
  5. Spectral clustering is one of the fundamental unsupervised learning methods and is widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity on spectral clustering, which improves the interpretability of the model. One widely adopted model for SSC in the literature is an optimization problem over the Stiefel manifold with a nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or smooth its nonsmooth objective using certain smoothing techniques. Therefore, they do not solve the original formulation of SSC. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation without twisting the model. We also extend the algorithm to solve multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via clustering of several data sets, including University of California Irvine and single-cell RNA sequencing data sets.
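Several of the abstracts above build on the classical RPCA decomposition of a data matrix into a low-rank part L and a sparse part S. As background only, the sketch below shows one common way to compute that decomposition: principal component pursuit solved with a fixed-penalty ADMM iteration. It is not the manifold method of the main paper, nor the exact augmented Lagrange multiplier variant mentioned in item 2. The function names (`soft_threshold`, `svd_threshold`, `rpca`) and the default values of `lam` and `mu` are illustrative assumptions based on commonly used heuristics, and the input matrix is assumed to be nonzero.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value shrinkage: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S by minimizing
    ||L||_* + lam * ||S||_1 subject to L + S = M (classical RPCA).

    lam and mu default to widely used heuristics (an assumption here,
    not values taken from any of the papers listed above).
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()  # assumes M is not all zeros

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R                                # dual ascent step
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

Calling `L, S = rpca(M)` on a matrix whose columns are, for example, vectorized image frames returns the two components. In the setting of item 3, clouds and other transient anomalies would be expected to land in `S` while the slowly varying background stays in `L`.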