Title: An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis
Sparse principal component analysis and sparse canonical correlation analysis are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as optimization problems with a nonsmooth objective and nonconvex constraints. Because nonsmoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve relaxations of these problems or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical results are reported to demonstrate the advantages of our algorithm.
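For orientation, one common way to write the two problems as nonsmooth objectives over nonconvex constraint sets is sketched below. This is an illustrative formulation, not necessarily the exact one used in the paper: the symbols X and Y (data matrices), A and B (loading matrices), r (number of components), and the penalty weights \mu, \tau_1, \tau_2 are assumptions introduced here. The \ell_1 terms make the objective nonsmooth, and the orthogonality-type constraints make the feasible set nonconvex.

```latex
% Sparse PCA (assumed notation): X \in \mathbb{R}^{m \times n} is the data matrix
\min_{A \in \mathbb{R}^{n \times r}} \;
  -\operatorname{tr}\left(A^\top X^\top X A\right) + \mu \|A\|_1
\quad \text{s.t.} \quad A^\top A = I_r

% Sparse CCA (assumed notation): X \in \mathbb{R}^{m \times n}, Y \in \mathbb{R}^{m \times q}
\min_{A \in \mathbb{R}^{n \times r},\; B \in \mathbb{R}^{q \times r}} \;
  -\operatorname{tr}\left(A^\top X^\top Y B\right) + \tau_1 \|A\|_1 + \tau_2 \|B\|_1
\quad \text{s.t.} \quad A^\top X^\top X A = I_r, \;\; B^\top Y^\top Y B = I_r
```

In both cases \|\cdot\|_1 denotes the entrywise \ell_1 norm, which promotes sparse loadings.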
Award ID(s):
1915842 1934568 1953189 1953210
NSF-PAR ID:
10181909
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
INFORMS Journal on Optimization
ISSN:
2575-1484
Page Range / eLocation ID:
ijoo.2019.0032
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Sparse principal component analysis is an important technique for simultaneous dimensionality reduction and variable selection with high-dimensional data. In this work we combine the unique geometric structure of the sparse principal component analysis problem with recent advances in convex optimization to develop novel gradient-based sparse principal component analysis algorithms. These algorithms enjoy the same global convergence guarantee as the original alternating direction method of multipliers, and can be more efficiently implemented with the rich toolbox developed for gradient methods from the deep learning literature. Most notably, these gradient-based algorithms can be combined with stochastic gradient descent methods to produce efficient online sparse principal component analysis algorithms with provable numerical and statistical performance guarantees. The practical performance and usefulness of the new algorithms are demonstrated in various simulation studies. As an application, we show how the scalability and statistical accuracy of our method enable us to find interesting functional gene groups in high-dimensional RNA sequencing data.

     
  2. We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional, stationary matrix-variate Gaussian time series. All past work on high-dimensional matrix graphical models assumes that independent and identically distributed (i.i.d.) observations of the matrix-variate are available. Here we allow dependent observations. We consider a sparse-group lasso-based frequency-domain formulation of the problem with a Kronecker-decomposable power spectral density (PSD), and solve it via an alternating direction method of multipliers (ADMM) approach. The problem is biconvex and is solved via flip-flop optimization. We provide sufficient conditions for local convergence in the Frobenius norm of the inverse PSD estimators to the true value. This result also yields a rate of convergence. We illustrate our approach using numerical examples utilizing both synthetic and real data.
  3. We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional, stationary matrix-variate Gaussian time series. All past work on matrix graphical models assumes that i.i.d. observations of the matrix-variate are available. Here we allow dependent observations. We consider a sparse-group lasso-based frequency-domain formulation of the problem with a Kronecker-decomposable power spectral density (PSD), and solve it via an alternating direction method of multipliers (ADMM) approach. The problem is bi-convex and is solved via flip-flop optimization. We provide sufficient conditions for local convergence in the Frobenius norm of the inverse PSD estimators to the true value. This result also yields a rate of convergence. We illustrate our approach using numerical examples.
  4. Sparse learning models have shown promising performance in high-dimensional machine learning applications. The main challenge with sparse learning models is optimizing them efficiently. Most existing methods relax the problem into a convex one, which incurs large estimation bias. Sparse learning models with nonconvex constraints have therefore attracted much attention for their better performance, but they are difficult to optimize because of the nonconvexity. In this paper, we propose a linearly convergent stochastic second-order method to optimize this nonconvex problem for large-scale datasets. The proposed method incorporates second-order information to improve the convergence speed. Theoretical analysis shows that the proposed method enjoys a linear convergence rate and is guaranteed to converge to the underlying true model parameter. Experimental results verify the efficiency and correctness of the proposed method.
  5. Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets – particularly with multi-layered structures, e.g., convolutional neural networks – or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the “synthesis” signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computational tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images. 