Title: A Primal-Dual Analysis of Global Optimality in Nonconvex Low-Rank Matrix Recovery
We propose a primal-dual based framework for analyzing the global optimality of nonconvex low-rank matrix recovery. Our analysis is based on restricted strong convexity and smoothness conditions, which can be verified for a broad family of loss functions. In addition, our analytic framework can directly handle the widely used incoherence constraints through the lens of duality. We illustrate the applicability of the proposed framework to matrix completion and one-bit matrix completion, and prove that these problems have no spurious local minima. Our results not only improve the sample complexity required for characterizing the global optimality of matrix completion, but also resolve an open problem in Ge et al. (2017) regarding one-bit matrix completion. Numerical experiments show that a primal-dual based algorithm can successfully recover the global optimum for various low-rank problems.
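For orientation, the nonconvex matrix completion problem named in the abstract is commonly posed over a low-rank factorization X = U Vᵀ, with a squared loss on the observed entries only. The sketch below shows that generic objective and its gradients. It is not the authors' primal-dual algorithm and omits the incoherence constraints. The function name `mc_loss_and_grad` and the 0/1 `mask` argument are hypothetical names chosen for illustration.

```python
import numpy as np


def mc_loss_and_grad(U, V, M, mask):
    """Factorized matrix completion loss and gradients (illustrative only).

    U: (d1, r) and V: (d2, r) are the factors, so the estimate is U @ V.T.
    M: (d1, d2) matrix of observations.
    mask: (d1, d2) array of 0/1 values marking observed entries (assumed name).
    """
    # Residual restricted to the observed entries.
    R = mask * (U @ V.T - M)
    # 0.5 * squared Frobenius norm of the masked residual.
    loss = 0.5 * np.sum(R ** 2)
    # Gradients with respect to each factor.
    grad_U = R @ V
    grad_V = R.T @ U
    return loss, grad_U, grad_V
```

A plain gradient method would repeatedly subtract a small multiple of `grad_U` and `grad_V` from the factors. A statement that such a problem has no spurious local minima means that every local minimum of an objective of this kind is also a global minimum.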
Award ID(s):
1652539 1618948
NSF-PAR ID:
10063547
Journal Name:
International Conference on Machine Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. Aardal, Karen ; Sanità, Laura (Ed.)
    This paper considers the interplay between semidefinite programming, matrix rank, and graph coloring. Karger, Motwani, and Sudan [10] give a vector program for which a coloring of the graph can be encoded as a semidefinite matrix of low rank. By complementary slackness conditions of semidefinite programming, if an optimal dual solution has sufficiently high rank, any optimal primal solution must have low rank. We attempt to characterize graphs for which we can show that the corresponding dual optimal solution must have sufficiently high rank. In the case of the original Karger, Motwani, and Sudan vector program, we show that any graph which is a k-tree has sufficiently high dual rank, and we can extract the coloring from the corresponding low-rank primal solution. We can also show that if the graph is not uniquely colorable, then no sufficiently high rank dual optimal solution can exist. This allows us to completely characterize the planar graphs for which dual optimal solutions have sufficiently high dual rank, since it is known that the uniquely colorable planar graphs are precisely the planar 3-trees. We then modify the semidefinite program to have an objective function with costs, and explore when we can create a cost function whose optimal dual solution has sufficiently high rank. We show that it is always possible to construct such a cost function given the graph coloring. The construction of the cost function gives rise to a heuristic for graph coloring which we show works well in the case of planar graphs; we enumerated all maximal planar graphs with a K4 of up to 14 vertices, and the heuristic successfully colored 99.75% of them.
    Our research was motivated by the Colin de Verdière graph invariant [5] (and a corresponding conjecture of Colin de Verdière), in which matrices that have some similarities to the dual feasible matrices must have high rank in the case that graphs are of a certain type; for instance, planar graphs have rank that would imply the 4-colorability of the primal solution. We explore the connection between the conjecture and the rank of the dual solutions.
  2. One of the classical approaches for estimating the frequencies and damping factors in a spectrally sparse signal is the MUltiple SIgnal Classification (MUSIC) algorithm, which exploits the low-rank structure of an autocorrelation matrix. Low-rank matrices have also received considerable attention recently in the context of optimization algorithms with partial observations, and nuclear norm minimization (NNM) has been widely used as a popular heuristic for rank minimization in low-rank matrix recovery problems. On the other hand, it has been shown that NNM can be viewed as a special case of atomic norm minimization (ANM), which has achieved great success in solving line spectrum estimation problems. However, as far as we know, the general ANM (not NNM) considered in many existing works can only handle frequency estimation in undamped sinusoids. In this work, we aim to fill this gap and deal with damped spectrally sparse signal recovery problems. In particular, inspired by the dual analysis used in ANM, we offer a novel optimization-based perspective on the classical MUSIC algorithm and propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to a certain NNM problem, and we show that this algorithm is in fact equivalent to MUSIC itself. Building on this connection, we also extend the classical MUSIC algorithm to the missing data case. We provide exact recovery guarantees for our proposed algorithms and quantify how the sample complexity depends on the true spectral parameters. In particular, we provide a parameter-specific recovery bound for low-rank matrix recovery of jointly sparse signals rather than using certain incoherence properties as in the existing literature. Simulation results also indicate that the proposed algorithms significantly outperform some relevant existing methods (e.g., ANM) in frequency estimation of damped exponentials.
  3. The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to a ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for ill-conditioned matrix completion under the 1-bit cross-entropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss.
  4. We present a new perspective on graph-based methods for collaborative ranking for recommender systems. Unlike user-based or item-based methods that compute a weighted average of ratings given by the nearest neighbors, or low-rank approximation methods using convex optimization and the nuclear norm, we formulate matrix completion as a series of semi-supervised learning problems, and propagate the known ratings to the missing ones on the user-user or item-item graph globally. The semi-supervised learning problems are expressed as Laplace-Beltrami equations on a manifold, that is, as harmonic extension, and can be discretized by a point integral method. Our approach, named LDM (low dimensional manifold), does not impose a low-rank Euclidean subspace on the data points, but instead minimizes the dimension of the underlying manifold. It turns out to be particularly effective in generating rankings of items, showing decent computational efficiency and robust ranking quality compared to state-of-the-art methods.
  5. We address the problem of high-rank matrix completion with side information. In contrast to existing work dealing with side information, which assumes that the data matrix is low-rank, we consider the more general scenario where the columns of the data matrix are drawn from a union of low-dimensional subspaces, which can lead to a high-rank matrix. Our goal is to complete the matrix while taking advantage of the side information. To do so, we use the self-expressive property of the data, searching for a sparse representation of each column of the matrix as a combination of a few other columns. More specifically, we propose a factorization of the data matrix as the product of side information matrices with an unknown interaction matrix, under which each column of the data matrix can be reconstructed using a sparse combination of other columns. As our proposed optimization, searching for missing entries and sparse coefficients, is non-convex and NP-hard, we propose a lifting framework, where we couple sparse coefficients and missing values and define an equivalent optimization that is amenable to convex relaxation. We also propose a fast implementation of our convex framework using a Linearized Alternating Direction Method. By extensive experiments on both synthetic and real data, and, in particular, by studying the problem of multi-label learning, we demonstrate that our method outperforms existing techniques in both low-rank and high-rank data regimes.