This work considers the minimization of a general convex function f(X) over the cone of positive semi-definite matrices whose optimal solution X* is of low rank. Standard first-order convex solvers require performing an eigenvalue decomposition in each iteration, severely limiting their scalability. A natural nonconvex reformulation of the problem factors the variable X into the product of a rectangular matrix with fewer columns and its transpose. For a special class of matrix sensing and completion problems with quadratic objective functions, local search algorithms applied to the factored problem have been shown to be much more efficient and, in spite of being nonconvex, to converge to the global optimum. The purpose of this work is to extend this line of study to general convex objective functions f(X) and to investigate the geometry of the resulting factored formulations. Specifically, we prove that when f(X) satisfies the restricted well-conditioned assumption, each critical point of the factored problem either corresponds to the optimal solution X* or is a strict saddle point, where the Hessian matrix has a strictly negative eigenvalue. Such a geometric structure of the factored formulation ensures that many local search algorithms can converge to the global optimum with random initialization.
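To make the factored approach concrete, here is a minimal sketch of gradient descent on g(U) = f(UU^T) from a random initialization; the toy least-squares objective, step size, and dimensions are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Minimal sketch of the factored reformulation: instead of optimizing
# f(X) over PSD matrices X, set X = U U^T and run gradient descent on
# g(U) = f(U U^T) from a random start.  The toy objective
# f(X) = 0.5 * ||X - M||_F^2 and all constants are illustrative.
n, r = 50, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((n, r))
M = W @ W.T                      # low-rank PSD target, so X* = M

def grad_f(X):                   # gradient of f(X) = 0.5 * ||X - M||_F^2
    return X - M

U = rng.standard_normal((n, r))  # random initialization
eta = 1e-3
for _ in range(2000):
    G = grad_f(U @ U.T)
    U -= eta * (G + G.T) @ U     # chain rule: gradient of f(U U^T) in U

print(np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))  # close to 0
```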
Global optimality in low-rank matrix optimization
This paper considers the minimization of a general objective function f(X) over the set of non-square n × m matrices, where the optimal solution X* is low-rank. To reduce the computational burden, we factorize the variable X into a product of two smaller matrices and optimize over these two matrices instead of X. We analyze the global geometry for a general yet well-conditioned objective function f(X) whose restricted strong convexity and restricted strong smoothness constants are comparable. In particular, we show that the reformulated objective function has no spurious local minima and obeys the strict saddle property. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) can provably solve the factored problem with global convergence.
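As a rough illustration of the two-factor reformulation analyzed here, the sketch below runs gradient descent on a toy objective with the balancing term ||U^TU − V^TV||_F^2 that is commonly added in this line of work; the objective, the balancing weight, and all constants are assumptions for illustration.

```python
import numpy as np

# Sketch of the two-factor reformulation X = U V^T for an n x m variable,
# with a balancing term ||U^T U - V^T V||_F^2.  The least-squares
# objective, the weight mu, and all constants are illustrative.
n, m, r = 60, 40, 3
rng = np.random.default_rng(1)
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank-r target

def grad_f(X):                    # gradient of f(X) = 0.5 * ||X - M||_F^2
    return X - M

U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
eta, mu = 1e-3, 0.25              # step size and balancing weight
for _ in range(3000):
    G = grad_f(U @ V.T)
    B = U.T @ U - V.T @ V         # balancing residual
    U, V = U - eta * (G @ V + mu * U @ B), V - eta * (G.T @ U - mu * V @ B)

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # close to 0
```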
- Award ID(s): 1464205
- PAR ID: 10099575
- Date Published:
- Journal Name: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
- Page Range / eLocation ID: 1275 to 1279
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We study the polyhedral convex hull structure of a mixed-integer set which arises in a class of cardinality-constrained concave submodular minimization problems. This class of problems has an objective function of the form $f(a^Tx)$, where $f$ is a univariate concave function, $a$ is a non-negative vector, and $x$ is a binary vector of appropriate dimension. Such minimization problems frequently appear in applications that involve risk aversion or economies of scale. We propose three classes of strong valid linear inequalities for this convex hull and specify their facet conditions when $a$ has two distinct values. We show how to use these inequalities to obtain valid inequalities for a general $a$ that contains multiple distinct values. We further provide a complete linear description of the convex hull of this mixed-integer set when $a$ contains two distinct values and the cardinality constraint upper bound is two. Our computational experiments on the mean-risk optimization problem demonstrate the effectiveness of the proposed inequalities in a branch-and-cut framework.
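For concreteness, a toy brute-force enumeration of such a cardinality-constrained instance (a made-up mean-risk example with the square root as the concave $f$) is sketched below; it illustrates the structure of the mixed-integer set, not the paper's branch-and-cut method.

```python
from itertools import combinations
import math

# Toy brute-force illustration of a cardinality-constrained problem with
# an objective of the form f(a^T x): a mean-risk style instance
#   min  -mu^T x + Omega * sqrt(a^T x)   s.t.  x binary, sum(x) <= k,
# where sqrt plays the role of the univariate concave f and a >= 0.
# All data below are made up.
mu = [4.0, 3.0, 2.5, 2.0, 1.0]     # rewards
a = [9.0, 1.0, 4.0, 1.0, 16.0]     # nonnegative weights (e.g., variances)
Omega, k, n = 2.0, 2, 5

best = (float("inf"), ())
for size in range(k + 1):
    for S in combinations(range(n), size):
        val = -sum(mu[i] for i in S) + Omega * math.sqrt(sum(a[i] for i in S))
        best = min(best, (val, S))

print(best)                        # optimal value and support of x
```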
We consider using gradient descent to minimize the nonconvex function $$f(X)=\phi(XX^{T})$$ over an $$n\times r$$ factor matrix $$X$$, in which $$\phi$$ is an underlying smooth convex cost function defined over $$n\times n$$ matrices. While only a second-order stationary point $$X$$ can be provably found in reasonable time, if $$X$$ is additionally \emph{rank deficient}, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $$r$$ of the current iterate $$X$$ to be \emph{overparameterized} with respect to the rank $$r^{\star}$$ of the global minimizer $$X^{\star}$$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $$r=r^{\star}$$ to a sublinear rate when $$r>r^{\star}$$, even when $$\phi$$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $$X^{\star}$$.
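A hedged sketch of the idea follows, assuming the preconditioner right-multiplies the factored gradient by $$(X^{T}X+\lambda I)^{-1}$$ with a damping $$\lambda$$ tied to the current gradient size; the toy cost $$\phi$$, the damping rule, and all constants are illustrative stand-ins, not the paper's exact algorithm.

```python
import numpy as np

# Sketch of overparameterized factored gradient descent with a right
# preconditioner of the assumed form (X^T X + lam * I)^{-1}.  The rule
# tying the damping lam to the gradient size, the toy cost phi, and all
# constants are illustrative, not the paper's exact algorithm.
n, r_star, r = 40, 2, 5            # search rank r > r_star (overparameterized)
rng = np.random.default_rng(2)
W = rng.standard_normal((n, r_star))
M = W @ W.T                        # the global minimizer X* X*^T has rank r_star

def grad_phi(Z):                   # gradient of phi(Z) = 0.5 * ||Z - M||_F^2
    return Z - M

X = 0.1 * rng.standard_normal((n, r))
eta = 0.2
for _ in range(1000):
    G = 2 * grad_phi(X @ X.T) @ X  # gradient of phi(X X^T) with respect to X
    lam = np.linalg.norm(G)        # damping shrinks as the gradient vanishes
    X -= eta * G @ np.linalg.inv(X.T @ X + lam * np.eye(r))

print(np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))  # small residual
```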
We propose a unified framework for solving general low-rank plus sparse matrix recovery problems based on matrix factorization, which covers a broad family of objective functions satisfying the restricted strong convexity and smoothness conditions. Based on projected gradient descent and the double thresholding operator, our generic algorithm is guaranteed to converge to the unknown low-rank and sparse matrices at a locally linear rate, while matching the best-known robustness guarantee (i.e., tolerance for sparsity). At the core of our theory is a novel structural Lipschitz gradient condition for low-rank plus sparse matrices, which is essential for proving the linear convergence rate of our algorithm and which we believe is of independent interest for proving fast rates for general superposition-structured models. We illustrate the application of our framework through two concrete examples: robust matrix sensing and robust PCA. Empirical experiments corroborate our theory.
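The sketch below illustrates the generic pattern on a toy robust-PCA instance: a gradient step on the low-rank factors alternated with a hard-thresholding step on the sparse part. The specific keep-the-largest-fraction rule and all constants are assumptions standing in for the paper's double thresholding operator.

```python
import numpy as np

# Generic low-rank + sparse pattern on a toy robust-PCA instance with
# f(L, S) = 0.5 * ||L + S - Y||_F^2: gradient steps on the factors U, V
# alternated with hard thresholding of the residual.  The thresholding
# rule and constants are illustrative stand-ins.
n, m, r = 50, 40, 3
rng = np.random.default_rng(3)
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
mask = rng.random((n, m)) < 0.05          # 5% sparse corruptions
S_true = np.where(mask, 10 * rng.standard_normal((n, m)), 0.0)
Y = L_true + S_true

def hard_threshold(R, frac):
    # keep only the largest-magnitude fraction frac of the entries of R
    cut = np.quantile(np.abs(R), 1 - frac)
    return np.where(np.abs(R) >= cut, R, 0.0)

U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
eta = 1e-3
for _ in range(2000):
    S = hard_threshold(Y - U @ V.T, 0.05)  # sparse update by thresholding
    G = U @ V.T + S - Y                    # gradient in L at L = U V^T
    U, V = U - eta * G @ V, V - eta * G.T @ U

print(np.linalg.norm(U @ V.T - L_true) / np.linalg.norm(L_true))
```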
A function $f\colon \{0,1\}^n \to \{0,1\}$ is called an approximate AND-homomorphism if, choosing $x, y \in \{0,1\}^n$ uniformly at random, we have that $f(x \wedge y) = f(x) \wedge f(y)$ with probability at least $1-\varepsilon$, where $x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n)$. We prove that if $f\colon \{0,1\}^n \to \{0,1\}$ is an approximate AND-homomorphism, then $f$ is $\delta$-close to either a constant function or an AND function, where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$. This improves on a result of Nehama, who proved a similar statement in which $\delta$ depends on $n$. Our theorem implies a strong result on judgement aggregation in computational social choice. In the language of social choice, our result shows that if $f$ is $\varepsilon$-close to satisfying judgement aggregation, then it is $\delta(\varepsilon)$-close to an oligarchy (the name for the AND function in social choice theory). This improves on Nehama's result, in which $\delta$ decays polynomially with $n$. Our result follows from a more general one, in which we characterize approximate solutions to the eigenvalue equation $Tf = \lambda g$, where $T$ is the downwards noise operator $Tf(x) = \mathbb{E}_y[f(x \wedge y)]$, $f$ is $[0,1]$-valued, and $g$ is $\{0,1\}$-valued. We identify all exact solutions to this equation, and show that any approximate solution in which $Tf$ and $\lambda g$ are close is close to an exact solution.
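As a quick empirical companion to the definition, the snippet below estimates $\varepsilon$ for two example functions, an AND over a fixed coordinate set (an oligarchy, which satisfies the identity exactly) and majority (which does not); the functions and trial count are illustrative.

```python
import random

# Empirical estimate of eps = Pr[f(x AND y) != f(x) AND f(y)] for random
# x, y in {0,1}^n, the quantity in the approximate AND-homomorphism
# definition above.  The example functions and trial count are made up.
n, trials = 20, 100_000
rng = random.Random(0)

def and_on(S):                      # oligarchy: AND of the coordinates in S
    return lambda x: all((x >> i) & 1 for i in S)

def majority(x):                    # 1 iff more than half the bits are set
    return bin(x).count("1") * 2 > n

def eps(f):
    bad = 0
    for _ in range(trials):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if f(x & y) != (f(x) and f(y)):
            bad += 1
    return bad / trials

print(eps(and_on([0, 1])))          # 0.0: AND functions are exact homomorphisms
print(eps(majority))                # bounded away from 0
```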