skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Global exponential convergence of gradient methods over the nonconvex landscape of the linear quadratic regulator
— In large-scale and model-free settings, first-order algorithms are often used in an attempt to find the optimal control action without identifying the underlying dynamics. The convergence properties of these algorithms remain poorly understood because of nonconvexity. In this paper, we revisit the continuous-time linear quadratic regulator problem and take a step towards demystifying the efficiency of gradient-based strategies. Despite the lack of convexity, we establish a linear rate of convergence to the globally optimal solution for the gradient descent algorithm. The key component of our analysis is that we relate the gradient-flow dynamics associated with the nonconvex formulation to that of a convex reparameterization. This allows us to provide convergence guarantees for the nonconvex approach from its convex counterpart.  more » « less
Award ID(s):
1846369
PAR ID:
10132892
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2019 IEEE 58th Conference on Decision and Control (CDC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called geometric step decay and proceeds by halving the step size after every few epochs. In recent work, geometric step decay was shown to improve exponentially upon classical sublinear rates for the class of sharp convex functions. In this work, we ask whether geometric step decay similarly improves stochastic algorithms for the class of sharp weakly convex problems. Such losses feature in modern statistical recovery problems and lead to a new challenge not present in the convex setting: the region of convergence is local, so one must bound the probability of escape. Our main result shows that for a large class of stochastic, sharp, nonsmooth, and nonconvex problems a geometric step decay schedule endows well-known algorithms with a local linear (or nearly linear) rate of convergence to global minimizers. This guarantee applies to the stochastic projected subgradient, proximal point, and prox-linear algorithms. As an application of our main result, we analyze two statistical recovery tasks—phase retrieval and blind deconvolution—and match the best known guarantees under Gaussian measurement models and establish new guarantees under heavy-tailed distributions. 
    more » « less
  2. In practical instances of nonconvex matrix factorization, the rank of the true solution r^{\star} is often unknown, so the rank rof the model can be over-specified as r>r^{\star}. This over-parameterized regime of matrix factorization significantly slows down the convergence of local search algorithms, from a linear rate with r=r^{\star} to a sublinear rate when r>r^{\star}. We propose an inexpensive preconditioner for the matrix sensing variant of nonconvex matrix factorization that restores the convergence rate of gradient descent back to linear, even in the over-parameterized case, while also making it agnostic to possible ill-conditioning in the ground truth. Classical gradient descent in a neighborhood of the solution slows down due to the need for the model matrix factor to become singular. Our key result is that this singularity can be corrected by \ell_{2} regularization with a specific range of values for the damping parameter. In fact, a good damping parameter can be inexpensively estimated from the current iterate. The resulting algorithm, which we call preconditioned gradient descent or PrecGD, is stable under noise, and converges linearly to an information theoretically optimal error bound. Our numerical experiments find that PrecGD works equally well in restoring the linear convergence of other variants of nonconvex matrix factorization in the over-parameterized regime. 
    more » « less
  3. We consider using gradient descent to minimize the nonconvex function $$f(X)=\phi(XX^{T})$$ over an $$n\times r$$ factor matrix $$X$$, in which $$\phi$$ is an underlying smooth convex cost function defined over $$n\times n$$ matrices. While only a second-order stationary point $$X$$ can be provably found in reasonable time, if $$X$$ is additionally \emph{rank deficient}, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $$r$$ of the current iterate $$X$$ to be \emph{overparameterized} with respect to the rank $$r^{\star}$ of the global minimizer $$X^{\star}$$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $$r=r^{\star}$$ to a sublinear rate when $$r>r^{\star}$$, even when $$\phi$$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $$X^{\star}$$. 
    more » « less
  4. We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them without assuming any knowledge about the convexity of the objective. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves near-optimal convergence rate in function values. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks 
    more » « less
  5. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    We study accelerated optimization methods in the Gaussian phase retrieval problem. In this setting, we prove that gradient methods with Polyak or Nesterov momentum have similar implicit regularization to gradient descent. This implicit regularization ensures that the algorithms remain in a nice region, where the cost function is strongly convex and smooth despite being nonconvex in general. This ensures that these accelerated methods achieve faster rates of convergence than gradient descent. Experimental evidence demonstrates that the accelerated methods converge faster than gradient descent in practice. 
    more » « less