Gradient sampling (GS) methods for the minimization of objective functions that may be nonconvex and/or nonsmooth are proposed, analyzed, and tested. One of the most computationally expensive components of contemporary GS methods is the need to solve a convex quadratic subproblem in each iteration. By contrast, the methods proposed in this paper allow the use of inexact solutions of these subproblems, which, as proved in the paper, can be incorporated without the loss of theoretical convergence guarantees. Numerical experiments show that, by exploiting inexact subproblem solutions, one can consistently reduce the computational effort required by a GS method. Additionally, a strategy is proposed for aggregating gradient information after a subproblem is solved (potentially inexactly) as has been exploited in bundle methods for nonsmooth optimization. It is proved that the aggregation scheme can be introduced without the loss of theoretical convergence guarantees. Numerical experiments show that incorporating this gradient aggregation approach can also reduce the computational effort required by a GS method.
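To illustrate what an inexact subproblem solve can look like, here is a minimal sketch. It assumes the standard GS subproblem form: find the minimum-norm element of the convex hull of sampled gradients, i.e. minimize 0.5*||G λ||² over the unit simplex. The function name `inexact_min_norm`, the Frank-Wolfe solver, and the duality-gap stopping rule with tolerance `tol` are illustrative assumptions. The abstract does not state the paper's actual inexactness conditions.

```python
import numpy as np

def inexact_min_norm(G, tol=1e-3, max_iter=500):
    """Approximately minimize 0.5*||G @ lam||^2 over the unit simplex.

    G holds one sampled gradient per column. The loop stops once the
    Frank-Wolfe duality gap is at most tol (a hypothetical inexactness
    rule chosen for illustration, not the paper's condition).
    """
    m = G.shape[1]
    lam = np.full(m, 1.0 / m)           # start from uniform weights
    gap = np.inf
    for _ in range(max_iter):
        d = G @ lam                     # current convex combination of gradients
        grad = G.T @ d                  # gradient of the objective w.r.t. lam
        j = int(np.argmin(grad))        # simplex vertex minimizing the linear model
        gap = float(grad @ lam - grad[j])   # upper bound on suboptimality
        if gap <= tol:
            break
        s = -lam.copy()
        s[j] += 1.0                     # direction e_j - lam
        Gs = G @ s
        denom = float(Gs @ Gs)
        # exact line search along s, clipped to [0, 1]
        step = 1.0 if denom == 0.0 else min(1.0, max(0.0, -float(d @ Gs) / denom))
        lam = lam + step * s
    return lam, G @ lam, gap
```

The returned `gap` bounds how far the objective value is from optimal, so loosening `tol` trades subproblem accuracy for fewer inner iterations.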
A new inexact gradient descent method with applications to nonsmooth convex optimization
The paper proposes and develops a novel inexact gradient descent method (IGD) for minimizing smooth functions with Lipschitzian gradients. We show that the sequence of gradients generated by IGD converges to zero. The convergence of iterates to stationary points is guaranteed under the Kurdyka-Łojasiewicz (KL) property of the objective function, with convergence rates depending on the KL exponent. The newly developed IGD is then used to design two novel gradient-based methods for nonsmooth convex optimization: an inexact proximal point method (GIPPM) and an inexact augmented Lagrangian method (GIALM) for convex programs with linear equality constraints. These two methods inherit global convergence properties from IGD and are confirmed by numerical experiments to have practical advantages over some well-known algorithms of nonsmooth convex optimization.
- Award ID(s): 2204519
- PAR ID: 10515872
- Publisher / Repository: Taylor & Francis
- Date Published:
- Journal Name: Optimization Methods and Software
- ISSN: 1055-6788
- Page Range / eLocation ID: 1 to 29
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The full approximation storage (FAS) scheme is a widely used multigrid method for nonlinear problems. In this paper, a new framework to design and analyze FAS-like schemes for convex optimization problems is developed. The new method, the fast subspace descent (FASD) scheme, which generalizes classical FAS, can be recast as an inexact version of nonlinear multigrid methods based on space decomposition and subspace correction. The local problem in each subspace can be simplified to be linear and one gradient descent iteration (with an appropriate step size) is enough to ensure a global linear (geometric) convergence of FASD for convex optimization problems.
- Many modern large-scale and distributed optimization problems can be cast into a form in which the objective function is a sum of a smooth term and a nonsmooth regularizer. Such problems can be solved via a proximal gradient method which generalizes standard gradient descent to a nonsmooth setup. In this paper, we leverage the tools from control theory to study global convergence of proximal gradient flow algorithms. We utilize the fact that the proximal gradient algorithm can be interpreted as a variable-metric gradient method on the forward-backward envelope. This continuously differentiable function can be obtained from the augmented Lagrangian associated with the original nonsmooth problem and it enjoys a number of favorable properties. We prove that global exponential convergence can be achieved even in the absence of strong convexity. Moreover, for in-network optimization problems, we provide a distributed implementation of the gradient flow dynamics based on the proximal augmented Lagrangian and prove global exponential stability for strongly convex problems.
- Augmented Lagrangian (AL) methods have proven remarkably useful in solving optimization problems with complicated constraints. The last decade has seen the development of overall complexity guarantees for inexact AL variants. Yet, a crucial gap persists in addressing nonsmooth convex constraints. To this end, we present a smoothed AL framework where nonsmooth terms are progressively smoothed with a smoothing parameter $$\eta_k$$. The resulting AL subproblems are $$\eta_k$$-smooth, allowing for leveraging accelerated schemes. By a careful selection of the inexactness level (for inexact subproblem resolution), the penalty parameter $$\rho_k$$, and smoothing parameter $$\eta_k$$ at epoch $$k$$, we derive rate and complexity guarantees of $$\tilde{\mathcal{O}}(1/\epsilon^{3/2})$$ and $$\tilde{\mathcal{O}}(1/\epsilon)$$ in convex and strongly convex regimes for computing an $$\epsilon$$-optimal solution, when $$\rho_k$$ increases at a geometric rate, a significant improvement over the best available guarantees for AL schemes for convex programs with nonsmooth constraints. Analogous guarantees are developed for settings with $$\rho_k=\rho$$ as well as $$\eta_k=\eta$$. Preliminary numerics on a fused Lasso problem display promise.
- One of the beauties of the projected gradient descent method lies in its rather simple mechanism and yet stable behavior with inexact, stochastic gradients, which has led to its widespread use in many machine learning applications. However, once we replace the projection operator with a simpler linear program, as is done in the Frank-Wolfe method, both simplicity and stability take a serious hit. The aim of this paper is to bring them back without sacrificing the efficiency. In this paper, we propose the first one-sample stochastic Frank-Wolfe algorithm, called 1-SFW, that avoids the need to carefully tune the batch size, step size, learning rate, and other complicated hyperparameters. In particular, 1-SFW achieves the optimal convergence rate of $$\mathcal{O}(1/\epsilon^2)$$ for reaching an $$\epsilon$$-suboptimal solution in the stochastic convex setting, and a $$(1-1/e)-\epsilon$$ approximate solution for a stochastic monotone DR-submodular maximization problem. Moreover, in a general non-convex setting, 1-SFW finds an $$\epsilon$$-first-order stationary point after at most $$\mathcal{O}(1/\epsilon^3)$$ iterations, achieving the current best known convergence rate. All of this is possible by designing a novel unbiased momentum estimator that governs the stability of the optimization process while using a single sample at each iteration.
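As a concrete instance of the proximal gradient method mentioned in the second related abstract above (a smooth term plus a nonsmooth regularizer), the following is a minimal sketch of the textbook iteration for the lasso problem, minimizing 0.5*||Ax - b||² + reg*||x||₁. It is not the gradient flow dynamics or the distributed scheme of that paper. The function names, the fixed step size 1/L, and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, reg, num_iters=200):
    """Minimize 0.5*||A x - b||^2 + reg*||x||_1 by proximal gradient steps.

    Uses the constant step 1/L, where L is the squared spectral norm of A,
    i.e. the Lipschitz constant of the gradient of the smooth term.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                     # gradient of the smooth term
        x = soft_threshold(x - grad / L, reg / L)    # prox step on the l1 term
    return x
```

Each iteration is an ordinary gradient step on the smooth term followed by the proximal map of the regularizer, which is how the method reduces to plain gradient descent when `reg` is zero.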

