Gradient sampling (GS) methods for the minimization of objective functions that may be nonconvex and/or nonsmooth are proposed, analyzed, and tested. One of the most computationally expensive components of contemporary GS methods is the need to solve a convex quadratic subproblem in each iteration. By contrast, the methods proposed in this paper allow the use of inexact solutions of these subproblems, which, as proved in the paper, can be incorporated without loss of theoretical convergence guarantees. Numerical experiments show that, by exploiting inexact subproblem solutions, one can consistently reduce the computational effort required by a GS method. Additionally, a strategy is proposed for aggregating gradient information after a subproblem is solved (potentially inexactly), in the spirit of the aggregation used in bundle methods for nonsmooth optimization. It is proved that this aggregation scheme can likewise be introduced without loss of theoretical convergence guarantees, and numerical experiments show that it, too, can reduce the computational effort required by a GS method.
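To make the role of the inexact subproblem concrete, below is a minimal Python sketch of one gradient-sampling iteration in which the min-norm-element QP over the convex hull of sampled gradients is solved inexactly by a few Frank-Wolfe steps. The sampling radius, the tolerance `qp_tol`, the Frank-Wolfe inner solver, and the Armijo constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gs_step(f, grad, x, eps=0.1, num_samples=None, qp_tol=1e-6, max_qp_iters=50):
    """One gradient-sampling iteration with an inexactly solved QP subproblem."""
    n = x.size
    num_samples = num_samples or n + 1

    # Sample gradients at x and at points drawn from a ball of radius eps.
    G = [grad(x)]
    for _ in range(num_samples):
        u = np.random.randn(n)
        G.append(grad(x + eps * np.random.rand() * u / np.linalg.norm(u)))
    G = np.vstack(G)

    # Subproblem: min_{w in simplex} ||G^T w||^2, solved INEXACTLY by a few
    # Frank-Wolfe steps; stopping at qp_tol / max_qp_iters is the inexactness.
    w = np.full(G.shape[0], 1.0 / G.shape[0])
    for k in range(max_qp_iters):
        g = G.T @ w                        # current convex combination of gradients
        scores = G @ g                     # linearization over simplex vertices
        s = np.zeros_like(w)
        s[int(np.argmin(scores))] = 1.0
        if scores @ (w - s) <= qp_tol:     # small FW gap: accept inexact solution
            break
        w += 2.0 / (k + 2) * (s - w)

    # G.T @ w is exactly the kind of aggregated gradient the abstract
    # describes carrying forward; here it only defines the search direction.
    d = -G.T @ w

    # Backtracking (Armijo-type) line search.
    t = 1.0
    while f(x + t * d) > f(x) - 1e-4 * t * (d @ d) and t > 1e-12:
        t *= 0.5
    return x + t * d
```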
A new inexact gradient descent method with applications to nonsmooth convex optimization
The paper proposes and develops a novel inexact gradient method (IGD) for minimizing smooth functions with Lipschitzian gradients. We show that the sequence of gradients generated by IGD converges to zero. Convergence of the iterates to stationary points is guaranteed under the Kurdyka–Łojasiewicz (KL) property of the objective function, with convergence rates depending on the KL exponent. The newly developed IGD is applied to design two novel gradient-based methods of nonsmooth convex optimization: an inexact proximal point method (GIPPM) and an inexact augmented Lagrangian method (GIALM) for convex programs with linear equality constraints. These two methods inherit global convergence properties from IGD, and numerical experiments confirm their practical advantages over some well-known algorithms of nonsmooth convex optimization.
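To make the inexactness concrete, here is a minimal Python sketch of an IGD-style loop under the assumption that the oracle's error can be requested and is kept proportional to the norm of the returned gradient. The relative-error rule, the 1/L stepsize, and the toy oracle are illustrative assumptions, not the paper's exact scheme or its KL-based rate analysis.

```python
import numpy as np

def igd(grad_approx, x0, L=1.0, nu=0.5, tol=1e-8, max_iters=10000):
    """Inexact gradient descent sketch. grad_approx(x, err) is assumed to
    return g with ||g - grad f(x)|| <= err; keeping err <= nu*||g|| with
    nu < 1 preserves -g as a descent-like direction for the 1/L step."""
    x, err = np.asarray(x0, dtype=float).copy(), 1.0
    for _ in range(max_iters):
        g = grad_approx(x, err)
        if np.linalg.norm(g) <= tol:
            break
        err = nu * np.linalg.norm(g)   # error shrinks with the gradient
        x = x - g / L                  # standard 1/L step on the inexact gradient
    return x

# Toy check on f(x) = 0.5*||x||^2 (grad f(x) = x, L = 1) with an oracle
# that returns the exact gradient corrupted by noise of norm 0.9*err.
def noisy_grad(x, err):
    u = np.random.randn(x.size)
    return x + 0.9 * err * u / (np.linalg.norm(u) + 1e-16)

x_final = igd(noisy_grad, np.ones(5))   # ||x_final|| is driven to ~tol
```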
- Award ID(s):
- 2204519
- PAR ID:
- 10515872
- Publisher / Repository:
- Taylor & Francis
- Date Published:
- Journal Name:
- Optimization Methods and Software
- ISSN:
- 1055-6788
- Page Range / eLocation ID:
- 1 to 29
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Many modern large-scale and distributed optimization problems can be cast into a form in which the objective function is a sum of a smooth term and a nonsmooth regularizer. Such problems can be solved via a proximal gradient method which generalizes standard gradient descent to a nonsmooth setup. In this paper, we leverage the tools from control theory to study global convergence of proximal gradient flow algorithms. We utilize the fact that the proximal gradient algorithm can be interpreted as a variable-metric gradient method on the forward-backward envelope. This continuously differentiable function can be obtained from the augmented Lagrangian associated with the original nonsmooth problem and it enjoys a number of favorable properties. We prove that global exponential convergence can be achieved even in the absence of strong convexity. Moreover, for in-network optimization problems, we provide a distributed implementation of the gradient flow dynamics based on the proximal augmented Lagrangian and prove global exponential stability for strongly convex problems. (A minimal sketch of the underlying forward-backward step appears after this list.)
- arXiv:2311.11180v1. This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank-Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an ϵ-suboptimal solution in O(ϵ^−2) iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle (LMO) call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule. (A schematic of the LMO-plus-multiplier iteration appears after this list.)
- Stochastic gradient descent (SGD) is one of the most widely used optimization methods for parallel and distributed processing of large datasets. One of the key limitations of distributed SGD is the need to regularly communicate the gradients between different computation nodes. To reduce this communication bottleneck, recent work has considered a one-bit variant of SGD, where only the sign of each gradient element is used in optimization. In this paper, we extend this idea by proposing a stochastic variant of the proximal-gradient method that also uses one-bit per update element. We prove the theoretical convergence of the method for non-convex optimization under a set of explicit assumptions. Our results indicate that the compressed method can match the convergence rate of the uncompressed one, making the proposed method potentially appealing for distributed processing of large datasets. (A sign-compression sketch appears after this list.)
- This paper proposes and develops new linesearch methods with inexact gradient information for finding stationary points of nonconvex continuously differentiable functions on finite-dimensional spaces. Some abstract convergence results for a broad class of linesearch methods are established. A general scheme for inexact reduced gradient (IRG) methods is proposed, where the errors in the gradient approximation automatically adapt with the magnitudes of the exact gradients. The sequences of iterations are shown to obtain stationary accumulation points when different stepsize selections are employed. Convergence results with constructive convergence rates for the developed IRG methods are established under the Kurdyka–Łojasiewicz property. The obtained results for the IRG methods are confirmed by encouraging numerical experiments, which demonstrate advantages of automatically controlled errors in IRG methods over other frequently used error selections. (An adaptive-error linesearch sketch appears after this list.)
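For the proximal gradient abstract above, the following is a minimal Python sketch of the forward-backward (proximal gradient) step on a LASSO instance, minimize 0.5*||Ax − b||^2 + lam*||x||_1. The l1 regularizer and the 1/L stepsize are illustrative choices; the paper's contribution, the control-theoretic analysis of the continuous-time flow via the forward-backward envelope, is not reproduced here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam=0.1, iters=500):
    """Forward-backward iteration for 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # Forward (gradient) step on the smooth term, backward (prox) step
        # on the nonsmooth regularizer.
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x
```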
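For the projection-free abstract, here is a schematic (not the paper's algorithm or its rate guarantees) combining the two ingredients it names: one LMO call per iteration on a subgradient of the Lagrangian, and a Lagrange multiplier update for a functional constraint g(x) <= 0. The l1-ball feasible set, the diminishing step rule, and the specific multiplier update are assumptions for illustration.

```python
import numpy as np

def lmo_l1_ball(c, radius=1.0):
    """Linear Minimization Oracle for X = {v : ||v||_1 <= radius}:
    argmin_{v in X} <c, v>, attained at a signed vertex of the ball."""
    v = np.zeros_like(c)
    i = int(np.argmax(np.abs(c)))
    v[i] = -radius * np.sign(c[i]) if c[i] != 0 else radius
    return v

def projection_free_subgradient(subgrad_f, g_con, subgrad_g, x0,
                                iters=2000, step0=1.0):
    """Schematic loop: one LMO call per iteration, a convex-combination
    step that keeps x feasible for X, and a multiplier update driven by
    the constraint value g_con(x)."""
    x, q = np.asarray(x0, dtype=float).copy(), 0.0
    for k in range(1, iters + 1):
        gamma = step0 / np.sqrt(k)              # diminishing step (assumption)
        d = subgrad_f(x) + q * subgrad_g(x)     # Lagrangian subgradient
        v = lmo_l1_ball(d)                      # single (possibly inexact) LMO
        x = (1.0 - gamma) * x + gamma * v       # stays in X by convexity
        q = max(0.0, q + gamma * g_con(x))      # Lagrange multiplier update
    return x
```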
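For the one-bit SGD abstract, a minimal sketch of a sign-compressed stochastic proximal-gradient update follows. Rescaling the sign vector by the mean gradient magnitude is one common normalization and an assumption here, as is the generic prox interface.

```python
import numpy as np

def one_bit_prox_sgd(stoch_grad, prox, x0, step=1e-2, iters=1000):
    """Each update transmits sign(g), i.e. one bit per coordinate, plus a
    single scale factor; prox applies the regularizer's proximal map."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = stoch_grad(x)
        scale = np.mean(np.abs(g))   # one extra scalar per update (assumption)
        x = prox(x - step * scale * np.sign(g), step)
    return x

# Example usage (hypothetical problem): l1-regularized least squares with
# soft-thresholding as the prox and a mini-batch stochastic gradient, e.g.
# prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - 0.1 * t, 0.0)
```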
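For the IRG abstract, a minimal sketch of a backtracking linesearch loop in which the gradient-error tolerance automatically adapts to the magnitude of the returned inexact gradient, the feature the abstract highlights. The specific rule err = nu*||g|| and the Armijo constants are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def irg_linesearch(f, grad_approx, x0, nu=0.4, beta=0.5, c=1e-4,
                   tol=1e-8, max_iters=500):
    """grad_approx(x, err) is assumed to return g with
    ||g - grad f(x)|| <= err; keeping err proportional to ||g|| makes the
    error shrink automatically as a stationary point is approached."""
    x, err = np.asarray(x0, dtype=float).copy(), 1.0
    for _ in range(max_iters):
        g = grad_approx(x, err)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            break
        err = nu * gnorm                 # error adapts to the gradient size
        t = 1.0                          # Armijo backtracking along -g
        while f(x - t * g) > f(x) - c * t * gnorm ** 2 and t > 1e-12:
            t *= beta
        x = x - t * g
    return x
```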