In this paper, we consider an unconstrained nonconvex optimization problem that minimizes a twice differentiable objective function with Hölder continuous Hessian. Specifically, we first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first- and second-order stationary point of this problem, assuming the associated Hölder parameters are explicitly known. Then, we develop a parameter-free Newton-CG method that requires no prior knowledge of these parameters. To the best of our knowledge, this method is the first parameter-free second-order method to achieve the best-known iteration and operation complexity for finding an approximate first- and second-order stationary point of this problem. Finally, we present preliminary numerical results demonstrating the superior practical performance of our parameter-free Newton-CG method over a well-known regularized Newton method.
Funding: C. He was partially financially supported by the Wallenberg AI, Autonomous Systems and Software Program funded by the Knut and Alice Wallenberg Foundation. H. Huang was partially financially supported by the National Science Foundation [Award IIS-2347592]. Z. Lu was partially financially supported by the National Science Foundation [Award IIS-2211491], the Office of Naval Research [Award N00014-24-1-2702], and the Air Force Office of Scientific Research [Award FA9550-24-1-0343].
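As a rough illustration of the parameter-free idea, the sketch below adapts a regularization weight by trial and error, so no Hölder constants need to be known in advance. This is a generic device under assumed access to f, grad, and hess, not the paper's Newton-CG method; the acceptance constant and sigma schedule are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def adaptive_reg_newton_step(f, grad, hess, x, sigma=1.0, max_tries=60):
    """One regularized Newton step with sigma adapted by backtracking
    (a hypothetical sketch, not the paper's algorithm)."""
    g, H = grad(x), hess(x)
    I = np.eye(x.size)
    for _ in range(max_tries):
        try:
            factor = cho_factor(H + sigma * I)  # fails unless positive definite
        except np.linalg.LinAlgError:
            sigma *= 2.0
            continue
        d = cho_solve(factor, -g)               # regularized Newton direction
        if f(x + d) <= f(x) - 0.25 * sigma * (d @ d):
            return x + d, max(sigma / 2.0, 1e-12)  # success: relax sigma
        sigma *= 2.0                            # insufficient decrease: retry
    return x, sigma
```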
Inexact Newton-CG algorithms with complexity guarantees
Abstract
We consider variants of a recently developed Newton-CG algorithm for nonconvex problems (Royer, C. W. & Wright, S. J. (2018) Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim., 28, 1448–1477) in which inexact estimates of the gradient and the Hessian are used for various steps. Under certain conditions on the inexactness measures, we derive iteration complexity bounds for achieving $\epsilon$-approximate second-order optimality that match the best-known lower bounds. Our inexactness condition on the gradient is adaptive, allowing for crude accuracy in regions with large gradients. We describe two variants of our approach, one in which the step size along the computed search direction is chosen adaptively, and another in which the step size is pre-defined. To obtain second-order optimality, our algorithms make use of a negative curvature direction on some steps. These directions can be obtained, with high probability, using the randomized Lanczos algorithm. In this sense, all of our results hold with high probability over the run of the algorithm. We evaluate the performance of our proposed algorithms empirically on several machine learning models. Our approach is a first attempt to introduce inexact Hessian and/or gradient information into the Newton-CG algorithm of Royer & Wright (2018, Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim., 28, 1448–1477).
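To illustrate the kind of subroutine involved, here is a minimal sketch of a "capped" conjugate gradient loop that either returns an inexact Newton direction or reports a direction of near-negative curvature. The callback Hv, the tolerances tol and eps, and the iteration cap are illustrative assumptions; the paper's actual procedure uses additional safeguards.

```python
import numpy as np

def capped_cg(Hv, g, eps=1e-2, tol=1e-2, max_iter=200):
    """Solve H y = -g by CG, but stop early if a direction of
    (near-)negative curvature is detected.  Hv(v) returns H @ v.
    A sketch of the idea, not the paper's exact procedure."""
    y = np.zeros_like(g)
    r = g.copy()                # residual of H y + g = 0 at y = 0
    p = -r                      # initial search direction
    for _ in range(max_iter):
        Hp = Hv(p)
        pHp = p @ Hp
        if pHp <= eps * (p @ p):           # curvature test failed:
            return p, "negative_curvature"  # return p as an NC direction
        alpha = (r @ r) / pHp
        y = y + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(g):
            return y, "newton_step"        # inexact Newton direction
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return y, "max_iter"
```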
- Award ID(s): 2023239
- PAR ID: 10475747
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: IMA Journal of Numerical Analysis
- Volume: 43
- Issue: 3
- ISSN: 0272-4979
- Page Range / eLocation ID: 1855 to 1897
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We study the problem of finding an $\epsilon$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is $O(\epsilon^{-7/4})$. In this work, we propose a method with a gradient complexity of $O(d^{1/4}\epsilon^{-13/8})$, where $d$ is the problem dimension, leading to an improved complexity when $d = O(\epsilon^{-1/2})$. To achieve this result, we design an optimization algorithm that, underneath, involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point of a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an $\epsilon$-FOSP using a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can potentially outperform gradient-descent-type methods in nonconvex settings.
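A minimal sketch of the online-to-nonconvex reduction described above, with plain projected online gradient descent standing in for the paper's optimistic quasi-Newton learner; the radius D and step size eta are hypothetical parameters.

```python
import numpy as np

def online_to_nonconvex(grad, x0, T, D=0.1, eta=0.01):
    """Each round the learner plays a displacement delta_t, the iterate
    moves, and the learner suffers the linear loss <grad f(x_t), delta_t>.
    Low regret forces the plays to correlate with the negative gradient,
    i.e., to make progress on f.  A hypothetical sketch, not the paper's
    optimistic quasi-Newton method."""
    x, delta = x0.copy(), np.zeros_like(x0)
    for _ in range(T):
        x = x + delta               # play the learner's displacement
        g = grad(x)                 # linear loss vector is revealed
        delta = delta - eta * g     # OGD update on the linear loss
        norm = np.linalg.norm(delta)
        if norm > D:
            delta *= D / norm       # keep plays in a ball of radius D
    return x
```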
Beattie, C. A.; Benner, P.; Embree, M.; Gugercin, S.; Lefteriu, S. (Eds.)
This paper introduces reduced order model (ROM) based Hessian approximations for use in inexact Newton methods for the solution of optimization problems implicitly constrained by a large-scale system, typically a discretization of a partial differential equation (PDE). The direct application of an inexact Newton method to this problem requires the solution of many PDEs per optimization iteration. To reduce the computational complexity, a ROM Hessian approximation is proposed. Since only the Hessian is approximated, while the original objective function and its gradient are used, the resulting inexact Newton method maintains the first-order global convergence property under suitable assumptions. Thus even computationally inexpensive lower-fidelity ROMs can be used, which differs from ROM approaches that replace the original optimization problem by a sequence of ROM optimization problems and typically need to accurately approximate the function and gradient information of the original problem. In the proposed approach, the quality of the ROM Hessian approximation determines the rate of convergence, but not whether the method converges. The projection-based ROM is constructed from state and adjoint snapshots, and is relatively inexpensive to compute. Numerical examples on semilinear parabolic optimal control problems demonstrate that the proposed approach can lead to substantial savings in terms of overall PDE solves required.
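The mechanism can be sketched generically: as long as f and grad are exact, an approximate Hessian solve (here an abstract B_solve callback standing in for the ROM) can supply the direction, with a line search guarding global convergence. This is an assumption-laden sketch, not the paper's implementation.

```python
import numpy as np

def rom_inexact_newton(f, grad, B_solve, x, tol=1e-6, max_iter=100):
    """Inexact Newton iteration: B_solve(x, g) approximates H(x)^{-1} g
    (e.g., via a ROM), while f and grad are exact, so Armijo backtracking
    preserves global convergence.  A hypothetical sketch."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -B_solve(x, g)             # cheap approximate Newton direction
        if g @ d >= 0:                 # not a descent direction: fall back
            d = -g
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5                   # Armijo backtracking
        x = x + t * d
    return x
```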
Quantum linear system algorithms (QLSAs) have the potential to speed up algorithms that rely on solving linear systems. Interior point methods (IPMs) yield a fundamental family of polynomial-time algorithms for solving optimization problems. IPMs solve a Newton linear system at each iteration to compute the search direction; thus, QLSAs can potentially speed up IPMs. Due to the noise in contemporary quantum computers, quantum-assisted IPMs (QIPMs) only admit an inexact solution to the Newton linear system. Typically, an inexact search direction leads to infeasible iterates; to overcome this, we propose an inexact-feasible QIPM (IF-QIPM) for solving linearly constrained quadratic optimization problems. We also apply the algorithm to ℓ1-norm soft margin support vector machine (SVM) problems, and demonstrate that our algorithm enjoys a speedup in the dimension over existing approaches. This complexity bound is better than that of any existing classical or quantum algorithm that produces a classical solution.
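For context, the sketch below assembles and classically solves the primal-dual Newton system of an IPM for a convex QP; in a QIPM, this linear solve is the step a quantum linear system algorithm would replace. The zero-residual (feasible-iterate) simplification and all names are illustrative; this is not the paper's IF-QIPM.

```python
import numpy as np

def ipm_newton_direction(Q, A, x, y, s, sigma=0.1):
    """One primal-dual Newton system for the QP
    min 0.5 x'Qx + c'x  s.t.  Ax = b, x >= 0,
    assuming a feasible iterate (zero primal/dual residuals).
    A hypothetical sketch, not the paper's method."""
    n, m = x.size, y.size
    mu = (x @ s) / n                     # duality measure
    X, S = np.diag(x), np.diag(s)
    # KKT matrix for the Newton step on the perturbed optimality system.
    K = np.block([
        [Q, -A.T,              -np.eye(n)],
        [A,  np.zeros((m, m)),  np.zeros((m, n))],
        [S,  np.zeros((n, m)),  X],
    ])
    rhs = np.concatenate([np.zeros(n), np.zeros(m),
                          sigma * mu * np.ones(n) - x * s])
    dx, dy, ds = np.split(np.linalg.solve(K, rhs), [n, n + m])
    return dx, dy, ds
```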
We analyze inexact Riemannian gradient descent (RGD) where Riemannian gradients and retractions are inexactly (and cheaply) computed. Our focus is on understanding when inexact RGD converges and what its complexity is in the general nonconvex and constrained setting. We answer these questions in a general framework of tangential Block Majorization-Minimization (tBMM). We establish that tBMM converges to an $\epsilon$-stationary point within $O(\epsilon^{-2})$ iterations. Under a mild assumption, the results still hold when the subproblem is solved inexactly in each iteration, provided the total optimality gap is bounded. Our general analysis applies to a wide range of classical algorithms with Riemannian constraints, including inexact RGD and the proximal gradient method on Stiefel manifolds. We numerically validate that tBMM shows improved performance over existing methods when applied to various problems, including nonnegative tensor decomposition with Riemannian constraints, regularized nonnegative matrix factorization, and low-rank matrix recovery problems.
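A generic sketch of exact RGD on a Stiefel manifold with a QR retraction, the setting these inexact variants relax; the step size eta and step count are placeholders, and this is not the tBMM algorithm itself.

```python
import numpy as np

def rgd_stiefel(grad, X, steps=100, eta=0.1):
    """Riemannian gradient descent on {X : X'X = I} with a QR
    retraction; grad returns the Euclidean gradient, which is
    projected to the tangent space.  An illustrative sketch."""
    for _ in range(steps):
        G = grad(X)                          # Euclidean gradient
        # Project onto the tangent space at X: G - X sym(X'G).
        W = X.T @ G
        rgrad = G - X @ (W + W.T) / 2.0
        # Retract the step back onto the manifold via QR.
        Q, R = np.linalg.qr(X - eta * rgrad)
        signs = np.sign(np.sign(np.diag(R)) + 0.5)  # fix column signs
        X = Q * signs
    return X
```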