In this paper, we present an accelerated quasi-Newton proximal extragradient method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective function, we prove that our method can achieve a convergence rate of $\mathcal{O}\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations. In particular, in the regime where $k = \mathcal{O}(d)$, our method matches the \emph{optimal rate} of $\mathcal{O}(\frac{1}{k^2})$ achieved by Nesterov's accelerated gradient (NAG). Moreover, in the regime where $k = \Omega(d \log d)$, it outperforms NAG and converges at a \emph{faster rate} of $\mathcal{O}\bigl(\frac{\sqrt{d\log k}}{k^{2.5}}\bigr)$. To the best of our knowledge, this result is the first to demonstrate a provable gain for a quasi-Newton-type method over NAG in the convex setting. To achieve such results, we build our method on a recent variant of the Monteiro-Svaiter acceleration framework and adopt an online learning perspective to update the Hessian approximation matrices, in which we relate the convergence rate of our method to the dynamic regret of a specific online convex optimization problem in the space of matrices.
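For context, the $\mathcal{O}(1/k^2)$ baseline that the method is compared against is Nesterov's accelerated gradient. The following is a minimal illustrative sketch of standard NAG on a quadratic test function (textbook step size $1/L$ and momentum schedule; the function and parameter names are our own, not the paper's):

```python
import numpy as np

def nag(grad, x0, L, iters):
    """Nesterov's accelerated gradient for an L-smooth convex objective."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L                        # gradient step from the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2        # standard momentum schedule
        y = x_next + ((t - 1) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: f(x) = 0.5 * x^T A x with A positive definite; the minimizer is 0,
# and theory guarantees f(x_k) <= 2 L ||x0||^2 / (k+1)^2.
A = np.diag([1.0, 10.0, 100.0])
x_star = nag(lambda x: A @ x, np.ones(3), L=100.0, iters=300)
```

With $L = 100$, $\|x_0\|^2 = 3$, and $k = 300$, the guarantee above bounds the final suboptimality by roughly $6.6\times 10^{-3}$; in practice NAG does much better on quadratics.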
This content will become publicly available on May 13, 2025
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to \textit{subspace second-order} methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates that depend on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $\mathcal{O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the \emph{Krylov subspace} associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
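The key object here is the Krylov subspace $\mathcal{K}_m = \mathrm{span}\{g, Hg, \dots, H^{m-1}g\}$, an orthonormal basis for which can be built using only Hessian-vector products. The sketch below (our own illustration via a simple Arnoldi/Gram-Schmidt loop, not the authors' code) shows how such a basis is formed without ever materializing the full Hessian:

```python
import numpy as np

def krylov_basis(hvp, g, m):
    """Orthonormal basis Q of span{g, Hg, ..., H^{m-1} g} via Arnoldi.

    hvp(v) returns the Hessian-vector product H @ v, so the full d x d
    Hessian never has to be formed or stored.
    """
    q = g / np.linalg.norm(g)
    Q = [q]
    for _ in range(m - 1):
        w = hvp(Q[-1])
        for qi in Q:                      # Gram-Schmidt against previous basis vectors
            w = w - (qi @ w) * qi
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # subspace became invariant; stop early
            break
        Q.append(w / norm)
    return np.stack(Q, axis=1)            # d x m matrix with orthonormal columns

d, m = 50, 5
H = np.diag(np.arange(1.0, d + 1))        # toy Hessian for illustration
g = np.ones(d)
Q = krylov_basis(lambda v: H @ v, g, m)
```

The cubic regularized Newton subproblem can then be solved in the $m$-dimensional coordinates defined by $Q^\top H Q$ and $Q^\top g$, which is the cost-saving step a subspace method exploits.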
Award ID(s): 2007668
NSF-PAR ID: 10526935
Publisher / Repository: PMLR
Date Published:
ISSN: 2640-3498
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for nonconvex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method. We show that our algorithm is guaranteed to converge to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms state-of-the-art cubic regularization algorithms, including subsampled cubic regularization. Our work also sheds light on the application of variance reduction techniques to high-order nonconvex optimization methods. Thorough experiments on various nonconvex optimization problems support our theory.
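The "semi-stochastic gradient" idea follows the general variance-reduction pattern: correct a minibatch gradient with a full gradient computed at a snapshot point, keeping the estimate unbiased while shrinking its variance near the snapshot. A minimal SVRG-style sketch of that pattern (our own illustration; the paper's estimator is further tailored to cubic regularization):

```python
import numpy as np

def semi_stochastic_gradient(grad_i, x, snapshot, full_grad_snapshot, batch):
    """SVRG-style semi-stochastic gradient estimate.

    grad_i(i, x) is the gradient of the i-th component function. The
    correction term keeps the estimate unbiased while its variance
    shrinks as x approaches the snapshot point.
    """
    est = np.zeros_like(x)
    for i in batch:
        est += grad_i(i, x) - grad_i(i, snapshot)
    return est / len(batch) + full_grad_snapshot

# Toy example: f_i(x) = 0.5 * c_i * ||x||^2, so grad_i(i, x) = c_i * x.
c = np.array([1.0, 2.0, 3.0])
gi = lambda i, x: c[i] * x
x, snap = np.array([1.0]), np.array([0.5])
full = c.mean() * snap                    # exact gradient of the average at the snapshot
g_est = semi_stochastic_gradient(gi, x, snap, full, batch=[0])
```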

We propose a randomized algorithm with quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective function. Our method is based on performing an approximate Newton step using a random projection of the Hessian. Our first contribution is to show that, at each iteration, the embedding dimension (or sketch size) can be as small as the effective dimension of the Hessian matrix. Leveraging this novel fundamental result, we design an algorithm with a sketch size proportional to the effective dimension and which exhibits a quadratic rate of convergence. This result dramatically improves on the classical linear-quadratic convergence rates of state-of-the-art subsampled Newton methods. However, in most practical cases, the effective dimension is not known beforehand, and this raises the question of how to pick a sketch size as small as the effective dimension while preserving a quadratic convergence rate. Our second and main contribution is thus to propose an adaptive sketch size algorithm with quadratic convergence rate and which does not require prior knowledge or estimation of the effective dimension: at each iteration, it starts with a small sketch size, and increases it until quadratic progress is achieved. Importantly, we show that the embedding dimension remains proportional to the effective dimension throughout the entire path and that our method achieves state-of-the-art computational complexity for solving convex optimization programs with a strongly convex component. We discuss and illustrate applications to linear and quadratic programming, as well as logistic regression and other generalized linear models.
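To make the "approximate Newton step via a sketched Hessian" idea concrete, here is a minimal sketch for regularized least squares, $f(x) = \frac{1}{2}\|Ax-b\|^2 + \frac{\lambda}{2}\|x\|^2$, where the Hessian $A^\top A + \lambda I$ is replaced by $(SA)^\top(SA) + \lambda I$ for a Gaussian embedding $S$ (our own illustrative code; the paper's adaptive sketch-size rule is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def sketched_newton_step(A, b, lam, x, m):
    """One approximate Newton step for f(x) = 0.5*||Ax-b||^2 + 0.5*lam*||x||^2.

    The exact Hessian A^T A + lam*I is replaced by (SA)^T (SA) + lam*I,
    where S is an m x n Gaussian embedding with m far below n.
    """
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch of the data
    SA = S @ A
    H_sk = SA.T @ SA + lam * np.eye(d)
    grad = A.T @ (A @ x - b) + lam * x             # exact gradient, sketched Hessian
    return x - np.linalg.solve(H_sk, grad)

n, d, lam = 2000, 20, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = np.zeros(d)
for _ in range(15):                                # a few sketched Newton steps
    x = sketched_newton_step(A, b, lam, x, m=200)
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
```

Each step costs a solve in the $d \times d$ sketched system built from an $m \times d$ matrix instead of touching all $n$ rows, which is where the savings come from when $m \ll n$.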

Quasi-Newton algorithms are among the most popular iterative methods for solving unconstrained minimization problems, largely due to their favorable superlinear convergence property. However, existing results for these algorithms are limited as they provide either (i) a global convergence guarantee with an asymptotic superlinear convergence rate, or (ii) a local non-asymptotic superlinear rate for the case that the initial point and the initial Hessian approximation are chosen properly. In particular, no current analysis for quasi-Newton methods guarantees global convergence with an explicit superlinear convergence rate. In this paper, we close this gap and present the first globally convergent quasi-Newton method with an explicit non-asymptotic superlinear convergence rate. Unlike classical quasi-Newton methods, we build our algorithm upon the hybrid proximal extragradient method and propose a novel online learning framework for updating the Hessian approximation matrices. Specifically, guided by the convergence analysis, we formulate the Hessian approximation update as an online convex optimization problem in the space of matrices, and we relate the bounded regret of the online problem to the superlinear convergence of our method.

We propose a continuous-time second-order optimization algorithm for solving unconstrained convex optimization problems with bounded Hessian. We show that this alternative algorithm has a convergence rate comparable to that of the continuous-time Newton-Raphson method; however, structurally, it is amenable to a more efficient distributed implementation. We present a distributed implementation of our proposed optimization algorithm and prove its convergence via Lyapunov analysis. A numerical example demonstrates our results.
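The continuous-time Newton-Raphson baseline mentioned above is the flow $\dot{x} = -\nabla^2 f(x)^{-1}\nabla f(x)$, under which $\nabla f$ decays as $e^{-t}$. A minimal forward-Euler discretization of that flow on a quadratic (our own illustration, not the distributed algorithm of the paper):

```python
import numpy as np

def newton_flow(grad, hess, x0, dt, steps):
    """Forward-Euler discretization of the Newton-Raphson flow
    dx/dt = -hess(x)^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - dt * np.linalg.solve(hess(x), grad(x))
    return x

# Quadratic test: f(x) = 0.5 x^T A x. Here hess^{-1} grad = x exactly,
# so each Euler step contracts the iterate by a factor (1 - dt).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x = newton_flow(lambda v: A @ v, lambda v: A, np.array([4.0, -2.0]), dt=0.1, steps=200)
```

The distributed variant in the paper replaces the centralized Hessian inverse with a structure that agents can evaluate locally, which is the point of the comparison.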