NSF PAR Search | NSF Public Access Repository

Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

https://doi.org/10.1137/24M1630475

Jiang, Ruichen; Mokhtari, Aryan (September 2025, SIAM Journal on Optimization)

Free, publicly-accessible full text available September 30, 2026
Non-asymptotic global convergence rates of BFGS with exact line search

https://doi.org/10.1007/s10107-025-02256-7

Jin, Qiujiang; Jiang, Ruichen; Mokhtari, Aryan (August 2025, Mathematical Programming)

In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon’s equivalence result, our findings are also applicable to other quasi-Newton methods in the convex Broyden class employing exact line search, such as the Davidon-Fletcher-Powell (DFP) method. Specifically, we focus on problems where the objective function is strongly convex with Lipschitz continuous gradient and Hessian. Our results hold for any initial point and any symmetric positive definite initial Hessian approximation matrix. The analysis unveils a detailed three-phase convergence process, characterized by distinct linear and superlinear rates, contingent on the iteration progress. Additionally, our theoretical findings demonstrate the trade-offs between linear and superlinear convergence rates for BFGS when we modify the initial Hessian approximation matrix, a phenomenon further corroborated by our numerical experiments.
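As an illustration of the setting described in this abstract, the following is a minimal sketch of BFGS with exact line search, restricted to a strongly convex quadratic f(x) = ½ xᵀAx − bᵀx so that the exact step size has a closed form. It is not the authors' code or experimental setup; the function name `bfgs_exact_line_search` and its parameters are assumptions made for this example.

```python
import numpy as np

def bfgs_exact_line_search(A, b, x0, H0=None, tol=1e-8, max_iter=100):
    """Minimize 0.5*x^T A x - b^T x (A symmetric positive definite).

    H is the inverse-Hessian approximation; H0 may be any symmetric
    positive definite matrix (identity by default). Hypothetical helper
    for illustration only.
    """
    n = x0.size
    x = x0.astype(float)
    H = np.eye(n) if H0 is None else H0.astype(float)
    g = A @ x - b                        # gradient of the quadratic
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        d = -H @ g                       # quasi-Newton direction
        # Exact line search: minimizing f(x + alpha*d) over alpha has a
        # closed form for a quadratic objective.
        alpha = -(g @ d) / (d @ A @ d)
        s = alpha * d
        x = x + s
        g_new = A @ x - b
        y = g_new - g                    # gradient difference (equals A @ s)
        rho = 1.0 / (y @ s)
        V = np.eye(n) - rho * np.outer(s, y)
        # Standard BFGS update of the inverse-Hessian approximation.
        H = V @ H @ V.T + rho * np.outer(s, s)
        g = g_new
    return x, max_iter
```

Passing a different symmetric positive definite `H0` changes the early iterates, which is the knob the abstract refers to when it discusses modifying the initial Hessian approximation.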
Improved Complexity for Smooth Nonconvex Optimization: A Two-Level Online Learning Approach with Quasi-Newton Methods

https://doi.org/10.1145/3717823.3718308

Jiang, Ruichen; Mokhtari, Aryan; Patitucci, Francisco (June 2025, ACM)

We study the problem of finding an $$\varepsilon$$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is $$\mathcal{O}(\varepsilon^{-7/4})$$. In this work, we propose a method with a gradient complexity of $$\mathcal{O}(d^{1/4}\varepsilon^{-13/8})$$, where $$d$$ is the problem dimension, leading to an improved complexity when $$d = \mathcal{O}(\varepsilon^{-1/2})$$. To achieve this result, we design an optimization algorithm that, underneath, involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point for a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an $$\varepsilon$$-FOSP using a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can potentially outperform gradient descent-type methods in nonconvex settings.
Free, publicly-accessible full text available June 15, 2026
Stochastic Newton Proximal Extragradient Method

Jiang, Ruichen; Dereziński, Michał; Mokhtari, Aryan (December 2024, Advances in Neural Information Processing Systems (NeurIPS 2024))

Stochastic second-order methods are known to achieve fast local convergence in strongly convex optimization by relying on noisy Hessian estimates to precondition the gradient. Yet, most of these methods achieve superlinear convergence only when the stochastic Hessian noise diminishes, requiring an increase in the per-iteration cost as time progresses. Recent work [arXiv:2204.09266] addressed this issue via a Hessian averaging scheme that achieves a superlinear convergence rate without increasing the per-iteration cost. However, the considered method exhibits a slow global convergence rate, requiring up to $$\tilde{\mathcal{O}}(\kappa^2)$$ iterations to reach the superlinear rate of $$\tilde{\mathcal{O}}((1/t)^{t/2})$$, where $$\kappa$$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that significantly improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $$\tilde{\mathcal{O}}(\kappa)$$ iterations. We achieve this by developing a novel extension of the Hybrid Proximal Extragradient (HPE) framework, which simultaneously achieves fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.
Full Text Available
Non-asymptotic global convergence analysis of BFGS with the Armijo-Wolfe line search

Jin, Qiujiang; Jiang, Ruichen; Mokhtari, Aryan (December 2024, NeurIPS)

Full Text Available
Stochastic Newton Proximal Extragradient Method

Jiang, Ruichen; Dereziński, Michał; Mokhtari, Aryan (November 2024, https://doi.org/10.48550/arXiv.2406.01478)

Stochastic second-order methods accelerate local convergence in strongly convex optimization by using noisy Hessian estimates to precondition gradients. However, they typically achieve superlinear convergence only when Hessian noise diminishes, which increases per-iteration costs. Prior work [arXiv:2204.09266] introduced a Hessian averaging scheme that maintains low per-iteration cost while achieving superlinear convergence, but with slow global convergence, requiring $$\tilde{\mathcal{O}}(\kappa^2)$$ iterations to reach the superlinear rate of $$\tilde{\mathcal{O}}((1/t)^{t/2})$$, where $$\kappa$$ is the condition number. This paper proposes a stochastic Newton proximal extragradient method that improves these bounds, delivering faster global linear convergence and achieving the same fast superlinear rate in only $$\tilde{\mathcal{O}}(\kappa)$$ iterations. The method extends the Hybrid Proximal Extragradient (HPE) framework, yielding improved global and local convergence guarantees for strongly convex functions with access to a noisy Hessian oracle.
Full Text Available
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

Jiang, Ruichen; Kavis, Ali; Jin, Qiujiang; Sanghavi, Sujay; Mokhtari, Aryan (December 2024, NeurIPS proceedings)

Full Text Available
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

Jiang, Ruichen; Raman, Parameswaran; Sabach, Shoham; Mokhtari, Aryan; Hong, Mingyi; Cevher, Volkan (May 2024, PMLR)

Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $$d$$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $$\mathcal{O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$$ for solving convex optimization problems. Here, $$m$$ represents the subspace dimension, which can be significantly smaller than $$d$$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
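To make the subspace idea in this abstract concrete, here is a minimal sketch of one ingredient only: building an orthonormal basis of the Krylov subspace span{g, Hg, ..., H^(m−1)g} and projecting the gradient and Hessian onto it. It assumes the standard definition of a Krylov subspace and a dense Hessian for simplicity; it does not solve the cubic regularized subproblem and is not the authors' implementation. The names `krylov_basis` and `reduced_model` are hypothetical.

```python
import numpy as np

def krylov_basis(H, g, m, eps=1e-12):
    """Orthonormal basis (columns) of span{g, Hg, ..., H^(m-1) g}.

    Uses Gram-Schmidt orthogonalization against all previous vectors.
    Returns fewer than m columns if the subspace stops growing.
    """
    n = g.size
    basis = []
    v = g.astype(float)
    for _ in range(min(m, n)):
        for u in basis:                  # remove components along earlier vectors
            v = v - (u @ v) * u
        norm = np.linalg.norm(v)
        if norm <= eps:                  # breakdown: no new direction to add
            break
        v = v / norm
        basis.append(v)
        v = H @ v                        # next Krylov direction
    if not basis:
        return np.zeros((n, 0))
    return np.column_stack(basis)

def reduced_model(H, g, V):
    """Project gradient and Hessian onto the subspace spanned by V."""
    return V.T @ g, V.T @ H @ V
```

A subspace method would then compute its update from the m-dimensional pair returned by `reduced_model` and map the step back with `V @ step`. In practice the products `H @ v` would typically be computed as Hessian-vector products rather than by forming `H` explicitly.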
Full Text Available
Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization

Jiang, Ruichen; Mokhtari, Aryan (December 2023, NeurIPS proceedings)

In this paper, we present an accelerated quasi-Newton proximal extragradient method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective function, we prove that our method can achieve a convergence rate of $$\mathcal{O}\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$$, where $$d$$ is the problem dimension and $$k$$ is the number of iterations. In particular, in the regime where $$k = \mathcal{O}(d)$$, our method matches the optimal rate of $$\mathcal{O}(\frac{1}{k^2})$$ by Nesterov's accelerated gradient (NAG). Moreover, in the regime where $$k = \Omega(d \log d)$$, it outperforms NAG and converges at a faster rate of $$\mathcal{O}\bigl(\frac{\sqrt{d\log k}}{k^{2.5}}\bigr)$$. To the best of our knowledge, this result is the first to demonstrate a provable gain for a quasi-Newton-type method over NAG in the convex setting. To achieve such results, we build our method on a recent variant of the Monteiro-Svaiter acceleration framework and adopt an online learning perspective to update the Hessian approximation matrices, in which we relate the convergence rate of our method to the dynamic regret of a specific online convex optimization problem in the space of matrices.
Full Text Available
Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

Cao, Jincheng; Jiang, Ruichen; Abolfazli, Nazanin; Yazdandoost Hamedani, Erfan; Mokhtari, Aryan (December 2023, Advances in Neural Information Processing Systems)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
Full Text Available
