We study the problem of finding an $$\epsilon$$-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is $$\mathcal{O}(\epsilon^{-7/4})$$. In this work, we propose a method with a gradient complexity of $$\mathcal{O}(d^{1/4}\epsilon^{-13/8})$$, where $$d$$ is the problem dimension, leading to an improved complexity when $$d = \mathcal{O}(\epsilon^{-1/2})$$. To achieve this result, we design an optimization algorithm that internally involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point of a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an $$\epsilon$$-FOSP with a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can potentially outperform gradient descent-type methods in nonconvex settings.
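As a rough illustration of how a regret bound can control gradient norms in reductions of this kind (the notation $$g_t$$, $$\Delta_t$$, $$D$$ below is illustrative and not taken from the paper): if the iterates are updated as $$x_{t+1} = x_t + \Delta_t$$, with $$\Delta_t$$ chosen by an online learner facing linear losses $$\ell_t(\Delta) = \langle g_t, \Delta\rangle$$ built from gradients observed along the trajectory, then, up to discretization error,

$$ f(x_{T+1}) - f(x_1) \approx \sum_{t=1}^{T} \langle g_t, \Delta_t\rangle = \mathrm{Regret}_T(u) + \sum_{t=1}^{T} \langle g_t, u\rangle \quad \text{for any comparator } u, $$

and taking $$u = -D \sum_t g_t / \lVert \sum_t g_t \rVert$$ gives

$$ \frac{1}{T}\Big\lVert \sum_{t=1}^{T} g_t \Big\rVert \;\le\; \frac{f(x_1) - \inf f + \mathrm{Regret}_T(u)}{D\,T}, $$

so a learner with small regret forces the averaged gradients to be small.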
Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization
This paper presents a new parallel algorithm for minimizing Lipschitz convex functions using a stochastic subgradient oracle. The proposed method matches the state-of-the-art in terms of total queries and query depth (parallel rounds of queries) from [CJJLLST23], while improving the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous methods, this result closes the gap between the best-known query depth and computational depth in parallel stochastic convex optimization. The approach builds on the ball acceleration framework from [CJJJLST20, ACJJS21], which reduces optimization to minimizing a Gaussian-convolved regularization of the function within Euclidean balls. By developing new stability properties of the Hessian of this induced function, the authors reduce ball-constrained problems to stochastic unconstrained quadratic minimization. Although concentration results for the asymmetric Hessian approximations are lacking, the authors design an efficient parallel method for solving these quadratics. Interestingly, the algorithm can be further enhanced using fast matrix multiplication, yielding nearly-linear work if the matrix multiplication exponent is 2.
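For context, the Gaussian-convolved regularization referenced above is a standard smoothing construction; the identities below are generic (with $$\rho$$ a smoothing radius and $$g(\cdot)$$ a subgradient of $$f$$), not the paper's exact estimator:

$$ f_\rho(x) = \mathbb{E}_{z\sim\mathcal{N}(0,\rho^2 I)}\big[f(x+z)\big], \qquad \nabla f_\rho(x) = \mathbb{E}\big[g(x+z)\big], \qquad \nabla^2 f_\rho(x) = \frac{1}{\rho^2}\,\mathbb{E}\big[g(x+z)\,z^{\top}\big]. $$

The last expression is an average of rank-one, generally asymmetric terms, which is one way asymmetric stochastic Hessian approximations of such an induced function arise.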
- Award ID(s): 2505865
- PAR ID: 10631958
- Publisher / Repository: https://doi.org/10.48550/arXiv.2406.07373
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Meka, Raghu (Ed.) We provide a general method to convert a "primal" black-box algorithm for solving regularized convex-concave minimax optimization problems into an algorithm for solving the associated dual maximin optimization problem. Our method adds recursive regularization over a logarithmic number of rounds, where each round consists of an approximate regularized primal optimization followed by the computation of a dual best response. We apply this result to obtain new state-of-the-art runtimes for solving matrix games in specific parameter regimes, obtain improved query complexity for solving the dual of the CVaR distributionally robust optimization (DRO) problem, and recover the optimal query complexity for finding a stationary point of a convex function.
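  To fix terminology (generic definitions, not the paper's specific recursion): the primal problem is $$\min_x \max_y f(x,y)$$, the associated dual maximin problem is $$\max_y \min_x f(x,y)$$, and a dual best response to a primal point $$\hat{x}$$ is any $$y^\star(\hat{x}) \in \arg\max_y f(\hat{x}, y)$$. Each round of the conversion described above pairs an approximate solve of a regularized primal problem with one such best-response computation.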
- In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $$\mathcal{O}(\epsilon^{-1.5})$$ iterations (each using $$\mathcal{O}(1)$$ samples and only first-order gradient information) to find an $$\epsilon$$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
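  As background on the finite-difference idea (a generic approximation with a hypothetical step size $$\delta$$; the paper's projection-aided variant adds further structure): Hessian-vector and Jacobian-vector products of the lower-level objective $$g$$ can be formed from two extra gradient evaluations,

  $$ \nabla^2_{yy} g(x, y)\, v \;\approx\; \frac{\nabla_y g(x, y + \delta v) - \nabla_y g(x, y)}{\delta}, \qquad \nabla^2_{xy} g(x, y)\, v \;\approx\; \frac{\nabla_x g(x, y + \delta v) - \nabla_x g(x, y)}{\delta}, $$

  so no second-order derivatives ever need to be computed.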
- In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.
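  A minimal numerical sketch of the sketched-Newton idea on a least-squares objective, assuming NumPy and a uniformly sparsified random sign sketch; the function names, sketch size `m`, and per-row sparsity `s` are illustrative choices, not the authors' implementation (in particular, it omits leverage-score-based sparsification and other refinements from the paper):

  ```python
  import numpy as np

  def sparse_sign_sketch(m, n, s, rng):
      """Build an m x n sketch whose rows each have s nonzeros of value
      +/- sqrt(n / (s * m)), so that E[S.T @ S] = I_n."""
      S = np.zeros((m, n))
      val = np.sqrt(n / (s * m))
      for i in range(m):
          cols = rng.choice(n, size=s, replace=False)   # positions of the nonzeros
          S[i, cols] = rng.choice([-val, val], size=s)  # random signs, fixed magnitude
      return S

  def sketched_newton_step(A, b, x, m, s, rng, step=1.0):
      """One approximate Newton step for f(x) = 0.5 * ||A @ x - b||^2,
      whose exact Hessian A.T @ A is replaced by a sketched estimate."""
      N, d = A.shape
      SA = sparse_sign_sketch(m, N, s, rng) @ A          # sketch the tall factor A (needs m >= d)
      H_tilde = SA.T @ SA                                # sketched Hessian estimate of A.T @ A
      grad = A.T @ (A @ x - b)                           # exact least-squares gradient
      return x - step * np.linalg.solve(H_tilde, grad)   # approximate Newton update

  # Example usage on a synthetic least-squares problem.
  rng = np.random.default_rng(0)
  A = rng.standard_normal((2000, 50))
  b = rng.standard_normal(2000)
  x = np.zeros(50)
  for _ in range(10):
      x = sketched_newton_step(A, b, x, m=200, s=8, rng=rng)
  ```

  The point of the sparsification is that forming the sketch costs roughly `m * s` nonzero multiplications instead of `m * N` for a dense Gaussian sketch, while the resulting Newton steps behave similarly.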
- We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method. We show that our algorithm is guaranteed to converge to an $$(\epsilon,\sqrt{\epsilon})$$-approximate local minimum within $$\tilde{O}(n^{4/5}/\epsilon^{3/2})$$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms including subsampled cubic regularization. Our work also sheds light on the application of variance reduction techniques to high-order non-convex optimization methods. Thorough experiments on various non-convex optimization problems support our theory.
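  For orientation, a cubic-regularized Newton-type step solves a model of the form below, and an SVRG-style semi-stochastic gradient anchors a mini-batch estimate at a reference point $$\tilde{x}$$; this is a generic sketch, and the exact correction terms used by SVRC may differ:

  $$ h_t = \arg\min_{h} \; \langle v_t, h \rangle + \tfrac{1}{2} h^{\top} U_t h + \tfrac{M}{6} \lVert h \rVert^3, \qquad x_{t+1} = x_t + h_t, \qquad v_t = \nabla f_{\mathcal{I}_t}(x_t) - \nabla f_{\mathcal{I}_t}(\tilde{x}) + \nabla f(\tilde{x}), $$

  where $$f_{\mathcal{I}_t}$$ averages component functions over a sampled mini-batch $$\mathcal{I}_t$$, $$U_t$$ is an analogous semi-stochastic Hessian estimate, and $$M$$ is the cubic-regularization parameter.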