Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $$\Omega(\kappa^{3/2}\epsilon^{-2})$$ oracle calls to find an $$\epsilon$$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $$\Omega(\kappa^{5/2}\epsilon^{-4})$$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.
more »
« less
Achieving O(\epsilon^{-1.5}) Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization
In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $$\mathcal{O}(\epsilon^{-1.5})$$ iterations (each using $$\mathcal{O}(1)$$ samples and only first-order gradient information) to find an $$\epsilon$$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
more »
« less
- PAR ID:
- 10505562
- Publisher / Repository:
- Advances in Neural Information Processing Systems
- Date Published:
- Format(s):
- Medium: X
- Location:
- Advances in Neural Information Processing Systems
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This work proposes a new algorithm β the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) βfor tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $${O}(\epsilon^{-3/2})$$ iterations (each using $O(1)$ samples) to find an $$\epsilon$$-stationary solution. The $$\epsilon$$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $$\epsilon$$. The total number of stochastic gradient samples required for the upper and lower level objective functions match the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.more » « less
-
Bilevel optimization is one of the fundamental problems in machine learning and optimization. Recent theoretical developments in bilevel optimization focus on finding the first-order stationary points for nonconvex-strongly-convex cases. In this paper, we analyze algorithms that can escape saddle points in nonconvex-strongly-convex bilevel optimization. Specifically, we show that the perturbed approximate implicit differentiation (AID) with a warm start strategy finds an Ο΅-approximate local minimum of bilevel optimization in $$\tilde O(\epsilon^{-2})$$ iterations with high probability. Moreover, we propose an inexact NEgative-curvature-Originated-from-Noise Algorithm (iNEON), an algorithm that can escape saddle point and find local minimum of stochastic bilevel optimization. As a by-product, we provide the first nonasymptotic analysis of perturbed multi-step gradient descent ascent (GDmax) algorithm that converges to local minimax point for minimax problems.more » « less
-
We study the problem of finding an π-first-order stationary point (FOSP) of a smooth function, given access only to gradient information. The best-known gradient query complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is O(πβ7/4). In this work, we propose a method with a gradient complexity of O(π1/4πβ13/8), where π is the problem dimension, leading to an improved complexity when π = O(πβ1/2). To achieve this result, we design an optimization algorithm that, underneath, involves solving two online learning problems. Specifically, we first reformulate the task of finding a stationary point for a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the complexity bound for achieving an π-FOSP using a gradient oracle, our result provides the first guarantee suggesting that quasi-Newton methods can potentially outperform gradient descent-type methods in nonconvex settings.more » « less
-
In this paper, we propose a sequential minimax optimization (SMO) method for solving a class of constrained bilevel optimization problems in which the lower level part is a possibly nonsmooth convex optimization problem, whereas the upper level part is a possibly nonconvex optimization problem. Specifically, SMO applies a first order method to solve a sequence of minimax subproblems, which are obtained by employing a hybrid of modified augmented Lagrangian and penalty schemes on the bilevel optimization problems. Under suitable assumptions, we establish an operation complexity of [Formula: see text] and [Formula: see text], measured in terms of fundamental operations, for SMO in finding an [Formula: see text]-KarushβKuhnβTucker solution of the bilevel optimization problems with merely convex and strongly convex lower level objective functions, respectively. The latter result improves the previous best known operation complexity by a factor of [Formula: see text]. Preliminary numerical results demonstrate significantly superior computational performance compared with the recently developed first order penalty method. Funding: This work was partially supported by the Air Force Office of Scientific Research Award [FA9550-24-1-0343], the Office of Naval Research Award [N00014-24-1-2702], and the National Science Foundation Awards [2211491 and 2435911]. It was primarily conducted during Sanyou Mei's PhD studies at the University of Minnesota.more » « less
An official website of the United States government

