We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to the changes in the upper-level variables $$x$$. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $$O(\epsilon)$$-estimate of the lower-level solution, in addition to first-order gradient estimators {\it locally unbiased} within the $$\Theta(\epsilon)$$-ball around $y^*(x)$. We study the complexity of finding stationary points with such an $y^*$-aware oracle: we propose a simple first-order method that converges to an $$\epsilon$$ stationary point using $$O(\epsilon^{-6}), O(\epsilon^{-4})$$ access to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $$O(\epsilon)$$ with minimal assumptions. We then provide the matching $$\Omega(\epsilon^{-6})$$, $$\Omega(\epsilon^{-4})$$ lower bounds without and with an additional smoothness assumption on $y^*$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with an $y^*$-aware oracle must suffer the same lower bounds.
more »
« less
On The Complexity of First-Order Methods in Stochastic Bilevel Optimization
We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to the changes in the upper-level variables $$x$$. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $$O(\epsilon)$$-estimate of the lower-level solution, in addition to first-order gradient estimators locally unbiased within the $$\Theta(\epsilon)$$-ball around $y^*(x)$. We study the complexity of finding stationary points with such an $y^*$-aware oracle: we propose a simple first-order method that converges to an $$\epsilon$$ stationary point using $$O(\epsilon^{-6}), O(\epsilon^{-4})$$ access to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $$O(\epsilon)$$ with minimal assumptions. We then provide the matching $$\Omega(\epsilon^{-6})$$, $$\Omega(\epsilon^{-4})$$ lower bounds without and with an additional smoothness assumption, respectively. Our results imply that any approach that simulates an algorithm with an $y^*$-aware oracle must suffer the same lower bounds.
more »
« less
- Award ID(s):
- 2023239
- PAR ID:
- 10530331
- Publisher / Repository:
- Proceedings of Machine Learning Research
- Date Published:
- Volume:
- 235
- Page Range / eLocation ID:
- 25784--25811
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We give nearly matching upper and lower bounds on the oracle complexity of finding ϵ-stationary points (∥∇F(x)∥≤ϵ in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimization complexity and sample complexity, and reveals some surprising differences between the complexity of stochastic optimization versus learning. Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model. In other words, the separation in complexity between the two models can be exponential, and the folklore understanding that smoothness is required to find stationary points is only weakly true for statistical learning. Our upper bounds are based on extensions of a recent “recursive regularization” technique proposed by Allen-Zhu (2018). We show how to extend the technique to achieve near-optimal rates, and in particular show how to leverage the extra information available in the global oracle model. Our algorithm for the global model can be implemented efficiently through finite sum methods, and suggests an interesting new computational-statistical tradeoffmore » « less
-
In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $$\mathcal{O}(\epsilon^{-1.5})$$ iterations (each using $$\mathcal{O}(1)$$ samples and only first-order gradient information) to find an $$\epsilon$$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $$\mathcal{O}(\epsilon^{-1.5})$$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.more » « less
-
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter . In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be -close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as -approximation of the original BO, we propose first-order algorithms that find an -stationary solution by optimizing the penalty formulation with . When the perturbed lower-level problem uniformly satisfies the {\it small-error} proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an -stationary point of the penalty function using in total accesses to first-order stochastic gradient oracles. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, {\it i.e.,} with samples per iteration, and achieves the improved oracle-complexity of .more » « less
-
Aggarwal, Divesh (Ed.)We study the problem of function inversion with preprocessing where, given a function f : [N] → [N] and a point y in its image, the goal is to find an x such that f(x) = y using at most T oracle queries to f and S bits of preprocessed advice that depend on f. The seminal work of Corrigan-Gibbs and Kogan [TCC 2019] initiated a line of research that shows many exciting connections between the non-adaptive setting of this problem and other areas of theoretical computer science. Specifically, they introduced a very weak class of algorithms (strongly non-adaptive) where the points queried by the oracle depend only on the inversion point y, and are independent of the answers to the previous queries and the S bits of advice. They showed that proving even mild lower bounds on strongly non-adaptive algorithms for function inversion would imply a breakthrough result in circuit complexity. We prove that every strongly non-adaptive algorithm for function inversion (and even for its special case of permutation inversion) must have ST = Ω(N log (N) log (T)). This gives the first improvement to the long-standing lower bound of ST = Ω(N log N) due to Yao [STOC 90]. As a corollary, we conclude the first separation between strongly non-adaptive and adaptive algorithms for permutation inversion, where the adaptive algorithm by Hellman [TOIT 80] achieves the trade-off ST = O(N log N). Additionally, we show equivalence between lower bounds for strongly non-adaptive data structures and the one-way communication complexity of certain partial functions. As an example, we recover our lower bound on function inversion in the communication complexity framework.more » « less
An official website of the United States government

