


This content will become publicly available on February 13, 2025

Title: Non-Convex Bilevel Optimization with Time-Varying Objective Functions
Bilevel optimization has become a powerful tool in a wide variety of machine learning problems. However, current nonconvex bilevel optimization methods consider an offline dataset and static functions, which may not work well in emerging online applications with streaming data and time-varying functions. In this work, we study online bilevel optimization (OBO), where the functions can be time-varying and the agent continuously updates its decisions with online streaming data. To deal with the function variations and the unavailability of the true hypergradients in OBO, we propose a single-loop online bilevel optimizer with window averaging (SOBOW), which updates the outer-level decision based on a window average of the most recent hypergradient estimates stored in memory. Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions. To handle the unique technical difficulties rooted in the single-loop update and function variations of OBO, we develop a novel analytical technique that disentangles the complex couplings between decision variables and carefully controls the hypergradient estimation error. We show that SOBOW achieves sublinear bilevel local regret under mild conditions. Extensive experiments across multiple domains corroborate the effectiveness of SOBOW.
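The abstract does not include pseudocode; the following Python fragment is only a minimal sketch of the window-averaged, single-loop update it describes. The step sizes, window length, and the oracle functions `grad_g_y` and `hypergrad` are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def sobow_step(x, y, window, grad_g_y, hypergrad, W=5, alpha=0.1, beta=0.1):
    """One illustrative SOBOW-style online step (a sketch, not the paper's code).

    x        : outer-level decision variable
    y        : inner-level variable
    window   : list of the most recent hypergradient estimates (the memory)
    grad_g_y : oracle for the gradient of the current lower-level loss w.r.t. y
    hypergrad: oracle for a hypergradient estimate of the current outer loss
    """
    # Single-loop inner update: one gradient step on the current lower-level loss
    y = y - beta * grad_g_y(x, y)

    # Store the newest hypergradient estimate; keep only the last W of them
    window.append(hypergrad(x, y))
    if len(window) > W:
        window.pop(0)

    # Outer update along the window average of the stored estimates
    x = x - alpha * np.mean(window, axis=0)
    return x, y, window
```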
Award ID(s):
2311274
PAR ID:
10505561
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Conference on Neural Information Processing Systems
Date Published:
Format(s):
Medium: X
Location:
Neural Information Processing Systems
Sponsoring Org:
National Science Foundation
More Like This
  1. The use of command governors (CGs) to enforce pointwise-in-time state and control constraints by minimally altering reference commands to closed-loop systems has been proposed for a range of aerospace applications. In this paper, we revisit the design of the CG and describe approaches to its implementation based directly on the bilevel (inner loop + outer loop) optimization problem in the CG formulation. Such approaches do not require offline construction and storage of constraint-admissible sets nor the use of online trajectory prediction, and hence can be beneficial when the reference command is high-dimensional and the constraints are nonlinear, change with time, or are reconfigured online. In the case of linear systems with linear constraints, or of nonlinear systems with linear constraints for which a quadratic Lyapunov function is available, the inner-loop optimization problem is explicitly solvable and the bilevel optimization reduces to a single-stage optimization. In other cases, a reformulation of the bilevel problem into a mathematical program with equilibrium constraints (MPEC) can be used to arrive at a single-stage optimization problem. By applying the bilevel-optimization-based CG to the classical low-thrust orbital transfer problem, in which the dynamics are represented by the Gauss variational equations (GVEs) and the nominal controller is of Lyapunov type, we demonstrate that constraints such as on the radius of periapsis (to avoid planetary collision), on the osculating orbit eccentricity, and on the thrust magnitude can be handled. Furthermore, in this case the parameters of the Lyapunov function can be simultaneously optimized online, resulting in faster maneuvers.
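As a rough illustration of the single-stage reduction described above, the sketch below selects the admissible command closest to the desired reference, assuming the inner-loop maximization has already been solved explicitly into a constraint function g(v) <= 0. The interface and solver choice are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def command_governor(r, g, v0):
    """Single-stage CG sketch: find the command v closest to the reference r
    subject to g(v) <= 0, where g encodes the explicitly solved inner problem."""
    res = minimize(lambda v: float(np.sum((v - r) ** 2)), v0,
                   constraints=[{"type": "ineq", "fun": lambda v: -g(v)}])
    return res.x

# Toy example: scalar reference with the box constraint |v| <= 1
v_star = command_governor(r=np.array([2.0]),
                          g=lambda v: np.abs(v) - 1.0,
                          v0=np.array([0.0]))
```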
  2. This work proposes a new algorithm – the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) – for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower-level subproblem is strongly convex and the upper-level objective function is smooth. Unlike prior works that rely on two-timescale or double-loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper- and lower-level updates, which allows us to control the error in the stochastic gradient updates due to inaccurate solutions of both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires ${O}(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) to find an $\epsilon$-stationary solution, defined as a point at which the squared norm of the gradient of the outer function is at most $\epsilon$. The total number of stochastic gradient samples required for the upper- and lower-level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper-level objective function is strongly convex.
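A minimal sketch of the single-timescale, momentum-assisted updates described above follows. The recursive estimator form, step sizes, and oracle names are assumptions; in particular, `grad_f_x` stands for a stochastic hypergradient estimate of the upper objective and `grad_g_y` for a stochastic gradient of the lower objective.

```python
def sustain_style_step(x, y, x_prev, y_prev, d_x, d_y,
                       grad_f_x, grad_g_y, alpha=0.01, beta=0.01, eta=0.5):
    """One illustrative single-timescale step with momentum-assisted
    (recursive, variance-reduced) gradient estimators for both levels.
    For variance reduction, the same minibatch should be used when the
    oracles are evaluated at the current and the previous iterates."""
    # Momentum-assisted estimators for the upper- and lower-level directions
    d_x = grad_f_x(x, y) + (1 - eta) * (d_x - grad_f_x(x_prev, y_prev))
    d_y = grad_g_y(x, y) + (1 - eta) * (d_y - grad_g_y(x_prev, y_prev))

    # Simultaneous (single-timescale) updates of both variables
    x_new, y_new = x - alpha * d_x, y - beta * d_y
    return x_new, y_new, x, y, d_x, d_y
```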
  3. Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems – including adversarial robustness, hyperparameter tuning, and actor-critic methods – fall under nested bilevel programming, which subsumes minimax and compositional optimization. In this work, we propose FEDNEST: a federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FEDNEST in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FEDNEST introduces multiple innovations, including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter and hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice.
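The fragment below is a rough sketch of one federated alternating round in the spirit described above: clients run local lower-level steps, and the server averages client hypergradient estimates before updating the upper-level variable. The function names, plain averaging, and step sizes are illustrative assumptions rather than the FEDNEST specification.

```python
import numpy as np

def federated_alternating_round(x, y_clients, client_grad_y, client_hypergrad,
                                inner_steps=5, alpha=0.1, beta=0.1):
    """One illustrative federated alternating round (a sketch, not FEDNEST itself)."""
    # Inner phase: each client refines its local lower-level variable
    for i, y in enumerate(y_clients):
        for _ in range(inner_steps):
            y = y - beta * client_grad_y(i, x, y)
        y_clients[i] = y

    # Outer phase: the server averages the clients' hypergradient estimates
    hg = np.mean([client_hypergrad(i, x, y_clients[i])
                  for i in range(len(y_clients))], axis=0)
    x = x - alpha * hg
    return x, y_clients
```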
  4. In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $\mathcal{O}(\epsilon^{-1.5})$ iterations (each using $\mathcal{O}(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization. 
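As a concrete illustration of the first-order-only, finite-difference Hessian/Jacobian-vector approximation mentioned above, the helper below approximates a Hessian-vector product of the lower-level objective by central differences of its gradient; normalizing the direction vector (and rescaling afterwards) stands in for the projection aid and is an assumption.

```python
import numpy as np

def fd_hessian_vector_product(grad_g_y, x, y, v, delta=1e-4):
    """Approximate H_yy g(x, y) @ v using only gradient evaluations:
    H v ~ [grad_g_y(x, y + delta*u) - grad_g_y(x, y - delta*u)] * ||v|| / (2*delta),
    where u = v / ||v|| keeps the finite-difference perturbation well scaled."""
    norm = np.linalg.norm(v) + 1e-12
    u = v / norm
    hvp_unit = (grad_g_y(x, y + delta * u) - grad_g_y(x, y - delta * u)) / (2.0 * delta)
    return norm * hvp_unit  # rescale back so the result approximates H @ v
```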
  5. This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost functions. We develop an approach that tracks the optimal solution with a bounded error. Unlike prior work, our algorithm is executed using only the first-order derivatives of the cost function, which makes it computationally efficient for optimization with time-varying cost functions. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying costs. Several examples, including a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function, demonstrate our results.
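To make the tracking idea concrete, the toy loop below follows the minimizer of a streaming cost f(x, t) using only first-order information: at each sampling instant it takes a few gradient steps on the current cost before the cost changes. This is a generic first-order tracker used for illustration, not the specific algorithm proposed in the paper.

```python
import numpy as np

def track_time_varying_optimum(grad_f, x0, t_grid, steps_per_t=3, alpha=0.2):
    """Track the minimizer of a streaming convex cost f(x, t) with a bounded
    error, using only first-order information (an illustrative baseline)."""
    x = np.asarray(x0, dtype=float)
    trajectory = []
    for t in t_grid:
        for _ in range(steps_per_t):      # a few corrections per time instant
            x = x - alpha * grad_f(x, t)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Toy example: f(x, t) = ||x - c(t)||^2 with a drifting target c(t) = [sin t, cos t]
traj = track_time_varying_optimum(
    grad_f=lambda x, t: 2.0 * (x - np.array([np.sin(t), np.cos(t)])),
    x0=[0.0, 0.0], t_grid=np.linspace(0.0, 6.28, 100))
```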