skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 10:00 PM to 12:00 PM ET on Tuesday, March 25 due to maintenance. We apologize for the inconvenience.


Title: Penalty Method for Inversion-Free Deep Bilevel Optimization
Solving a bilevel optimization problem is at the core of several machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Different from simultaneous or multi-objective optimization, the steepest descent direction for minimizing the upper-level cost in a bilevel problem requires the inverse of the Hessian of the lower-level cost. In this work, we propose a novel algorithm for solving bilevel optimization problems based on the classical penalty function approach. Our method avoids computing the Hessian inverse and can handle constrained bilevel problems easily. We prove the convergence of the method under mild conditions and show that the exact hypergradient is obtained asymptotically. Our method's simplicity and small space and time complexities enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large-scale setting. Our results show that our approach outperforms or is comparable to previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run-time, and convergence speed.  more » « less
Award ID(s):
1946231
PAR ID:
10319519
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of Machine Learning Research 157, 2021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta- learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the low-level optimization process with implicit dif- ferentiation, which requires expensive calculations related to the Hessian matrix. There has been a recent quest for first-order methods for BO, but the methods pro- posed to date tend to be complicated and impractical for large-scale deep learning applications. In this work, we propose a simple first-order BO algorithm that de- pends only on first-order gradient information, requires no implicit differentiation, and is practical and efficient for large-scale non-convex functions in deep learning. We provide a non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance. 
    more » « less
  2. Abstract In this paper, we introduce a bilevel optimization framework for addressing inverse mean-field games, alongside an exploration of numerical methods tailored for this bilevel problem. The primary benefit of our bilevel formulation lies in maintaining the convexity of the objective function and the linearity of constraints in the forward problem. Our paper focuses on inverse mean-field games characterized by unknown obstacles and metrics. We show numerical stability for these two types of inverse problems. More importantly, we, for the first time, establish the identifiability of the inverse mean-field game with unknown obstacles via the solution of the resultant bilevel problem. The bilevel approach enables us to employ an alternating gradient-based optimization algorithm with a provable convergence guarantee. To validate the effectiveness of our methods in solving the inverse problems, we have designed comprehensive numerical experiments, providing empirical evidence of its efficacy. 
    more » « less
  3. In recent years, there is a growing need to train machine learning models on a huge volume of data. Therefore, designing efficient distributed optimization algorithms for empirical risk minimization (ERM) has become an active and challenging research topic. In this paper, we propose a flexible framework for distributed ERM training through solving the dual problem, which provides a unified description and comparison of existing methods. Our approach requires only approximate solutions of the sub-problems involved in the optimization process, and is versatile to be applied on many large-scale machine learning problems including classification, regression, and structured prediction. We show that our framework enjoys global linear convergence for a broad class of non-strongly-convex problems, and some specific choices of the sub-problems can even achieve much faster convergence than existing approaches by a refined analysis. This improved convergence rate is also reflected in the superior empirical performance of our method. 
    more » « less
  4. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level objective, or the convergence rates are slow and sub-optimal. To address this issue, in this paper, we introduce a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane and then runs a conditional gradient update to decrease the upper-level objective. When the upper-level objective is convex, we show that our method requires $${O}(\max\{1/\epsilon_f,1/\epsilon_g\})$$ iterations to find a solution that is $$\epsilon_f$$-optimal for the upper-level objective and $$\epsilon_g$$-optimal for the lower-level objective. Moreover, when the upper-level objective is non-convex, our method requires $${O}(\max\{1/\epsilon_f^2,1/(\epsilon_f\epsilon_g)\})$$ iterations to find an $$(\epsilon_f,\epsilon_g)$$-optimal solution. We also prove stronger convergence guarantees under the Holderian error bound assumption on the lower-level problem. To the best of our knowledge, our method achieves the best-known iteration complexity for the considered class of bilevel problems. 
    more » « less
  5. Federated bilevel learning has received increasing attention in various emerging machine learning and communication applications. Recently, several Hessian-vector-based algorithms have been proposed to solve the federated bilevel optimization problem. However, several important properties in federated learning such as the partial client participation and the linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d.~datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. We show that FedMBO achieves a convergence rate of $$\mathcal{O}\big(\frac{1}{\sqrt{nK}}+\frac{1}{K}+\frac{\sqrt{n}}{K^{3/2}}\big)$$ on non-i.i.d.~datasets, where $$n$$ is the number of participating clients in each round, and $$K$$ is the total number of iteration. This is the first theoretical linear speedup result for non-i.i.d.~federated bilevel optimization. Extensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method. 
    more » « less