NSF-PAR ID: 10339524
Date Published:
Journal Name: Management Science
Volume: 68
Issue: 1
ISSN: 0025-1909
Page Range / eLocation ID: 9 to 26
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this environment considers the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters. This loss function was recently introduced by Elmachtoub and Grigas (2017), who called it the Smart Predict-then-Optimize (SPO) loss. Since the SPO loss is nonconvex and discontinuous, standard results for deriving generalization bounds do not apply. In this work, we provide an assortment of generalization bounds for the SPO loss function. In particular, we derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points but, in the case of a general convex set, have poor dependence on the dimension. By exploiting the structure of the SPO loss function and an additional strong convexity assumption on the feasible region, we can dramatically improve the dependence on the dimension via an analysis, and corresponding bounds, that are akin to margin guarantees in classification problems.
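To make the loss concrete, here is a minimal sketch of evaluating the SPO loss for a linear objective over a polytope given by its extreme points. The region, cost vectors, and the helper name `spo_loss` are illustrative choices, not taken from the paper; the sketch only shows the definition stated above (true cost of the decision induced by the predicted costs, minus the true optimal cost).

```python
import numpy as np

def spo_loss(c_true, c_pred, extreme_points):
    """SPO loss for min c^T w over a polytope listed by its extreme points:
    the excess true cost incurred by acting on the predicted cost vector."""
    pts = np.asarray(extreme_points, dtype=float)
    # A linear objective over a polytope attains its minimum at an extreme
    # point, so the induced decisions can be found by enumeration here.
    w_pred = pts[np.argmin(pts @ c_pred)]  # decision induced by prediction
    w_star = pts[np.argmin(pts @ c_true)]  # true optimal decision
    return float(c_true @ w_pred - c_true @ w_star)

# Toy example: unit square; true costs favor (0, 1), prediction favors (1, 0).
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
c_true = np.array([1.0, -2.0])
c_pred = np.array([-2.0, 1.0])
print(spo_loss(c_true, c_pred, square))  # → 3.0, the regret of the bad prediction
```

Note that a perfect prediction gives zero loss, and the loss jumps discontinuously as the predicted costs cross a decision boundary, which is exactly why standard generalization arguments do not apply directly.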

Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Eds.) In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level objective, or the convergence rates are slow and suboptimal. To address this issue, we introduce a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane and then runs a conditional gradient update to decrease the upper-level objective. When the upper-level objective is convex, we show that our method requires ${O}(\max\{1/\epsilon_f, 1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-optimal for the upper-level objective and $\epsilon_g$-optimal for the lower-level objective. Moreover, when the upper-level objective is nonconvex, our method requires ${O}(\max\{1/\epsilon_f^2, 1/(\epsilon_f\epsilon_g)\})$ iterations to find an $(\epsilon_f, \epsilon_g)$-optimal solution. We also prove stronger convergence guarantees under the Hölderian error bound assumption on the lower-level problem. To the best of our knowledge, our method achieves the best-known iteration complexity for the considered class of bilevel problems.
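The cutting-plane idea can be illustrated on a toy lower-level problem. The sketch below is not the paper's algorithm; it only demonstrates the containment property that makes the approach sound: by convexity of the lower-level objective $g$, the linearization at any iterate gives a halfspace that contains the whole lower-level solution set, so a conditional gradient step on the upper-level objective restricted to that halfspace never discards lower-level optimal points. The specific $g$, iterate, and names are assumptions for illustration.

```python
import numpy as np

# Toy lower-level objective: a convex quadratic whose solution set is the
# entire line x0 + x1 = 1 (optimal value 0), i.e., not a single point.
g = lambda s: 0.5 * (s[0] + s[1] - 1.0) ** 2
grad_g = lambda s: (s[0] + s[1] - 1.0) * np.array([1.0, 1.0])
g_star = 0.0                        # optimal lower-level value in this example

x_k = np.array([0.8, 0.9])          # current iterate, not lower-level optimal

# Convexity gives g(s) >= g(x_k) + grad_g(x_k) @ (s - x_k), so the halfspace
# below must contain the whole solution set {s : g(s) <= g_star}.
in_cut = lambda s: g(x_k) + grad_g(x_k) @ (s - x_k) <= g_star + 1e-9

# Every point on the solution line survives the cut...
solution_pts = [np.array([t, 1.0 - t]) for t in np.linspace(-2.0, 2.0, 101)]
print(all(in_cut(s) for s in solution_pts))   # → True
# ...while a clearly suboptimal point can be cut off.
print(in_cut(np.array([2.0, 2.0])))           # → False
```

The method described in the abstract then minimizes a linearization of the upper-level objective over this cut (intersected with the original constraint set) to obtain its conditional gradient direction.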

We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020a), the state of the art for estimating such density ratios. Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in the parameterization of the optimization variable is introduced to ensure positivity, so no primal-dual algorithm is guaranteed to converge to or find the desired solution. However, such nonlinearity is essential to ensure the consistency of GenDICE even with a tabular representation. This is a fundamental contradiction resulting from GenDICE's original formulation of the optimization problem. In GradientDICE, we optimize a different objective from GenDICE by using the Perron-Frobenius theorem and eliminating GenDICE's use of divergence. Consequently, nonlinearity in the parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.

This work develops analysis and algorithms for solving a class of bilevel optimization problems in which the lower-level (LL) problems have linear constraints. Most existing approaches for constrained bilevel problems rely on value-function-based approximate reformulations, which suffer from issues such as nonconvex and nondifferentiable constraints. In contrast, in this work we develop an implicit-gradient-based approach, which is easy to implement and is suitable for machine learning applications. We first provide an in-depth understanding of the problem by showing that the implicit objective for such problems is in general nondifferentiable. However, if we add a small (linear) perturbation to the LL objective, the resulting implicit objective becomes differentiable almost surely. This key observation opens the door to developing (deterministic and stochastic) gradient-based algorithms similar to the state-of-the-art ones for unconstrained bilevel problems. We show that when the implicit function is assumed to be strongly convex, convex, or weakly convex, the resulting algorithms converge at guaranteed rates. Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems.
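The perturbation mechanism can be seen in a toy construction (everything below is an illustrative assumption, not the paper's setup): a linear LL objective over a box has a non-unique argmin whenever a coefficient vanishes, which is where the implicit function breaks down; adding a small random linear term makes the vanishing event measure-zero, so the LL solution, and hence the implicit objective, is well defined almost surely.

```python
import numpy as np

rng = np.random.default_rng(1)

def argmin_box(c, tol=1e-12):
    """Minimize c @ y over the box [0, 1]^2. Returns a vertex solution and
    whether it is unique (a zero coefficient makes a whole face optimal)."""
    y = (np.asarray(c) < 0.0).astype(float)
    unique = bool(np.all(np.abs(c) > tol))
    return y, unique

xs = np.arange(-100, 101) / 100.0      # upper-level grid, includes x = 0 exactly
c_of = lambda x: np.array([x, -x])     # LL objective coefficients depend on x

# Unperturbed: at x = 0 both coefficients vanish, so the LL argmin is a face.
ties_plain = sum(not argmin_box(c_of(x))[1] for x in xs)
# Perturbed: a small random linear term breaks the tie almost surely.
q = rng.normal(scale=1e-3, size=2)
ties_pert = sum(not argmin_box(c_of(x) + q)[1] for x in xs)
print(ties_plain, ties_pert)           # → 1 0
```

The count of non-unique LL solutions drops from one (at x = 0) to zero after perturbation, which is the almost-sure differentiability phenomenon the abstract exploits.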

We introduce the concept of decision-focused surrogate modeling for solving computationally challenging nonlinear optimization problems in real-time settings. The proposed data-driven framework seeks to learn a simpler, for example convex, surrogate optimization model that is trained to minimize the decision prediction error, which is defined as the difference between the optimal solutions of the original and the surrogate optimization models. The learning problem, formulated as a bilevel program, can be viewed as a data-driven inverse optimization problem to which we apply a decomposition-based solution algorithm from previous work. We validate our framework through numerical experiments involving the optimization of common nonlinear chemical processes such as chemical reactors, heat exchanger networks, and material blending systems. We also present a detailed comparison of decision-focused surrogate modeling with standard data-driven surrogate modeling methods and demonstrate that our approach is significantly more data-efficient while producing simple surrogate models with high decision prediction accuracy.
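A stripped-down, hypothetical 1-D example of the training criterion (the objective, surrogate family, and all names below are illustrative, not the chemical-process models or decomposition algorithm from the abstract): the surrogate is judged not by how well it approximates the original objective's values but by how close its minimizer lands to the original minimizer, which is the decision prediction error defined above.

```python
import numpy as np

# "Original" hard model: a nonconvex 1-D objective, solved here by grid search.
h = lambda x: x**4 - 3.0 * x**2 + x
grid = np.linspace(-2.0, 2.0, 4001)
x_orig = grid[np.argmin(h(grid))]          # original optimal decision

# Convex surrogate family (x - b)^2 has closed-form minimizer x = b, so the
# decision prediction error for parameter b is simply (b - x_orig)^2, and
# fitting b by minimizing it recovers the original decision.
dpe = lambda b: (b - x_orig) ** 2
b_best = min(np.linspace(-2.0, 2.0, 81), key=dpe)
print(round(float(x_orig), 2), round(float(b_best), 2))  # both ≈ -1.3
```

In the actual framework this fitting step is a bilevel program, since the surrogate's minimizer is itself defined by an inner optimization; the point of the sketch is only the loss being minimized.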