 Award ID(s):
 2039917
 NSFPAR ID:
 10342622
 Date Published:
 Journal Name:
 INFORMS Journal on Computing
 Volume:
 34
 Issue:
 2
 ISSN:
 10919856
 Page Range / eLocation ID:
 890 to 908
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)Traditionally, in the bilevel optimization framework, a leader chooses her actions by solving an upperlevel problem, assuming that a follower chooses an optimal reaction by solving a lowerlevel problem. However, in many settings, the lowerlevel problems might be nontrivial, thus requiring the use of tailored algorithms for their solution. More importantly, in practice, such problems might be inexactly solved by heuristics and approximation algorithms. Motivated by this consideration, we study a broad class of bilevel optimization problems where the follower might not optimally react to the leader’s actions. In particular, we present a modeling framework in which the leader considers that the follower might use one of a number of known algorithms to solve the lowerlevel problem, either approximately or heuristically. Thus, the leader can hedge against the follower’s use of suboptimal solutions. We provide algorithmic implementations of the framework for a class of nonlinear bilevel knapsack problem (BKP), and we illustrate the potential impact of incorporating this realistic feature through numerical experiments in the context of defenderattacker problems.more » « less

Mixed strategies are often evaluated based on the expected payoff that they guarantee. This is not always desirable. In this paper, we consider games for which maximizing the expected payoff deviates from the actual goal of the players. To address this issue, we introduce the notion of a (u,p)maxmin strategy which ensures receiving a minimum utility of u with probability at least p. We then give approximation algorithms for the problem of finding a (u, p)maxmin strategy for these games. The first game that we consider is Colonel Blotto, a wellstudied game that was introduced in 1921. In the Colonel Blotto game, two colonels divide their troops among a set of battlefields. Each battlefield is won by the colonel that puts more troops in it. The payoff of each colonel is the weighted number of battlefields that she wins. We show that maximizing the expected payoff of a player does not necessarily maximize her winning probability for certain applications of Colonel Blotto. For example, in presidential elections, the players’ goal is to maximize the probability of winning more than half of the votes, rather than maximizing the expected number of votes that they get. We give an exact algorithm for a natural variant of continuous version of this game. More generally, we provide constant and logarithmic approximation algorithms for finding (u, p)maxmin strategies. We also introduce a security game version of Colonel Blotto which we call auditing game. It is played between two players, a defender and an attacker. The goal of the defender is to prevent the attacker from changing the outcome of an instance of Colonel Blotto. Again, maximizing the expected payoff of the defender is not necessarily optimal. Therefore we give a constant approximation for (u, p)maxmin strategies.more » « less

Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players’ actions. We introduce a novel solution concept, the softBellman equilibrium, where each player is boundedly rational and chooses a softBellman policy rather than a purely rational policy as in the wellknown Nash equilibrium concept. We provide conditions for the existence and uniqueness of the softBellman equilibrium and propose a nonlinear leastsquares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players’ reward parameters from observed stateaction trajectories via a projectedgradient algorithm. Experiments in a predatorprey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outper form those inferred by a baseline algorithm: they reduce the KullbackLeibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.more » « less

Abstract Selecting facility locations requires significant investment to anticipate and prepare for disruptive events like earthquakes, floods, or labor strikes. In practice, location choices account for facility capacities, which often cannot change during disruptions. When a facility fails, demand transfers to others only if spare capacity exists. Thus, capacitated reliable facility location problems (CRFLP) under uncertainty are more complex than uncapacitated versions. To manage uncertainty and decide effectively, stochastic programming (SP) methods are often employed. Two commonly used SP methods are approximation methods, i.e., Sample Average Approximation (SAA), and decomposition methods, i.e., Progressive Hedging Algorithm (PHA). SAA needs large sample sizes for performance guarantee and turn into computationally intractable. On the other hand, PHA, as an exact method for convex problems, suffers from the need to iteratively solve numerous subproblems which are computationally costly. In this paper, we developed two novel algorithms integrating SAA and PHA for solving the CRFLP under uncertainty. The developed methods are innovative in that they blend the complementary aspects of PHA and SAA in terms of exactness and computational efficiency, respectively. Further, the developed methods are practical in that they allow the specialist to adjust the tradeoff between the exactness and speed of attaining a solution. We present the effectiveness of the developed integrated approaches, Sampling Based Progressive Hedging Algorithm (SBPHA) and Discarding SBPHA (dSBPHA), over the pure strategies (i.e. SAA). The validation of the methods is demonstrated through twostage stochastic CRFLP. Promising results are attained for CRFLP, and the method has great potential to be generalized for SP problems.

Abstract We study the
one‐warehouse multi‐retailer problem under deterministic dynamic demand and concave batch order costs, where order batches have an identical capacity and the order cost function for each facility is concave within the batch. Under appropriate assumptions on holding cost structure, we obtain lower bounds via a decomposition that splits the two‐echelon problem into single‐facility subproblems, then propose approximation algorithms by judiciously recombining the subproblem solutions. For piecewise linear concave batch order costs with a constant number of slopes we obtain a constant‐factor approximation, while for general concave batch costs we propose an approximation within a logarithmic factor of optimality. We also extend some results to subadditive order and/or holding costs.