- Award ID(s): 2039917
- NSF-PAR ID: 10342622
- Date Published:
- Journal Name: INFORMS Journal on Computing
- Volume: 34
- Issue: 2
- ISSN: 1091-9856
- Page Range / eLocation ID: 890 to 908
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Traditionally, in the bilevel optimization framework, a leader chooses her actions by solving an upper-level problem, assuming that a follower chooses an optimal reaction by solving a lower-level problem. However, in many settings, the lower-level problems might be nontrivial, thus requiring the use of tailored algorithms for their solution. More importantly, in practice, such problems might be solved inexactly by heuristics and approximation algorithms. Motivated by this consideration, we study a broad class of bilevel optimization problems where the follower might not optimally react to the leader’s actions. In particular, we present a modeling framework in which the leader considers that the follower might use one of a number of known algorithms to solve the lower-level problem, either approximately or heuristically. Thus, the leader can hedge against the follower’s use of suboptimal solutions. We provide algorithmic implementations of the framework for a class of nonlinear bilevel knapsack problems (BKPs), and we illustrate the potential impact of incorporating this realistic feature through numerical experiments in the context of defender-attacker problems.
- Mixed strategies are often evaluated based on the expected payoff that they guarantee. This is not always desirable. In this paper, we consider games for which maximizing the expected payoff deviates from the actual goal of the players. To address this issue, we introduce the notion of a (u, p)-maxmin strategy, which ensures receiving a minimum utility of u with probability at least p. We then give approximation algorithms for the problem of finding a (u, p)-maxmin strategy for these games. The first game that we consider is Colonel Blotto, a well-studied game that was introduced in 1921. In the Colonel Blotto game, two colonels divide their troops among a set of battlefields. Each battlefield is won by the colonel that puts more troops in it. The payoff of each colonel is the weighted number of battlefields that she wins. We show that maximizing the expected payoff of a player does not necessarily maximize her winning probability for certain applications of Colonel Blotto. For example, in presidential elections, the players’ goal is to maximize the probability of winning more than half of the votes, rather than maximizing the expected number of votes that they get. We give an exact algorithm for a natural variant of the continuous version of this game. More generally, we provide constant and logarithmic approximation algorithms for finding (u, p)-maxmin strategies. We also introduce a security game version of Colonel Blotto, which we call the auditing game. It is played between two players, a defender and an attacker. The goal of the defender is to prevent the attacker from changing the outcome of an instance of Colonel Blotto. Again, maximizing the expected payoff of the defender is not necessarily optimal. Therefore, we give a constant approximation for (u, p)-maxmin strategies.
- Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players’ actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players’ reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.
- Selecting facility locations requires significant investment to anticipate and prepare for disruptive events like earthquakes, floods, or labor strikes. In practice, location choices account for facility capacities, which often cannot change during disruptions. When a facility fails, demand transfers to others only if spare capacity exists. Thus, capacitated reliable facility location problems (CRFLP) under uncertainty are more complex than uncapacitated versions. To manage uncertainty and decide effectively, stochastic programming (SP) methods are often employed. Two commonly used classes of SP methods are approximation methods, such as Sample Average Approximation (SAA), and decomposition methods, such as the Progressive Hedging Algorithm (PHA). SAA needs large sample sizes to provide a performance guarantee and can become computationally intractable. On the other hand, PHA, as an exact method for convex problems, suffers from the need to iteratively solve numerous sub-problems, which are computationally costly. In this paper, we develop two novel algorithms integrating SAA and PHA for solving the CRFLP under uncertainty. The developed methods are innovative in that they blend the complementary aspects of PHA and SAA in terms of exactness and computational efficiency, respectively. Further, the developed methods are practical in that they allow the specialist to adjust the tradeoff between the exactness and speed of attaining a solution. We present the effectiveness of the developed integrated approaches, Sampling Based Progressive Hedging Algorithm (SBPHA) and Discarding SBPHA (d-SBPHA), over the pure strategies (i.e., SAA). The methods are validated on the two-stage stochastic CRFLP. Promising results are attained for CRFLP, and the methods have great potential to be generalized for SP problems.
- We study the one‐warehouse multi‐retailer problem under deterministic dynamic demand and concave batch order costs, where order batches have an identical capacity and the order cost function for each facility is concave within the batch. Under appropriate assumptions on the holding cost structure, we obtain lower bounds via a decomposition that splits the two‐echelon problem into single‐facility subproblems, then propose approximation algorithms by judiciously recombining the subproblem solutions. For piecewise linear concave batch order costs with a constant number of slopes, we obtain a constant‐factor approximation, while for general concave batch costs we propose an approximation within a logarithmic factor of optimality. We also extend some results to subadditive order and/or holding costs.