Title: Efficient Algorithms for Planning with Participation Constraints
Abstract: We consider the problem of planning with participation constraints introduced in [24]. In this problem, a principal chooses actions in a Markov decision process, resulting in separate utilities for the principal and the agent. However, the agent can and will choose to end the process whenever his expected onward utility becomes negative. The principal seeks to compute and commit to a policy that maximizes her expected utility, under the constraint that the agent should always want to continue participating. We provide the first polynomial-time exact algorithm for this problem for finite-horizon settings, where previously only an additive ε-approximation algorithm was known. Our approach can also be extended to the (discounted) infinite-horizon case, for which we give an algorithm that runs in time polynomial in the size of the input and log(1/ε), and returns a policy that is optimal up to an additive error of ε.
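As a concrete illustration of the model (not of the paper's algorithm, which optimizes over policies that may be randomized and history-dependent), the following Python sketch evaluates a fixed deterministic, time-dependent Markov policy by backward induction and checks the participation constraint, i.e., that the agent's expected onward utility never becomes negative. All names and array layouts here are hypothetical.

import numpy as np

def evaluate_policy_with_participation(P, r_principal, r_agent, policy):
    """Backward induction for a fixed deterministic, time-dependent Markov policy.
    P: (S, A, S) transition probabilities; r_principal, r_agent: (S, A) rewards;
    policy: (H, S) integer array giving the action taken in each state at each step.
    Returns the principal's and agent's value functions and whether the agent's
    continuation value is nonnegative at every state and step (the participation check)."""
    H, S = policy.shape
    Vp = np.zeros((H + 1, S))      # principal's continuation value
    Va = np.zeros((H + 1, S))      # agent's continuation value
    for t in range(H - 1, -1, -1):
        a = policy[t]                          # action per state at step t
        P_t = P[np.arange(S), a]               # (S, S): row s is the next-state distribution under a[s]
        Vp[t] = r_principal[np.arange(S), a] + P_t @ Vp[t + 1]
        Va[t] = r_agent[np.arange(S), a] + P_t @ Va[t + 1]
    # Participation constraint, checked conservatively at every state/step
    # (strictly speaking only reachable state/step pairs need to satisfy it).
    participates = bool((Va[:H] >= 0).all())
    return Vp, Va, participates

A policy that fails this check would be abandoned by the agent at the first step and state where the continuation value drops below zero.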
Award ID(s): 2307106, 2122628, 1814056
PAR ID: 10387293
Editor(s): Pennock, David M.; Segal, Ilya; Seuken, Sven
Publisher / Repository: ACM
Journal Name: Proceedings of the 23rd ACM Conference on Economics and Computation
Page Range / eLocation ID: 1121 to 1140
Sponsoring Org: National Science Foundation
More Like this
  1. We pose and study the problem of planning in Markov decision processes (MDPs), subject to participation constraints as studied in mechanism design. In this problem, a planner must work with a self-interested agent on a given MDP. Each action in the MDP provides an immediate reward to the planner and a (possibly different) reward to the agent. The agent has no control in choosing the actions, but has the option to end the entire process at any time. The goal of the planner is to find a policy that maximizes her cumulative reward, taking into consideration the agent's ability to terminate. We give a fully polynomial-time approximation scheme for this problem. En route, we present polynomial-time algorithms for computing (exact) optimal policies for important special cases of this problem, including when the time horizon is constant, or when the MDP exhibits a "definitive decisions" property. We illustrate our algorithms with two different game-theoretic applications: the problem of assigning rides in ride-sharing and the problem of designing screening policies. Our results imply efficient algorithms for computing (approximately) optimal policies in both applications. 
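For reference alongside record 1 above, the following brute-force sketch enumerates deterministic, time-dependent Markov policies on a toy constant-horizon instance, discards those the agent would abandon, and keeps the best remaining one for the principal. It is only a baseline with hypothetical names: the running time is exponential, the participation check is applied conservatively at every state and step rather than only reachable ones, and the optimum over general (possibly randomized) policies may be strictly better, so this should not be read as the paper's method.

import numpy as np
from itertools import product

def brute_force_participation_planning(P, r_principal, r_agent, H, start_state=0):
    """Enumerate deterministic time-dependent Markov policies for a tiny instance and
    return the principal-optimal one among those passing the participation check."""
    S, A, _ = P.shape
    best_value, best_policy = -np.inf, None
    for flat in product(range(A), repeat=H * S):
        policy = np.asarray(flat).reshape(H, S)
        Vp = np.zeros((H + 1, S))
        Va = np.zeros((H + 1, S))
        for t in range(H - 1, -1, -1):
            a = policy[t]
            P_t = P[np.arange(S), a]           # transitions under the chosen actions
            Vp[t] = r_principal[np.arange(S), a] + P_t @ Vp[t + 1]
            Va[t] = r_agent[np.arange(S), a] + P_t @ Va[t + 1]
        # Keep only policies under which the agent never prefers to quit.
        if (Va[:H] >= 0).all() and Vp[0, start_state] > best_value:
            best_value, best_policy = Vp[0, start_state], policy
    return best_value, best_policy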
  2. In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent has to maintain some necessary safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy, namely, (i) the GM-CRL algorithm, where the algorithm has access to a generative model, and (ii) the UC-CRL algorithm, where the algorithm learns the model using an upper-confidence-style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to ensure a desired level of accuracy with high probability, both with respect to objective maximization and constraint satisfaction.
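Record 2 above formulates safe RL as a discounted CMDP. The sketch below (hypothetical names) writes that planning problem as the standard occupancy-measure linear program and solves it with scipy, assuming the transition model is already known, which is exactly what the model-based algorithms in the record would estimate from samples; it illustrates the underlying constrained objective, not the record's learning algorithms.

import numpy as np
from scipy.optimize import linprog

def solve_cmdp(P, r, c, mu0, gamma, budget):
    """Occupancy-measure LP for a discounted CMDP with one expected-cost constraint.
    P: (S, A, S) transitions; r, c: (S, A) reward and cost; mu0: (S,) initial distribution.
    Returns a stationary (possibly stochastic) policy and its expected discounted reward."""
    S, A = r.shape
    n = S * A                                  # one variable x[s, a] per state-action pair
    obj = -r.reshape(n)                        # linprog minimizes, so negate the reward
    # Bellman-flow constraints: sum_a x[s', a] = mu0[s'] + gamma * sum_{s, a} P[s, a, s'] x[s, a]
    A_eq = np.zeros((S, n))
    for s in range(S):
        for a in range(A):
            j = s * A + a
            A_eq[s, j] += 1.0
            A_eq[:, j] -= gamma * P[s, a]
    # Safety constraint: expected discounted cumulative cost stays within the budget.
    A_ub = c.reshape(1, n)
    b_ub = np.array([budget])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=mu0, method="highs")
    assert res.success, res.message
    occ = res.x.reshape(S, A)
    # Recover a stationary policy from the occupancy measure (uniform on unvisited states).
    totals = occ.sum(axis=1, keepdims=True)
    policy = np.where(totals > 1e-12, occ / np.maximum(totals, 1e-12), 1.0 / A)
    return policy, -res.fun

In a model-based approach like the one described in the record, P would first be estimated from samples; the LP here only makes the constrained planning objective explicit.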
  3. Guruswami, Venkatesan (Ed.)
    We study two recent combinatorial contract design models, which highlight different sources of complexity that may arise in contract design, where a principal delegates the execution of a costly project to others. In both settings, the principal cannot observe the choices of the agent(s), only the project’s outcome (success or failure), and incentivizes the agent(s) using a contract, a payment scheme that specifies the payment to the agent(s) upon a project’s success. We present results that resolve open problems and advance our understanding of the computational complexity of both settings. In the multi-agent setting, the project is delegated to a team of agents, where each agent chooses whether or not to exert effort. A success probability function maps any subset of agents who exert effort to a probability of the project’s success. For the family of submodular success probability functions, Dütting et al. [2023] established a poly-time constant factor approximation to the optimal contract, and left open whether this problem admits a PTAS. We answer this question in the negative, by showing that no poly-time algorithm guarantees a better than 0.7-approximation to the optimal contract. For XOS functions, they give a poly-time constant approximation with value and demand queries. We show that with value queries only, one cannot get any constant approximation. In the multi-action setting, the project is delegated to a single agent, who can take any subset of a given set of actions. Here, a success probability function maps any subset of actions to a probability of the project’s success. Dütting et al. [2021a] showed a poly-time algorithm for computing an optimal contract for gross substitutes success probability functions, and showed that the problem is NP-hard for submodular functions. We further strengthen this hardness result by showing that this problem does not admit any constant factor approximation. Furthermore, for the broader class of XOS functions, we establish the hardness of obtaining an n^{-1/2+ε}-approximation for any ε > 0.
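For the multi-action setting in record 3 above, the brute-force sketch below (hypothetical names; exponential in the number of actions, so only for toy instances) illustrates the binary-outcome model: with the project's reward normalized to 1, a contract is a payment alpha on success, the agent picks the action set maximizing alpha·f(S) − c(S) with ties broken in the principal's favour, and it suffices to check alpha = 0 plus the finitely many indifference points between pairs of action sets. The structured-function results in the record (gross substitutes, submodular, XOS) are about avoiding exactly this kind of enumeration.

from itertools import chain, combinations

def brute_force_optimal_contract(actions, cost, f):
    """actions: iterable of action ids; cost: dict action -> cost >= 0;
    f: success-probability function taking a frozenset of actions, values in [0, 1].
    Returns (principal utility, payment on success, agent's chosen action set)."""
    actions = list(actions)
    subsets = [frozenset(S) for S in chain.from_iterable(
        combinations(actions, k) for k in range(len(actions) + 1))]
    prob = {S: f(S) for S in subsets}
    c = {S: sum(cost[i] for i in S) for S in subsets}
    # Candidate payments: 0 plus every indifference point between two action sets.
    candidates = {0.0}
    for S in subsets:
        for T in subsets:
            if prob[S] != prob[T]:
                a = (c[S] - c[T]) / (prob[S] - prob[T])
                if 0.0 <= a <= 1.0:
                    candidates.add(a)
    best = (-1.0, None, None)
    for alpha in candidates:
        util = {S: alpha * prob[S] - c[S] for S in subsets}
        u_star = max(util.values())
        # Agent best-responds; ties (within tolerance) broken in the principal's favour.
        S_star = max((S for S in subsets if util[S] >= u_star - 1e-9),
                     key=lambda S: prob[S])
        principal = (1.0 - alpha) * prob[S_star]
        if principal > best[0]:
            best = (principal, alpha, S_star)
    return best

# Example usage on a toy instance (illustrative only):
# f = lambda S: min(1.0, 0.4 * len(S))
# print(brute_force_optimal_contract([1, 2], {1: 0.1, 2: 0.2}, f))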
  4. In decision-making problems, the actions of an agent may reveal sensitive information that drives its decisions. For instance, a corporation’s investment decisions may reveal its sensitive knowledge about market dynamics. To prevent this type of information leakage, we introduce a policy synthesis algorithm that protects the privacy of the transition probabilities in a Markov decision process. We use differential privacy as the mathematical definition of privacy. The algorithm first perturbs the transition probabilities using a mechanism that provides differential privacy. Then, based on the privatized transition probabilities, we synthesize a policy using dynamic programming. Our main contribution is to bound the "cost of privacy," i.e., the difference between the expected total rewards with privacy and the expected total rewards without privacy. We also show that computing the cost of privacy has time complexity that is polynomial in the parameters of the problem. Moreover, we establish that the cost of privacy increases with the strength of differential privacy protections, and we quantify this increase. Finally, numerical experiments on two example environments validate the established relationship between the cost of privacy and the strength of data privacy protections.
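The sketch below (hypothetical names) mirrors the perturb-then-plan pipeline described in the record above: noise is added to the transition probabilities and the result is renormalized (a generic stand-in; the record uses a mechanism that actually provides differential privacy on the transition probabilities, and the noise scale here is an uncalibrated placeholder), then a policy is synthesized on the privatized model by dynamic programming. Comparing its value on the true model against the unconstrained optimum gives an empirical estimate of the cost of privacy.

import numpy as np

def privatize_transitions(P, scale, rng):
    """Add Laplace noise to each transition distribution, then clip and renormalize
    (post-processing). Placeholder for the record's differentially private mechanism;
    the scale is not calibrated to any particular privacy level here."""
    noisy = P + rng.laplace(0.0, scale, size=P.shape)
    noisy = np.clip(noisy, 1e-9, None)
    return noisy / noisy.sum(axis=-1, keepdims=True)

def value_iteration(P, r, gamma, iters=2000, tol=1e-8):
    """Dynamic programming on a (possibly privatized) model. P: (S, A, S), r: (S, A)."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)            # Q[s, a] = r[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1), V_new          # greedy policy and its value estimate

def policy_value(P, r, gamma, policy):
    """Exact value of a deterministic stationary policy on the given model."""
    S = r.shape[0]
    P_pi = P[np.arange(S), policy]
    r_pi = r[np.arange(S), policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Empirical cost of privacy: plan on the privatized model, evaluate on the true one.
# rng = np.random.default_rng(0)
# private_policy, _ = value_iteration(privatize_transitions(P, 0.05, rng), r, 0.95)
# optimal_policy, _ = value_iteration(P, r, 0.95)
# gap = policy_value(P, r, 0.95, optimal_policy) - policy_value(P, r, 0.95, private_policy)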