This article investigates a stochastic optimal control problem with linear Gaussian dynamics, quadratic performance measure, but non-Gaussian observations. The linear Gaussian dynamics characterizes a large number of interacting agents evolving under a centralized control and external disturbances. The aggregate state of the agents is only partially known to the centralized controller by means of the samples taken randomly in time and from anonymous randomly selected agents. Due to removal of the agent identity from the samples, the observation set has a non-Gaussian structure, and as a consequence, the optimal control law that minimizes a quadratic cost is essentially nonlinear and infinite-dimensional, for any finite number of agents. For infinitely many agents, however, this paper shows that the optimal control law is the solution to a reduced order, finite-dimensional linear quadratic Gaussian problem with Gaussian observations sampled only in time. For this problem, the separation principle holds and is used to develop an explicit optimal control law by combining a linear quadratic regulator with a separately designed finite-dimensional minimum mean square error state estimator. Conditions are presented under which this simple optimal control law can be adopted as a suboptimal control law for finitely many agents.
more »
« less
Exit Time Risk-Sensitive Control for Systems of Cooperative Agents
We study a sequence of many-agent exit time stochastic control problems, parameterized by the number of agents, with risk-sensitive cost structure. We identify a fully characterizing assumption, under which each such control problem corresponds to a risk-neutral stochastic control problem with additive cost, and sequentially to a risk-neutral stochastic control problem on the simplex that retains only the distribution of states of agents, while discarding further specific information about the state of each agent. Under some additional assumptions, we also prove that the sequence of value functions of these stochastic control problems converges to the value function of a deterministic control problem, which can be used for the design of nearly optimal controls for the original problem, when the number of agents is sufficiently large.
more »
« less
- Award ID(s):
- 1713032
- PAR ID:
- 10213055
- Date Published:
- Journal Name:
- Mathematics of control signals and systems
- Volume:
- 31
- ISSN:
- 1435-568X
- Page Range / eLocation ID:
- 279 - 332
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set, and provide theoretical arguments about its correctness. Finally, we present a realistic example inspired from stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis).more » « less
-
A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set, and provide theoretical arguments about its correctness. Finally, we present a realistic example inspired from stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis).more » « less
-
A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set asa set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk(CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set and provide arguments about its correctness. Finally, we present a realistic example inspired from stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis).more » « less
-
We study a distributed policy evaluation problem in which a group of agents with jointly observed states and private local actions and rewards collaborate to learn the value function of a given policy via local computation and communication. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. We develop and analyze a new distributed temporal-difference learning algorithm that minimizes the mean-square projected Bellman error. Our approach is based on a stochastic primal-dual method and we improve the best-known convergence rate from $$O(1/\sqrt{T})$$ to $O(1/T)$, where $$T$$ is the total number of iterations. Our analysis explicitly takes into account the Markovian nature of the sampling and addresses a broader class of problems than the commonly-used i.i.d. sampling scenario.more » « less
An official website of the United States government

