Optimizing risk-averse objectives in discounted MDPs is challenging because most models do not admit direct dynamic programming equations and require complex history-dependent policies. In this paper, we show that the risk-averse total reward criterion, under the Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR), can be optimized by a stationary policy, making it simple to analyze, interpret, and deploy. We propose exponential value iteration, policy iteration, and linear programming to compute optimal policies. Compared with prior work, our results only require the relatively mild condition of transient MDPs and allow for both positive and negative rewards. Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures
We propose a novel reinforcement learning methodology where the system performance is evaluated by a Markov coherent dynamic risk measure with the use of linear value function approximations. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
- Award ID(s): 1907522
- PAR ID: 10224995
- Date Published:
- Journal Name: Journal of Machine Learning Research
- Volume: 22
- ISSN: 1532-4435
- Page Range / eLocation ID: 1 - 34
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We consider a risk-averse stochastic capacity planning problem under uncertain demand in each period. Using a scenario tree representation of the uncertainty, we formulate a multistage stochastic integer program to adjust the capacity expansion plan dynamically as more information on the uncertainty is revealed. Specifically, in each stage, a decision maker optimizes capacity acquisition and resource allocation to minimize certain risk measures of maintenance and operational cost. We compare it with a two-stage approach that determines the capacity acquisition for all the periods up front. Using expected conditional risk measures, we derive a tight lower bound and an upper bound for the gaps between the optimal objective values of risk-averse multistage models and their two-stage counterparts. Based on these derived bounds, we present general guidelines on when to solve risk-averse two-stage or multistage models. Furthermore, we propose approximation algorithms to solve the two models more efficiently, which are asymptotically optimal under an expanding market assumption. We conduct numerical studies using randomly generated and real-world instances with diverse sizes, to demonstrate the tightness of the analytical bounds and efficacy of the approximation algorithms. We find that the gaps between risk-averse multistage and two-stage models increase as the variability of the uncertain parameters increases and decrease as the decision maker becomes more risk averse. Moreover, a stagewise-dependent scenario tree attains much higher gaps than a stagewise-independent counterpart, whereas the latter produces tighter analytical bounds. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete. Funding: This work of Dr. X. Yu was partially supported by the U.S. National Science Foundation Division of Information and Intelligent Systems [Grant 2331782]. 
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0396 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0396 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
-
Building on Pomatto, Strack, and Tamuz (2020), we identify a tight condition for when background risk can induce first-order stochastic dominance. Using this condition, we show that under plausible levels of background risk, no theory of choice under risk can simultaneously satisfy the following three economic postulates: (i) decision-makers are risk averse over small gambles, (ii) their preferences respect stochastic dominance, and (iii) they account for background risk. This impossibility result applies to expected utility theory, prospect theory, rank-dependent utility, and many other models. (JEL D81, D91)
-
The performance of a model predictive controller depends on the accuracy of the objective and prediction model of the system. Although significant efforts have been dedicated to improving the robustness of model predictive control (MPC), they typically do not take a risk-averse perspective. In this paper, we propose a risk-aware MPC framework, which estimates the underlying parameter distribution using online Bayesian learning and derives a risk-aware control policy by reformulating classical MPC problems as Bayesian Risk Optimization (BRO) problems. The consistency of the Bayesian estimator and the convergence of the control policy are rigorously proved. Furthermore, we investigate the consistency requirement and propose a risk monitoring mechanism to guarantee the satisfaction of the consistency requirement. Simulation results demonstrate the effectiveness of the proposed approach.
-
The safe internal transportation of hazardous materials within healthcare facilities is critical to mitigating risks to patients, staff, and visitors. This paper presents a risk-averse path planning framework for autonomously handling hazardous materials in healthcare systems. We model the indoor environment with grid-based obstacle and risk maps, where risk arises from pedestrian flow density and proximity to critical zones. Our novel risk-averse path planning approach integrates risk directly into each transition cost, thereby enabling more robust and secure path selection. We further improve efficiency through (i) a bidirectional variant that cuts search time and (ii) a post-optimization step that minimizes unnecessary heading changes while respecting a risk budget. We evaluated our framework on multiple simulated grid maps and compared it with established methods, measuring path length, average risk, and computational time. The results demonstrate that the proposed framework consistently generates safe and efficient paths while minimizing computational overhead.

