

Title: A sequential decision making perspective on resilience
We investigate how sequential decision-making analysis can be used to model system resilience. In the aftermath of an extreme event, agents involved in emergency management aim at an optimal recovery process, trading off the loss due to lack of system functionality against the investment needed for a fast recovery. This process can be formulated as a sequential decision-making optimization problem, where the overall loss has to be minimized by adopting an appropriate policy, and dynamic programming applied to Markov Decision Processes (MDPs) provides a rational and computationally feasible framework for a quantitative analysis. The paper investigates how trends of post-event loss and recovery can be understood in light of the sequential decision-making framework. Specifically, it is well known that a system's functionality is often restored to a level different from that before the event: this can be the result of budget constraints and/or economic opportunity, and the framework has the potential to integrate these considerations. Here, we focus on the specific case of an agent learning something new about the process and reacting by updating the target functionality level of the system. We illustrate how this can happen in a simplified setting by using Hidden-Model MDPs (HM-MDPs) to model the management of a set of components under model uncertainty. When an extreme event occurs, the agent updates the hazard model and, consequently, her response and long-term planning.
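The trade-off described in the abstract, minimizing the discounted sum of functionality shortfall plus repair investment over a recovery MDP, can be illustrated with a minimal value-iteration sketch. This is not the paper's model: the state space, costs, and deterministic recovery dynamics below are illustrative assumptions chosen only to show the structure of the optimization.

```python
# Minimal value-iteration sketch for a post-event recovery MDP.
# States: functionality levels 0..N (0 = fully damaged, N = target level).
# Actions: repair intensity a; higher a restores functionality faster but
# costs more per step. Per-step loss = functionality shortfall + repair cost.
# All numbers are illustrative assumptions, not taken from the paper.

N = 10                              # number of functionality levels
GAMMA = 0.95                        # discount factor
ACTIONS = [0, 1, 2]                 # repair intensity (levels restored/step)
COST = {0: 0.0, 1: 1.0, 2: 2.5}     # investment per step for each action

def step_loss(s, a):
    """Loss from lack of functionality plus repair investment."""
    return (N - s) + COST[a]

def next_state(s, a):
    """Deterministic recovery: intensity a restores a levels per step."""
    return min(N, s + a)

def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality operator to convergence."""
    V = [0.0] * (N + 1)
    while True:
        delta = 0.0
        for s in range(N + 1):
            best = min(step_loss(s, a) + GAMMA * V[next_state(s, a)]
                       for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V):
    """Recovery action minimizing immediate loss plus discounted cost-to-go."""
    return [min(ACTIONS,
                key=lambda a: step_loss(s, a) + GAMMA * V[next_state(s, a)])
            for s in range(N + 1)]

V = value_iteration()
policy = greedy_policy(V)
```

At full functionality (state N) the shortfall is zero and the optimal action is to stop investing; the HM-MDP extension discussed in the abstract would additionally maintain a belief over which hazard model generates the transitions.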
Award ID(s):
1638327
PAR ID:
10065508
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Safety, Reliability, Risk, Resilience and Sustainability of Structures and Infrastructure 12th Int. Conf. on Structural Safety and Reliability, Vienna, Austria, 6–10 August 2017
Volume:
1
Page Range / eLocation ID:
2633-2640
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The functioning of interdependent civil infrastructure systems in the aftermath of a disruptive event is critical to the performance and vitality of any modern urban community. Post-event stressors and chaotic circumstances, time limitations, and complexities in the community recovery process highlight the necessity for a comprehensive decision-making framework at the community level for post-event recovery management. Such a framework must be able to handle large-scale scheduling and decision processes, which involve difficult control problems with large combinatorial decision spaces. This study utilizes approximate dynamic programming algorithms along with heuristics for the identification of optimal community recovery actions following the occurrence of an extreme earthquake event. The proposed approach addresses the curse of dimensionality in its analysis and management of multi-state, large-scale infrastructure systems. Furthermore, the proposed approach can consider the current recovery policies of responsible public and private entities within the community and shows how their performance might be improved. A testbed community coarsely modeled after Gilroy, California, is utilized as an illustrative example. While the illustration provides optimal policies for the Electrical Power Network serving Gilroy following a severe earthquake, preliminary work shows that the methodology is computationally well suited to other infrastructure systems and hazards.
  2. Abstract To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. The previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (icorpp), which used P-log for representing commonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of icorpp is that its implementation requires non-trivial engineering efforts to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate icorpp's reasoning and planning components. In particular, we extend probabilistic action language pBC+ to express utility, belief states, and observation as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration tolerant representation of POMDP that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp, which compiles a pBC+ action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC+ action description. Our experiments show that it retains the advantages of icorpp while avoiding the manual efforts in bridging the commonsense reasoner and the probabilistic planner.
  3. This paper introduces a simple, efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision making problems in a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low witness rank problems, tabular weakly-revealing/observable POMDPs and multi-step decodable POMDPs), but also many new challenging RL problems, especially in the partially observable setting, that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observation and function approximation, where we achieve the first sample complexity that is completely independent of the size of the observation space; (2) well-conditioned low-rank sequential decision making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision making problems under the SAIL condition, which unifies our existing understandings of model-based RL in both fully observable and partially observable settings. The SAIL condition is identified in this paper and can be viewed as a natural generalization of Bellman/witness rank to address partial observability. This paper also presents a reward-free variant of the OMLE algorithm, which learns approximate dynamic models that enable the computation of near-optimal policies for all reward functions simultaneously.
  4. Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes is exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: having samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem: allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.
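The hindsight idea in the last related abstract, scoring past decisions against what was achievable once the exogenous trace is known, can be sketched in a toy setting. This is not the HL algorithm from that paper; the multi-secretary-style problem, the fixed threshold policy, and all numbers below are illustrative assumptions.

```python
# Hindsight evaluation sketch for an Exo-MDP-style problem (illustrative).
# Toy setting: accept/reject a stream of offers with capacity k; the offer
# sequence is the exogenous input. Once the full trace is observed, the
# hindsight optimum gives a counterfactual benchmark for the online policy.

def hindsight_optimum(offers, k):
    """Best achievable total reward knowing the whole trace: take the top k."""
    return sum(sorted(offers, reverse=True)[:k])

def threshold_policy_value(offers, k, threshold):
    """Online policy: accept any offer at or above a fixed threshold."""
    total, capacity = 0.0, k
    for offer in offers:
        if capacity > 0 and offer >= threshold:
            total += offer
            capacity -= 1
    return total

offers = [3.0, 7.0, 2.0, 9.0, 4.0, 8.0]   # exogenous trace (assumed data)
k = 2
best = hindsight_optimum(offers, k)              # takes 9 and 8 -> 17
online = threshold_policy_value(offers, k, 6.0)  # accepts 7, then 9 -> 16
regret = best - online                           # hindsight gap = 1
```

The gap between the online value and the hindsight optimum is the signal that hindsight-style methods exploit to accelerate policy improvement.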