Title: Predicting the Condition Evolution of Controlled Infrastructure Components Modeled by Markov Processes
When the operation and maintenance (O&M) of infrastructure components is modeled as a Markov Decision Process (MDP), the stochastic evolution following the optimal policy is completely described by a Markov transition matrix. This paper illustrates how to predict relevant features of the time evolution of these controlled components. We are interested in assessing whether a critical state is reachable, the probability of reaching that state within a time period, the probability of visiting that state before another, and the probability of returning to that state. We present analytical methods to address these questions and discuss their computational complexity. Outcomes of these analyses can provide decision makers with a deeper understanding of the component evolution and may suggest revising the control policy. We formulate the framework for MDPs and extend it to Partially Observable Markov Decision Processes (POMDPs).
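As a rough illustration of two of these questions (a minimal sketch using a hypothetical 4-state transition matrix, not code from the paper): reachability of a critical state can be checked on the transition graph, and the probability of reaching that state within T steps can be read off powers of a modified transition matrix in which the critical state is made absorbing.

```python
import numpy as np

# Hypothetical 4-state deterioration model; state 3 is the critical state.
P = np.array([
    [0.90, 0.08, 0.02, 0.00],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [1.00, 0.00, 0.00, 0.00],   # e.g., replacement returns the component to new
])
critical = 3

# Reachability: state s can reach the critical state iff it appears in the
# transitive closure of the transition graph {(i, j) : P[i, j] > 0}.
A = (P > 0).astype(int)
closure = np.linalg.matrix_power(A + np.eye(len(P), dtype=int), len(P)) > 0
print(closure[:, critical])        # True where the critical state is reachable

# Probability of reaching the critical state within T steps: make that state
# absorbing, then read its column of the T-step transition matrix.
P_abs = P.copy()
P_abs[critical] = 0.0
P_abs[critical, critical] = 1.0
T = 50
print(np.linalg.matrix_power(P_abs, T)[:, critical])
```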
Award ID(s):
1663479
PAR ID:
10113825
Author(s) / Creator(s):
Date Published:
Journal Name:
13th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP13), Seoul, South Korea, May 26-30, 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The operation and maintenance of infrastructure components and systems can be modeled as a Markov process, partially or fully observable. Information about the current condition can be summarized by the “inner” state of a finite state controller. When a control policy is assigned, the stochastic evolution of the system is completely described by a Markov transition function. This article applies finite state Markov chain analyses to identify relevant features of the time evolution of a controlled system. We focus on assessing whether some critical conditions are reachable (or whether some actions will ever be taken), and on identifying the probability of these critical events occurring within a time period, their expected time of occurrence, their long-term frequency, and the probability that some events occur before others. We present analytical methods based on linear algebra to address these questions, and discuss their computational complexity and the structure of the solutions. The analyses can be performed after a policy is selected for a Markov decision process (MDP) or a partially observable MDP. Their outcomes depend on the selected policy, and examining them can give decision makers a deeper understanding of the consequences of following that policy, and may also suggest revising it.
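Continuing the hypothetical 4-state chain from the sketch above (again an illustration, not the article's code), two more of the quantities listed here reduce to small linear-algebra problems: the expected time of occurrence is a first-passage time obtained from a linear system, and the long-term frequency is the stationary distribution.

```python
import numpy as np

# Same hypothetical 4-state chain as in the sketch above.
P = np.array([
    [0.90, 0.08, 0.02, 0.00],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [1.00, 0.00, 0.00, 0.00],
])
critical = 3
others = [s for s in range(len(P)) if s != critical]

# Expected time of occurrence: hitting times h solve (I - Q) h = 1, where Q
# is P restricted to the non-critical states (first-step analysis).
Q = P[np.ix_(others, others)]
h = np.linalg.solve(np.eye(len(Q)) - Q, np.ones(len(Q)))
print(dict(zip(others, h)))    # expected steps until the critical state

# Long-term frequency: stationary distribution pi with pi @ P = pi, found
# here as the left eigenvector of P for the eigenvalue closest to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()
print(pi[critical])            # long-run fraction of time in the critical state
```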
  2. What policy should be employed in a Markov decision process with uncertain parameters? The robust-optimization answer to this question is to use rectangular uncertainty sets, which independently reflect available knowledge about each state, and then to compute a decision policy that maximizes expected reward under the worst-case decision process parameters from these uncertainty sets. While this rectangularity is computationally convenient and leads to tractable solutions, it often produces policies that are too conservative in practice, and it does not facilitate knowledge transfer between portions of the state space or across related decision processes. In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty about the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.
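For contrast with the non-rectangular sets proposed here, a minimal sketch of the (s,a)-rectangular robust backup that the abstract describes as overly conservative, with an assumed L1-ball uncertainty set of radius eps around each nominal transition row (the set shape and all names are illustrative choices, not the paper's):

```python
import numpy as np

def worst_case_value(p_nom, v, eps):
    """Inner minimization of the rectangular backup: min p.v over the L1 ball
    ||p - p_nom||_1 <= eps with p a distribution. Mass up to eps/2 is moved
    from the best next states to the single worst one."""
    p = p_nom.copy()
    worst = int(np.argmin(v))
    budget = eps / 2.0
    for j in np.argsort(v)[::-1]:      # take mass from the best states first
        if j == worst:
            continue
        d = min(budget, p[j])
        p[j] -= d
        budget -= d
        if budget <= 0:
            break
    p[worst] += eps / 2.0 - budget     # redeposit the moved mass
    return float(p @ v)

def rectangular_robust_vi(P, R, gamma=0.9, eps=0.2, iters=200):
    """Robust value iteration with an independent uncertainty set per (s, a):
    V(s) = max_a [ R[s, a] + gamma * worst-case expectation of V ].
    P: [nS, nA, nS] nominal transitions; R: [nS, nA] rewards."""
    nS, nA = R.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * worst_case_value(P[s, a], V, eps)
                       for a in range(nA)] for s in range(nS)])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```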
  3. Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness. 
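A schematic, tabular sketch of a doubly robust estimator in the spirit described here, combining an estimated q-function with an estimated marginalized density ratio; the cross-fold estimation that the paper uses is omitted, and the exact DRL estimator is given in the paper (all array names are illustrative assumptions):

```python
import numpy as np

def dr_policy_value(s, a, r, s_next, s0, q_hat, w_hat, pi, gamma=0.99):
    """Doubly robust value estimate for a target policy pi from logged
    transitions (s, a, r, s'), given plug-in estimates:
      q_hat[nS, nA] : estimated q-function of pi;
      w_hat[nS, nA] : estimated marginalized density ratio
                      d_pi(s, a) / d_behavior(s, a);
      pi[nS, nA]    : target policy probabilities;
      s0            : sampled initial states."""
    v_hat = (pi * q_hat).sum(axis=1)                # v(s) = E_{a~pi} q(s, a)
    td = r + gamma * v_hat[s_next] - q_hat[s, a]    # Bellman residual of q_hat
    correction = (w_hat[s, a] * td).mean() / (1.0 - gamma)
    return v_hat[s0].mean() + correction            # direct term + correction
```

The combination is what makes the estimate doubly robust in the sense the abstract describes: if q_hat is correct the residual term has mean zero, and if w_hat is correct the correction repairs an inaccurate q_hat.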
  4. Condition-based maintenance (CBM) of multi-component systems is a prevalent engineering problem because of its effectiveness in reducing the operational and maintenance costs of the system. However, computing exact optimal maintenance decisions for large multi-component systems is computationally challenging, if feasible at all, because the system's state and action spaces grow exponentially with the number of components. To address this scalability issue, we propose a Component-Wise Markov Decision Process (CW-MDP) and an Adjusted Component-Wise Markov Decision Process (ACW-MDP) that approximate the optimal system-level CBM decision policy for large systems with heterogeneous components. We propose using an extended single-component action space to model the impact of the system-level setup cost on the component-level solution. The theoretical gap between the proposed approach and the system-level optimum is derived, along with theoretical convergence results and the relationship between ACW-MDP and CW-MDP. Extensive numerical studies further demonstrate the effectiveness of component-wise solutions for large multi-component systems.
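A heavily simplified sketch of the decomposition idea (the actual CW-MDP/ACW-MDP constructions in the paper are more refined): solve each component's MDP separately by value iteration, with the system-level setup cost represented by an assumed per-component share charged to every non-idle action.

```python
import numpy as np

def solve_component(P, cost, setup_share, gamma=0.95, iters=500):
    """Value iteration for a single component's cost-minimizing MDP.
    P: [nA, nS, nS] transition matrices; cost: [nS, nA] operating cost;
    setup_share: assumed per-component share of the system setup cost,
    charged to every non-idle action (here, any action index > 0)."""
    nA, nS, _ = P.shape
    total_cost = cost + setup_share * (np.arange(nA) > 0)
    V = np.zeros(nS)
    for _ in range(iters):
        Q = total_cost + gamma * np.einsum('aij,j->ia', P, V)
        V = Q.min(axis=1)
    return Q.argmin(axis=1)    # component-level maintenance policy

# The approximate system policy then applies each component's policy to its
# own state, sidestepping the exponential joint state/action space.
```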
  5. The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy that satisfies formal specifications in the form of some well-defined logic. While many such logics have been proposed, with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies that must satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts with its environment for an indefinite period of time. This is sometimes called the average or expected behavior of the agent, and the associated planning problem faces significant challenges unless strong restrictions are imposed on the underlying model in terms of the connectivity of its graph structure. In this paper, we explore this steady-state planning problem: deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. A linear programming solution for the general case of multichain Markov Decision Processes (MDPs) is proposed, and we prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
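A minimal sketch of the occupation-measure LP idea for the simpler unichain case (the paper's programs handle the general multichain case, which this sketch does not): optimize the stationary state-action measure x(s, a) under flow balance, normalization, and lower bounds on the steady-state probability of designated states, then recover a stationary policy.

```python
import numpy as np
from scipy.optimize import linprog

def steady_state_policy(P, R, floor):
    """P: [nS, nA, nS] transitions; R: [nS, nA] rewards; floor: [nS]
    lower bounds on the steady-state probability of each state."""
    nS, nA, _ = P.shape
    c = -R.reshape(-1)                 # maximize sum x*R  <=>  minimize -R.x
    # Flow balance for each s: sum_a x(s,a) - sum_{s',a'} P[s',a',s] x(s',a') = 0.
    A_eq = np.zeros((nS + 1, nS * nA))
    for s in range(nS):
        for sp in range(nS):
            for a in range(nA):
                A_eq[s, sp * nA + a] = float(sp == s) - P[sp, a, s]
    A_eq[nS, :] = 1.0                  # normalization: sum_{s,a} x(s,a) = 1
    b_eq = np.zeros(nS + 1)
    b_eq[nS] = 1.0
    # Steady-state constraints sum_a x(s,a) >= floor[s], written as A_ub x <= b_ub.
    A_ub = np.zeros((nS, nS * nA))
    for s in range(nS):
        A_ub[s, s * nA:(s + 1) * nA] = -1.0
    res = linprog(c, A_ub=A_ub, b_ub=-np.asarray(floor, dtype=float),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert res.success, res.message    # constraints may be infeasible
    x = res.x.reshape(nS, nA)
    return x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)  # pi(a|s)
```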