Predicting the Evolution of Controlled Systems Modeled by Finite Markov Processes
The operation and maintenance of infrastructure components and systems can be modeled as a Markov process, partially or fully observable. Information about the current condition can be summarized by the "inner" state of a finite-state controller. When a control policy is assigned, the stochastic evolution of the system is completely described by a Markov transition function. This article applies finite-state Markov chain analyses to identify relevant features of the time evolution of a controlled system. We focus on assessing whether some critical conditions are reachable (or whether some actions will ever be taken), on identifying the probability of these critical events occurring within a time period, their expected time of occurrence, their long-term frequency, and the probability that some events occur before others. We present analytical methods based on linear algebra to address these questions, and discuss their computational complexity and the structure of the solutions. The analyses can be performed after a policy is selected for a Markov decision process (MDP) or a partially observable MDP. Their outcomes depend on the selected policy; examining these outcomes can provide decision makers with a deeper understanding of the consequences of following that policy, and may also suggest revising it.
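As a concrete illustration of the kind of linear-algebra computations the abstract refers to, here is a minimal NumPy sketch. The four-state transition matrix, the choice of critical state, and all probabilities are illustrative assumptions of ours, not values taken from the article.

```python
import numpy as np

# Hypothetical 4-state controlled component (numbers are illustrative only):
# states 0-2 are operational conditions, state 3 is the critical condition.
# Rows give the transition probabilities under the fixed policy.
P = np.array([
    [0.90, 0.08, 0.02, 0.00],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],  # critical state made absorbing for the analysis
])
critical = 3
others = [s for s in range(P.shape[0]) if s != critical]

# Q: transitions among non-critical states; R: one-step jumps into the critical state.
Q = P[np.ix_(others, others)]
R = P[np.ix_(others, [critical])].ravel()
I = np.eye(len(others))

# Reachability / hitting probability: h = (I - Q)^{-1} R, componentwise in [0, 1].
h = np.linalg.solve(I - Q, R)

# Probability of reaching the critical state within T steps: sum_{k=0}^{T-1} Q^k R.
T = 10
h_T, Qk = np.zeros_like(R), I.copy()
for _ in range(T):
    h_T += Qk @ R
    Qk = Qk @ Q

# Expected hitting time (finite where h = 1): t = (I - Q)^{-1} 1.
t = np.linalg.solve(I - Q, np.ones(len(others)))

print("P(ever reach critical)       :", np.round(h, 4))
print(f"P(reach within {T} steps)     :", np.round(h_T, 4))
print("Expected steps until critical:", np.round(t, 2))
# The same machinery answers "A before B": make both A and B absorbing and
# read off the absorption probabilities into A.
```

The long-term frequency of a (non-absorbing) critical event is obtained from the same transition matrix via its stationary distribution, i.e., the left eigenvector of P for eigenvalue 1.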
- Award ID(s): 1663479
- PAR ID: 10298075
- Date Published:
- Journal Name: IEEE Transactions on Reliability
- ISSN: 0018-9529
- Page Range / eLocation ID: 1 to 19
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We study optimal pricing in a single-server queueing system that can be observable or unobservable, depending on how customers receive information to estimate sojourn time. Our primary objective is to determine whether the service provider is better off making the system observable or unobservable under optimal pricing. We formulate the optimal pricing problem using Markov decision process (MDP) models for both observable and unobservable systems. For unobservable systems, the problem is studied using an MDP with a fixed-point equation as equilibrium constraints. We show that the MDPs for both observable and unobservable queues are special cases of a generalized arrivals-based MDP model, in which the optimal arrival rate (rather than price) is set in each state. Then, we show that the optimal policy that solves the generalized MDP exhibits a monotone structure in that the optimal arrival rate is non-increasing in the queue length, which allows for developing efficient algorithms to determine optimal pricing policies. Next, we show that if no customers overestimate sojourn time in the observable system, it is in the interest of the service provider to make the system observable. We also show that if all customers overestimate sojourn time, the service provider is better off making the system unobservable. Lastly, we learn from numerical results that when customers are heterogeneous in estimating their sojourn time, the service provider is expected to receive a higher gain by making the system observable if on average customers do not significantly overestimate sojourn time.
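The monotone structure can be illustrated with a short value-iteration sketch. The model below is a toy of ours, not the article's formulation: the state is the queue length in a discrete-time single-server queue, the control is an arrival probability picked from a grid (a stand-in for price), the reward trades revenue from admitted arrivals against a holding cost, and we use a discounted criterion for simplicity. All numbers are assumptions.

```python
import numpy as np

N, mu, gamma = 20, 0.5, 0.98           # buffer size, service prob., discount
lam_grid = np.linspace(0.0, 0.45, 10)  # candidate arrival probabilities

V = np.zeros(N + 1)
policy = np.zeros(N + 1)
for _ in range(2000):                  # value iteration to near-convergence
    V_new = np.empty_like(V)
    for x in range(N + 1):
        # Arrivals are blocked when the buffer is full.
        lam_eff = lam_grid if x < N else np.zeros_like(lam_grid)
        reward = 4.0 * lam_eff - 0.5 * x             # revenue minus holding cost
        nxt = (lam_eff * V[min(x + 1, N)]            # arrival
               + mu * V[max(x - 1, 0)]               # service completion
               + (1.0 - lam_eff - mu) * V[x])        # no change
        q = reward + gamma * nxt
        V_new[x] = q.max()
        policy[x] = lam_grid[int(q.argmax())]
    V = V_new

# Under the monotone-structure result, the optimal arrival rate should be
# non-increasing in the queue length; printing the policy makes that visible.
print(policy)
```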
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov decision process (MDP) is communicating with a finite, although unknown, diameter. Our main result is a high probability regret upper bound of [Formula: see text] for any communicating MDP with S states, A actions, and diameter D. Here, regret compares the total reward achieved by the algorithm to the total expected reward of an optimal infinite-horizon undiscounted average reward policy in time horizon T. This result closely matches the known lower bound of [Formula: see text]. Our techniques involve proving some novel results about the anti-concentration of Dirichlet distribution, which may be of independent interest.
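The [Formula: see text] placeholders mark bounds that are left as given above. The regret itself, defined verbally in the abstract, can be written in standard notation; the symbols ρ* (optimal long-run average reward) and r_t (reward collected at step t) are ours.

```latex
% Regret over horizon T: the gap between the total expected reward of an
% optimal infinite-horizon undiscounted average-reward policy and the
% reward accumulated by the algorithm.
\mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_{t}
```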
Traffic systems exhibit supply-side uncertainty which is alleviated through real-time information. This article explores subscription models for a private agency sharing data at a fixed rate. A multiclass strategy-based equilibrium model is developed for two classes of subscribed and unsubscribed travelers, whose optimal strategy given the link-state costs is modeled as a Markov decision process (MDP) and a partially-observable MDP, respectively. A utility-based subscription choice model is formulated to study the impacts of subscription rates on the percentage of travelers choosing to subscribe. Solutions to the fixed-point formulation are determined using iterative algorithms. The proposed subscription model can be used for designing optimal subscription rates in various settings where real-time information can be a valuable routing tool such as express lanes, parking systems, roadside delivery, and routing of vulnerable road users.
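The fixed-point formulation lends itself to a short sketch. The toy model below is ours, not the article's: travelers subscribe according to a logit choice on the utility gap between subscribing and not, the gap shrinks as more travelers subscribe (information is less valuable when widely held), and a damped iteration finds the equilibrium subscription share for each subscription rate. The utility parameters are illustrative assumptions.

```python
import numpy as np

def utility_gap(p, fee, base_saving=10.0, crowding=6.0):
    # Value of subscribing minus its fee; decreases in the subscribed share p.
    return base_saving - crowding * p - fee

def equilibrium_share(fee, theta=0.5, damping=0.5, tol=1e-10):
    p = 0.5
    for _ in range(10_000):
        # Logit subscription choice given the current utility gap.
        p_new = 1.0 / (1.0 + np.exp(-theta * utility_gap(p, fee)))
        if abs(p_new - p) < tol:
            break
        p = (1 - damping) * p + damping * p_new  # damped fixed-point step
    return p

for fee in (0.0, 2.0, 5.0, 8.0):
    print(f"fee={fee:>4}: equilibrium subscription share = {equilibrium_share(fee):.3f}")
```

Sweeping the fee as in the loop above is the basic ingredient for designing an optimal subscription rate: the agency's revenue at each fee is the fee times the equilibrium share it induces.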