Often in manufacturing systems, scenarios arise where the demand for maintenance exceeds the capacity of maintenance resources. This results in the problem of allocating the limited resources among machines competing for them. This maintenance scheduling problem can be formulated as a Markov decision process (MDP) with the goal of finding the optimal dynamic maintenance action given the current system state. However, as the system becomes more complex, solving an MDP suffers from the curse of dimensionality. To overcome this issue, we propose a two-stage approach that first optimizes a static condition-based maintenance (CBM) policy using a genetic algorithm (GA) and then improves the policy online via Monte Carlo tree search (MCTS). The static policy significantly reduces the state space of the online problem by allowing us to ignore machines that are not sufficiently degraded. Furthermore, we formulate MCTS to seek a maintenance schedule that maximizes the long-term production volume of the system to reconcile the conflict between maintenance and production objectives. We demonstrate that the resulting online policy is an improvement over the static CBM policy found by GA.
more »
« less
Component-wise Markov decision process for solving condition-based maintenance of large multi-component systems with economic dependence
Condition-based maintenance of multi-component systems is a prevalent engineering problem due to its effectiveness in reducing the operational and maintenance costs of the system. However, developing the exact optimal maintenance decisions for the large multi-component system is computationally challenging, even not feasible, due to the exponential growth in system state and action space size with the number of components in the system. To address the scalability issue in CBM of large multi-component systems, we propose a Component-Wise Markov Decision Process(CW-MDP) and an Adjusted Component-Wise Markov Decision Process (ACW-MDP) to obtain an approximation of the optimal system-level CBM decision policy for large systems with heterogeneous components. We propose using an extended single-component action space to model the impact of system-level setup cost on a component-level solution. The theoretical gap between the proposed approach and system-level optima is also derived. Additionally, theoretical convergence and the relationship between ACW-MDP and CW-MDP are derived. The study further shows extensive numerical studies to demonstrate the effectiveness of component-wise solutions for solving large multi-component systems.
more »
« less
- Award ID(s):
- 2323082
- PAR ID:
- 10553801
- Publisher / Repository:
- Taylor & Francis
- Date Published:
- Journal Name:
- IISE Transactions
- ISSN:
- 2472-5854
- Page Range / eLocation ID:
- 1 to 14
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
When the operation and maintenance (O&M) of infrastructure components is modeled as a Markov Decision Process (MDP), the stochastic evolution following the optimal policy is completely described by a Markov transition matrix. This paper illustrates how to predict relevant features of the time evolution of these controlled components. We are interested in assessing if a critical state is reachable, in assessing the probability of reaching that state within a time period, of visiting that state before another, and in returning to that state. We present analytical methods to address these questions and discuss their computational complexity. Outcomes of these analyses can provide the decision makers with deeper understanding of the component evolution and suggest revising the control policy. We formulate the framework for MDPs and extend it to Partially Observable Markov Decision Processes (POMDPs).more » « less
-
We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise for instance as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a non-linear dynamics. This can also be viewed as a Markov decision process (MDP) but the key difference with the usual RL setup is that the dynamics and the reward now depend on the state's probability distribution itself. Alternatively, it can be recast as a MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level and we prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite space model motivated by a cyber security application and a continuous space model motivated by an application to swarm motion.more » « less
-
The problem of allocating limited resources to maintain components of a multicomponent system, known as selective maintenance, is naturally formulated as a high-dimensional Markov decision process (MDP). Unfortunately, these problems are difficult to solve exactly for realistically sized systems. With this motivation, we contribute an approximate dynamic programming (ADP) algorithm for solving the selective maintenance problem for a series–parallel system with binary-state components. To the best of our knowledge, this paper describes the first application of ADP to maintain multicomponent systems. Our ADP is compared, using a numerical example from the literature, against exact solutions to the corresponding MDP. We then summarize the results of a more comprehensive set of experiments that demonstrate the ADP’s favorable performance on larger instances in comparison to both the exact (but computationally intensive) MDP approach and the heuristic (but computationally faster) one-step-lookahead approach. Finally, we demonstrate that the ADP is capable of solving an extension of the basic selective maintenance problem in which maintenance resources are permitted to be shared across stages.more » « less
-
Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval Markov Decision Processes (IMDPs) with continuous action- space. Given an IMDP, we show that the pessimistic (resp. the optimistic) value iterations, i.e., the value iterations under the assumption of a competitive adversary (resp. cooperative agent), are monotone dynamical systems and are contracting with respect to the infinity-norm. Inspired by this dynamical system viewpoint, we introduce another IMDP, called the action-space relaxation IMDP. We show that the action-space relaxation IMDP has two key features: (i) its optimal value is an upper bound for the optimal value of the original IMDP, and (ii) its value iterations can be efficiently solved using tools and techniques from convex optimization. We then consider the policy optimization problems at each step of the value iterations as a feedback controller of the value function. Using this system- theoretic perspective, we propose an iteration-distributed imple- mentation of the value iterations for approximating the optimal value of the action-space relaxation IMDP.more » « less
An official website of the United States government

