skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning
We characterize the class of nondeterministic 𝜔-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata ‘good-for-MDPs’ (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties—they are BĂźchi automata with low branching degree obtained through a simple construction—and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.  more » « less
Award ID(s):
2009022
PAR ID:
10216133
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Biere, Armin; Parker, David
Date Published:
Journal Name:
Tools and Algorithms for the Construction and Analysis of Systems
Page Range / eLocation ID:
306-323
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bouajjani, A.; HolĂ­k, L.; Wu, Z. (Ed.)
    When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. While these translations failed, it has turned out that it is possible to repair them by using good-for-MDPs (GFM) Buechi automata instead. These are nondeterministic Buechi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a pretty straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Buechi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDPs automaton? The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDPs property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDPs automata we produce from deterministic Streett automata are bi-linear in the size of the deterministic automaton and its index. They can therefore be exponentially more succinct than the minimal nondeterministic Buechi automaton. 
    more » « less
  2. null (Ed.)
    Register automata have been used as a convenient model for specifying and verifying database driven systems. An important problem in such systems is to provide views that hide or restructure certain information about the data or process, extending classical notions of database views. In this paper we carry out a formal investigation of views of register automata by considering simple views that project away some of the registers. We show that classical register automata are not able to describe such projections and introduce more powerful register automata that are able to do so. We also show useful properties of these automata such as closure under projection and decidability of verifying temporal properties of their runs 
    more » « less
  3. Computation of optimal recovery decisions for community resilience assurance post-hazard is a combinatorial decision-making problem under uncertainty. It involves solving a large-scale optimization problem, which is significantly aggravated by the introduction of uncertainty. In this paper, we draw upon established tools from multiple research communities to provide an effective solution to this challenging problem. We provide a stochastic model of damage to the water network (WN) within a testbed community following a severe earthquake and compute near-optimal recovery actions for restoration of the water network. We formulate this stochastic decision-making problem as a Markov Decision Process (MDP), and solve it using a popular class of heuristic algorithms known as rollout. A simulation-based representation of MDPs is utilized in conjunction with rollout and the Optimal Computing Budget Allocation (OCBA) algorithm to address the resulting stochastic simulation optimization problem. Our method employs non-myopic planning with efficient use of simulation budget. We show, through simulation results, that rollout fused with OCBA performs competitively with respect to rollout with total equal allocation (TEA) at a meagre simulation budget of 5-10% of rollout with TEA, which is a crucial step towards addressing large-scale community recovery problems following natural disasters. 
    more » « less
  4. Microservice-based application deployments need to administer safety properties while serving requests. However, today such properties can be specified only in limited ways that can lead to overly permissive policies and the potential for illegitimate flow of information across microservices, or ad hoc policy implementations. We argue that a range of use cases require safety properties for the flow of requests across the whole microservice network, rather than only between adjacent hops. To begin specifying such expressive policies, we propose a system for declaring and deploying service tree policies. These policies are compiled down into declarative filters that are inserted into microservice deployment manifests. We use a light-weight dynamic monitor based enforcement mechanism, using ideas from automata theory. Experiments with our preliminary prototype show that we can capture a wide class of policies that we describe as case studies. 
    more » « less
  5. This paper proposes a unified distributed secondary control for the grid-forming (GFM) and grid-feeding (GFE) converters in DC microgrids. An optimization problem is formulated for the secondary control and the objective function considers regulating the global average of the GFM and GFE converter output voltages and proportional current sharing among all GFM and GFE converters. A unified distributed control is then designed to generate voltage and current references respectively for GFM and GFE converters based on the formulated optimization problem. The dynamic model of the DC microgrid under the proposed control is also developed, and steady-state analysis is performed to show that the proposed distributed control can achieve the control objectives in steady state. The performance of the proposed control is validated through real-time simulations in OPAL-RT on an 8-DG DC microgrid system. 
    more » « less