


Search for: All records

Award ID contains: 2146563


  1. Efficient verification algorithms for neural networks often depend on various abstract domains such as intervals, zonotopes, and linear star sets. The choice of abstract domain presents an expressiveness vs. scalability trade-off: simpler domains are less precise but yield faster algorithms. This paper investigates the hexatope and octatope abstract domains in the context of neural net verification. Hexatopes are affine transformations of higher-dimensional hexagons, defined by difference constraint systems, and octatopes are affine transformations of higher-dimensional octagons, defined by unit-two-variable-per-inequality constraint systems. These domains generalize the idea of zonotopes, which can be viewed as affine transformations of hypercubes. On the other hand, they can be considered a restriction of linear star sets, which are affine transformations of arbitrary H-polytopes. This places hexatopes and octatopes firmly between zonotopes and linear star sets in expressive power, but what about the efficiency of decision procedures? An important analysis problem for neural networks is the exact range computation problem, which asks to compute the exact set of possible outputs given a set of possible inputs. For this, three computational procedures are needed: (1) optimization of a linear cost function; (2) affine mapping; and (3) over-approximating the intersection with a half-space. While zonotopes admit efficient solutions to these operations, star sets solve them via linear programming. We show that these operations are faster for hexatopes and octatopes than they are for the more expressive linear star sets by reducing the linear optimization problem over these domains to the minimum-cost network flow problem, which can be solved in strongly polynomial time using the Out-of-Kilter algorithm. Evaluating exact range computation on several ACAS Xu neural network benchmarks, we find that hexatopes and octatopes show promise as a practical abstract domain for neural network verification.
    Free, publicly-accessible full text available December 1, 2025
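To make the expressiveness vs. efficiency trade-off above concrete, here is a minimal sketch (not the paper's implementation) of why linear optimization over a zonotope is cheap: over an affine image of the hypercube [-1, 1]^m, the maximum of a linear cost has a closed form, whereas a general linear star set would require solving a linear program. The generator matrix and cost vector below are hypothetical.

```python
# Minimal sketch: exact maximum of a linear cost over a zonotope
#   Z = { center + G x : x in [-1, 1]^m },
# which equals c^T center + sum_i |c^T g_i|  (g_i = i-th generator column).
import numpy as np

def zonotope_linear_max(c, center, G):
    """Closed-form maximum of c^T y over the zonotope (no LP needed)."""
    return float(c @ center + np.abs(c @ G).sum())

# Hypothetical 2-D zonotope with two generators and a hypothetical cost vector.
c = np.array([1.0, 2.0])
center = np.array([0.0, 0.0])
G = np.array([[1.0, 0.5],
              [0.0, 1.0]])
print(zonotope_linear_max(c, center, G))  # 3.5
```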
  2. Endriss, Ulle; Melo, Francisco (Eds.)
    Alternating-time temporal logic (ATL) extends branching time logic by enabling quantification over paths that result from the strategic choices made by multiple agents in various coalitions within the system. While classical temporal logics express properties of “closed” systems, ATL can express properties of “open” systems resulting from interactions among several agents. Reinforcement learning (RL) is a sampling-based approach to decision-making where learning agents, guided by a scalar reward function, discover optimal policies through repeated interactions with the environment. The challenge of translating high-level objectives into scalar rewards for RL has garnered increased interest, particularly following the success of model-free RL algorithms. This paper presents an approach for deploying model-free RL to verify multi-agent systems against ATL specifications. The key contribution of this paper is a verification procedure for model-free RL of quantitative and non-nested classic ATL properties, based on Q-learning, demonstrated on a natural subclass of non-nested ATL formulas. 
    Free, publicly-accessible full text available October 30, 2025
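As a rough illustration of the Q-learning backbone mentioned above, the following is a simplified pure-strategy max-min update, not the paper's verification procedure: the coalition picks the maximizing action while the remaining agents are treated as adversarial. All names and the reward signal are hypothetical.

```python
# Minimal sketch: tabular max-min Q-learning update for a coalition playing
# against an adversarial environment, as a stand-in for strategic choices in ATL.
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, coalition_action, other_action)]
ALPHA, GAMMA = 0.1, 0.99

def q_update(s, a, o, reward, s_next, coalition_acts, other_acts):
    """One Q-learning step: the coalition maximizes, the other agents minimize."""
    target = reward + GAMMA * max(
        min(Q[(s_next, a2, o2)] for o2 in other_acts) for a2 in coalition_acts)
    Q[(s, a, o)] += ALPHA * (target - Q[(s, a, o)])
```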
  3. Amato, Nancy; Driggs-Campbell, Katie; Ekenna, Chinwe; Morales, Marco; O'Kane, Jason (Eds.)
    We present an approach for systematically anticipating the actions and policies employed by oblivious environments in concurrent stochastic games, while maximizing a reward function. Our main contribution lies in the synthesis of a finite information state machine (ISM) whose alphabet ranges over the actions of the environment. Each state of the ISM is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency that guarantees that the belief states tracked by the ISM stay within a fixed distance of the precise belief state obtained by knowledge of the full history. We provide methods for checking consistency of an automaton and a synthesis approach which, upon successful termination, yields an ISM. We construct a Markov Decision Process (MDP) that serves as the starting point for computing optimal policies for maximizing a reward function defined over plays. We present an experimental evaluation over benchmark examples including human activity data for tasks such as cataract surgery and furniture assembly, wherein our approach successfully anticipates the policies and actions of the environment in order to maximize the reward.
    Free, publicly-accessible full text available October 7, 2025
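The belief-state bookkeeping described above can be pictured with a standard Bayesian update over a finite set of candidate environment policies; the sketch below is hypothetical and is not the ISM synthesis procedure itself.

```python
# Minimal sketch: Bayesian belief update over candidate environment policies
# after observing the environment take action `a` in state `s`.
def update_belief(belief, policies, s, a, floor=1e-12):
    """belief: name -> probability; policies: name -> {state: {action: prob}}."""
    new = {name: belief[name] * policies[name][s].get(a, 0.0) for name in belief}
    total = sum(new.values())
    if total < floor:              # observation impossible under all candidates
        return belief              # keep the previous belief (hypothetical choice)
    return {name: p / total for name, p in new.items()}

# Hypothetical: two candidate policies over a single state 's0'.
policies = {'cautious':   {'s0': {'stay': 0.9, 'move': 0.1}},
            'aggressive': {'s0': {'stay': 0.2, 'move': 0.8}}}
belief = {'cautious': 0.5, 'aggressive': 0.5}
belief = update_belief(belief, policies, 's0', 'move')
print(belief)   # mass shifts toward 'aggressive'
```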
  4. Hillston, Jane; Soudjiani, Sadegh (Eds.)
    We study the problem of inferring the discount factor of an agent optimizing a discounted reward objective in a finite-state Markov Decision Process (MDP). Discounted reward objectives are common in sequential optimization, reinforcement learning, and algorithmic game theory. The discount factor is an important parameter used in formulating the discounted reward. It captures the “time value” of the reward, i.e., how much reward at hand would equal a promised reward at a future time. Knowing an agent’s discount factor can provide valuable insights into their decision-making, and help predict their preferences in previously unseen environments. However, pinpointing the exact value of the discount factor used by the agent is a challenging problem. Ad-hoc guesses are often incorrect. This paper focuses on the problem of computing the range of possible discount factors for a rational agent given their policy. A naive solution to this problem can be quite expensive. A classic result by Smallwood shows that the interval [0, 1) of possible discount factors can be partitioned into finitely many sub-intervals, such that the optimal policy remains the same for each such sub-interval. Furthermore, optimal policies for neighboring sub-intervals differ at a single state. We show how Smallwood’s result can be exploited to search for discount factor intervals for which a given policy is optimal by reducing it to polynomial root isolation. We extend the result to situations where the policy is suboptimal, but with a value function that is close to optimal. We develop numerical approaches to solve the discount factor elicitation problem and demonstrate the effectiveness of our algorithms through some case studies.
    Free, publicly-accessible full text available August 29, 2025
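The interval structure from Smallwood's result can be probed numerically: for a fixed policy π, V_π(γ) = (I − γ P_π)^{-1} r_π, and π is optimal at γ exactly when it is greedy with respect to its own value function. The sketch below uses hypothetical MDP data and a plain grid sweep rather than the paper's root-isolation procedure.

```python
# Minimal sketch: sweep gamma and test where a fixed policy satisfies the
# Bellman optimality condition  r(s, pi(s)) + g * P_pi(s) V  >=  r(s, a) + g * P_a V.
import numpy as np

P = {0: np.array([[0.9, 0.1], [0.1, 0.9]]),   # hypothetical transition matrices per action
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
R = {0: np.array([1.0, 0.0]),                 # hypothetical rewards per action
     1: np.array([0.8, 0.3])}
pi = [0, 1]                                   # candidate policy: one action per state

def is_optimal(pi, gamma, tol=1e-9):
    P_pi = np.array([P[pi[s]][s] for s in range(2)])
    r_pi = np.array([R[pi[s]][s] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return all(R[pi[s]][s] + gamma * P[pi[s]][s] @ V >= R[a][s] + gamma * P[a][s] @ V - tol
               for s in range(2) for a in P)

grid = np.linspace(0.0, 0.99, 100)
print([round(float(g), 2) for g in grid if is_optimal(pi, g)])
```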
  5. Gurfinkel, Arie; Ganesh, Vijay (Eds.)
    In reinforcement learning, an agent incrementally refines a behavioral policy through a series of episodic interactions with its environment. This process can be characterized as explicit reinforcement learning, as it deals with explicit states and concrete transitions. Building upon the concept of symbolic model checking, we propose a symbolic variant of reinforcement learning, in which sets of states are represented through predicates and transitions are represented by predicate transformers. Drawing inspiration from regular model checking, we choose regular languages over the states as our predicates, and rational transductions as predicate transformations. We refer to this framework as regular reinforcement learning, and study its utility as a symbolic approach to reinforcement learning. Theoretically, we establish results around decidability, approximability, and efficient learnability in the context of regular reinforcement learning. Towards practical applications, we develop a deep regular reinforcement learning algorithm, enabled by the use of graph neural networks. We showcase the applicability and effectiveness of (deep) regular reinforcement learning through empirical evaluation on a diverse set of case studies. 
    Free, publicly-accessible full text available July 26, 2025
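A toy rendering of the "predicates as regular languages, transitions as rational transductions" idea (hypothetical, and far simpler than the paper's framework): a state-set predicate is a DFA, and a letter-to-letter relabeling, one of the simplest rational transductions, acts on it by relabeling transitions and determinizing.

```python
# Minimal sketch: regular predicates as DFAs, and the image of a predicate
# under a letter-to-letter relabeling (a very simple rational transduction).
class DFA:
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states, self.alphabet = states, alphabet
        self.delta, self.start, self.accepting = delta, start, accepting

    def accepts(self, word):
        q = self.start
        for a in word:
            q = self.delta[(q, a)]
        return q in self.accepting

def image_under_relabeling(dfa, h):
    """Image of L(dfa) under the letter-to-letter map h, via subset construction."""
    nfa = {}
    for (q, a), q2 in dfa.delta.items():
        nfa.setdefault((q, h[a]), set()).add(q2)
    start = frozenset([dfa.start])
    alphabet = set(h.values())
    todo, delta, states = [start], {}, {start}
    while todo:
        S = todo.pop()
        for b in alphabet:
            T = frozenset(q2 for q in S for q2 in nfa.get((q, b), ()))
            delta[(S, b)] = T
            if T not in states:
                states.add(T)
                todo.append(T)
    accepting = {S for S in states if S & dfa.accepting}
    return DFA(states, alphabet, delta, start, accepting)

# Hypothetical predicate: binary strings ending in '1'.
pred = DFA({0, 1}, {'0', '1'},
           {(0, '0'): 0, (0, '1'): 1, (1, '0'): 0, (1, '1'): 1},
           0, {1})
flipped = image_under_relabeling(pred, {'0': '1', '1': '0'})
print(flipped.accepts('10'), flipped.accepts('11'))  # True False
```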
  6. A barrier certificate, defined over the states of a dynamical system, is a real-valued function whose zero level set characterizes an inductively verifiable state invariant separating reachable states from unsafe ones. When combined with powerful decision procedures—such as sum-of-squares programming (SOS) or satisfiability-modulo-theory solvers (SMT)—barrier certificates enable an automated deductive verification approach to safety. The barrier certificate approach has been extended to refute LTL and ω-regular specifications by separating consecutive transitions of corresponding ω-automata in the hope of denying all accepting runs. Unsurprisingly, such tactics are bound to be conservative as refutation of recurrence properties requires reasoning about the well-foundedness of the transitive closure of the transition relation. This paper introduces the notion of closure certificates as a natural extension of barrier certificates from state invariants to transition invariants. We augment these definitions with SOS and SMT based characterization for automating the search of closure certificates and demonstrate their effectiveness over some case studies.
    Free, publicly-accessible full text available May 14, 2025
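For context, the classical barrier-certificate conditions that the closure-certificate notion extends can be discharged with an SMT solver. The sketch below assumes the z3-solver Python package and uses a hypothetical one-dimensional system and candidate certificate, not an example from the paper.

```python
# Minimal sketch: SMT check of barrier-certificate conditions for the
# hypothetical discrete-time system x' = 0.5 * x with candidate B(x) = x^2 - 1.
from z3 import Real, Solver, And, Or, Not, Implies, unsat

x, xp = Real('x'), Real('xp')
B = lambda v: v * v - 1

init   = And(x >= -0.5, x <= 0.5)      # initial states
unsafe = Or(x <= -2, x >= 2)           # unsafe states
step   = xp == 0.5 * x                 # transition relation

def valid(formula):
    """A formula is valid iff its negation is unsatisfiable."""
    s = Solver()
    s.add(Not(formula))
    return s.check() == unsat

print(valid(Implies(init, B(x) <= 0)))                    # B <= 0 on initial states
print(valid(Implies(unsafe, B(x) > 0)))                   # B > 0 on unsafe states
print(valid(Implies(And(B(x) <= 0, step), B(xp) <= 0)))   # zero sublevel set inductive
```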
  7. Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with that of finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs), where the non-Markovian aspect of the transition and reward functions is extended to an ω-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that if the promised lookaheads are not fulfilled, then the decision maker receives a payoff of ⊥ (the least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.
    Free, publicly-accessible full text available April 20, 2025
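One way to picture the lexicographic reduction mentioned above (hypothetical, not the paper's construction): rank payoffs first by whether the promised lookahead is fulfilled, then by discounted reward, with ⊥ as the bottom element, and pick actions greedily in that order from two hypothetical Q-tables.

```python
# Minimal sketch: lexicographic payoffs and greedy action selection.
BOTTOM = (0.0, float('-inf'))                 # the least desirable payoff

def payoff(promise_fulfilled: bool, discounted_reward: float):
    return (1.0, discounted_reward) if promise_fulfilled else BOTTOM

def lex_greedy(Q_promise, Q_reward, s, actions, eps=1e-9):
    """Maximize promise satisfaction first, discounted reward second."""
    best = max(Q_promise[(s, a)] for a in actions)
    tier = [a for a in actions if Q_promise[(s, a)] >= best - eps]
    return max(tier, key=lambda a: Q_reward[(s, a)])

# Tuples compare lexicographically, so a fulfilled promise dominates any reward.
print(payoff(True, 3.0) > payoff(False, 100.0))   # True
```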
  8. Notions of transition invariants and closure certificates have seen recent use in the formal verification of controlled dynamical systems against ω-regular properties. The existing approaches face limitations in two directions. First, they require a closed-form mathematical expression representing the model of the system. Such an expression may be difficult to find, too complex to be of any use, or unavailable due to security or privacy constraints. Second, finding such invariants typically relies on optimization techniques such as sum-of-squares (SOS) or satisfiability modulo theory (SMT) solvers. This restricts the classes of systems that can be formally verified. To address these drawbacks, we introduce a notion of neural closure certificates. We present a data-driven algorithm that trains a neural network to represent a closure certificate. Our approach is formally correct under some mild assumptions, i.e., one is able to formally show that the unknown system satisfies the ω-regular property of interest if a neural closure certificate can be computed. Finally, we demonstrate the efficacy of our approach with relevant case studies.
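A rough sketch of the kind of data-driven training loop such an approach could use (assuming PyTorch; the network shape, the margin, the sampled dynamics, and the single hinge loss are all hypothetical and much simpler than the paper's certificate conditions):

```python
# Minimal sketch: train a small network as a candidate certificate by hinge-
# penalizing violations of a certificate-style inequality on sampled transitions.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
MARGIN = 0.1

def violation_loss(x, x_next):
    # Hypothetical condition: the certificate value on a transition pair should
    # exceed MARGIN; penalize the amount by which sampled pairs fall short.
    v = net(torch.cat([x, x_next], dim=-1))
    return torch.relu(MARGIN - v).mean()

for _ in range(1000):
    x = torch.rand(256, 2) * 2 - 1        # sampled states in [-1, 1]^2
    x_next = 0.9 * x                      # samples from a hypothetical black-box system
    loss = violation_loss(x, x_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```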
  9. Linear temporal logic (LTL) and ω-regular objectives—a superset of LTL—have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for ω-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the ε-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the ω-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
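Not the paper's bound, but for a feel of "polynomial number of samples": a generic Hoeffding calculation for how many i.i.d. samples suffice to estimate a single transition probability within ε at confidence 1 − δ.

```python
# Minimal sketch: generic Hoeffding sample bound  n >= ln(2/delta) / (2 * eps^2).
import math

def hoeffding_samples(eps: float, delta: float) -> int:
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

print(hoeffding_samples(0.05, 0.01))   # 1060
```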
  10. We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm where the optimal control for the individual components is synthesized in isolation by making assumptions about the behaviors of neighboring components, and providing guarantees about their own behavior. We express these assume-guarantee contracts as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
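The "regular contracts to scalar rewards" translation can be pictured with a DFA monitor that walks over labels of the joint execution and emits reward while the contract is respected; the automaton, labels, and reward scheme below are hypothetical.

```python
# Minimal sketch: a DFA monitor for a regular assume-guarantee contract that
# emits a scalar reward for RL while the contract has not been violated.
class ContractMonitor:
    def __init__(self, delta, start, accepting):
        self.delta, self.q, self.accepting = delta, start, accepting

    def step(self, label):
        """Advance on the label of the joint observation; return the scalar reward."""
        self.q = self.delta.get((self.q, label), 'dead')
        return 1.0 if self.q in self.accepting else 0.0

# Hypothetical contract: the neighbor grants access ('g') before the shared
# resource is used ('u'); any other order falls into a dead state.
mon = ContractMonitor(
    delta={('wait', 'g'): 'granted', ('wait', 'idle'): 'wait',
           ('granted', 'u'): 'granted', ('granted', 'idle'): 'granted'},
    start='wait', accepting={'wait', 'granted'})
print(mon.step('g'), mon.step('u'), mon.step('idle'))  # 1.0 1.0 1.0
```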