skip to main content


Search for: All records

Creators/Authors contains: "Trivedi, A."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. A basic assumption of traditional reinforcement learning is that the value of a reward does not change once it is received by an agent. The present work forgoes this assumption and considers the situation where the value of a reward decays proportionally to the time elapsed since it was obtained. Emphasizing the inflection point occurring at the time of payment, we use the term asset to refer to a reward that is currently in the possession of an agent. Adopting this language, we initiate the study of depreciating assets within the framework of infinite-horizon quantitative optimization. In particular, we propose a notion of asset depreciation, inspired by classical exponential discounting, where the value of an asset is scaled by a fixed discount factor at each time step after it is obtained by the agent. We formulate an equational characterization of optimality in this context, establish that optimal values and policies can be computed efficiently, and develop a model-free reinforcement learning approach to obtain optimal policies. 
    more » « less
    Free, publicly-accessible full text available May 30, 2024
  2. The successes of reinforcement learning in recent years are underpinned by the characterization of suitable reward functions. However, in settings where such rewards are non-intuitive, difficult to define, or otherwise error-prone in their definition, it is useful to instead learn the reward signal from expert demonstrations. This is the crux of inverse reinforcement learning (IRL). While eliciting learning requirements in the form of scalar reward signals has been shown to be effective, such representations lack explainability and lead to opaque learning. We aim to mitigate this situation by presenting a novel IRL method for eliciting declarative learning requirements in the form of a popular formal logic---Linear Temporal Logic (LTL)---from a set of traces given by the expert policy. 
    more » « less
    Free, publicly-accessible full text available May 30, 2024
  3. Free, publicly-accessible full text available August 1, 2024
  4. Chechik, M. ; Katoen, JP. ; Leucker, M. (Ed.)
    Efficient verification algorithms for neural networks often depend on various abstract domains such as intervals, zonotopes, and linear star sets. The choice of the abstract domain presents an expressiveness vs. scalability trade-off: simpler domains are less precise but yield faster algorithms. This paper investigates the octatope abstract domain in the context of neural net verification. Octatopes are affine transformations of n-dimensional octagons—sets of unit-two-variable-per-inequality (UTVPI) constraints. Octatopes generalize the idea of zonotopes which can be viewed as an affine transformation of a box. On the other hand, octatopes can be considered as a restriction of linear star set, which are affine transformations of arbitrary H-Polytopes. This distinction places octatopes firmly between zonotopes and star sets in their expressive power, but what about the efficiency of decision procedures? An important analysis problem for neural networks is the exact range computation problem that asks to compute the exact set of possible outputs given a set of possible inputs. For this, three computational procedures are needed: 1) optimization of a linear cost function; 2) affine mapping; and 3) over-approximating the intersection with a half-space. While zonotopes allow an efficient solution for these approaches, star sets solves these procedures via linear programming. We show that these operations are faster for octatopes than the more expressive linear star sets. For octatopes, we reduce these problems to min-cost flow problems, which can be solved in strongly polynomial time using the Out-of-Kilter algorithm. Evaluating exact range computation on several ACAS Xu neural network benchmarks, we find that octatopes show promise as a practical abstract domain for neural network verification. 
    more » « less
    Free, publicly-accessible full text available March 3, 2024
  5. Bouajjani, A ; Holík, L. ; Wu, Z. (Ed.)
    This paper presents an optimization based framework to automate system repair against omega-regular properties. In the proposed formalization of optimal repair, the systems are represented as Kripke structures, the properties as omega-regular languages, and the repair space as repair machines—weighted omega-regular transducers equipped with Büchi conditions—that rewrite strings and associate a cost sequence to these rewritings. To translate the resulting cost-sequences to easily interpretable payoffs, we consider several aggregator functions to map cost sequences to numbers—including limit superior, supremum, discounted-sum, and average-sum—to define quantitative cost semantics. The problem of optimal repair, then, is to determine whether traces from a given system can be rewritten to satisfy an omega-regular property when the allowed cost is bounded by a given threshold. We also consider the dual challenge of impair verification that assumes that the rewritings are resolved adversarially under some given cost restriction, and asks to decide if all traces of the system satisfy the specification irrespective of the rewritings. With a negative result to the impair verification problem, we study the problem of designing a minimal mask of the Kripke structure such that the resulting traces satisfy the specifications despite the threshold-bounded impairment. We dub this problem as the mask synthesis problem. This paper presents automata-theoretic solutions to repair synthesis, impair verification, and mask synthesis problem for limit superior, supremum, discounted-sum, and average-sum cost semantics. 
    more » « less
  6. Huisman, M. ; Păsăreanu, C. ; Zhan, N. (Ed.)
    We study the problem of finding optimal strategies in Markov decision processes with lexicographic ω-regular objectives, which are ordered collections of ordinary ω-regular objectives. The goal is to compute strategies that maximise the probability of satisfaction of the first 𝜔-regular objective; subject to that, the strategy should also maximise the probability of satisfaction of the second ω-regular objective; then the third and so forth. For instance, one may want to guarantee critical requirements first, functional ones second and only then focus on the non-functional ones. We show how to harness the classic off-the-shelf model-free reinforcement learning techniques to solve this problem and evaluate their performance on four case studies. 
    more » « less
  7. Silva, A. ; Leino, K.R.M. (Ed.)
    We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach. 
    more » « less