- Award ID(s):
- 1553873
- NSF-PAR ID:
- 10211347
- Date Published:
- Journal Name:
- Robotics: Science and Systems
- ISSN:
- 2330-7668
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The Internet-of-Things, complex sensor networks, and multi-agent cyber-physical systems are all examples of spatially distributed systems that continuously evolve in time. Such systems generate huge amounts of spatio-temporal data, and system designers are often interested in analyzing and discovering structure within the data. There has been considerable interest in learning causal and logical properties of temporal data using logics such as Signal Temporal Logic (STL); however, there is limited work on discovering such relations in spatio-temporal data. We propose the first set of algorithms for unsupervised learning of spatio-temporal data. Our method performs automatic feature extraction from the spatio-temporal data by projecting it onto the parameter space of a parametric spatio-temporal reach and escape logic (PSTREL). We propose an agglomerative hierarchical clustering technique that guarantees that each cluster satisfies a distinct STREL formula. We show that our method generates STREL formulas of bounded description complexity using a novel decision-tree approach that generalizes previous unsupervised learning techniques for Signal Temporal Logic. We demonstrate the effectiveness of our approach on case studies from diverse domains such as urban transportation, epidemiology, green infrastructure, and air quality monitoring.
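The pipeline above (project each signal onto a parameter space, then cluster agglomeratively) can be sketched in miniature. This is a hedged illustration only: the template here is a trivial stand-in ("the signal always stays below c"), not the paper's PSTREL templates, and `tightest_threshold` and `agglomerative` are hypothetical helper names.

```python
# Sketch: project each signal onto the parameter space of a parametric
# template, then agglomeratively cluster the projected features.
# The template "always x <= c" is an illustrative stand-in for PSTREL.

def tightest_threshold(signal):
    """Smallest c for which 'always x <= c' holds: the signal's maximum."""
    return max(signal)

def agglomerative(points, n_clusters):
    """Plain single-linkage agglomerative clustering on 1-D features."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # merge the pair of clusters at minimal single-linkage distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

signals = [[0.1, 0.4, 0.2], [0.2, 0.5, 0.3], [2.0, 2.4, 2.1]]
features = [tightest_threshold(s) for s in signals]   # -> [0.4, 0.5, 2.4]
groups = agglomerative(features, 2)
```

The two similar signals end up in one cluster and the outlier in another; in the paper's setting each cluster additionally comes with a distinct STREL formula that its members satisfy.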
-
We address the problem of learning from demonstrations when the learner must satisfy safety and/or performance requirements expressed as Stochastic Temporal Logic (StTL) specifications. We extend the maximum causal entropy inverse reinforcement learning framework to account for StTL constraints and show how to encode them via a minimal set of mixed-integer linear constraints. Our method is based on a cut-and-generate algorithm that iterates between two phases: in the cut phase, we use cutting hyperplanes to approximate the feasible region of the non-linear constraint that encodes atomic predicates, and in the generate phase, we propagate these hyperplanes through the formula syntax to generate constraints for arbitrary formulas. Our algorithmic contributions are validated on different environments and specifications.
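The cut phase can be illustrated with a toy convex constraint: each cut is the tangent-plane linearization of the nonlinear constraint at some point, giving an outer approximation of the feasible region by linear (mixed-integer-friendly) constraints. This is a minimal sketch under that assumption; `f` is an illustrative stand-in, not the paper's atomic-predicate encoding.

```python
# Sketch of the "cut" phase: outer-approximate the feasible region
# {x : f(x) <= 0} of a convex constraint with tangent hyperplanes.

def f(x):           # illustrative convex constraint: x^2 - 1 <= 0
    return x * x - 1.0

def grad_f(x):
    return 2.0 * x

def tangent_cut(x0):
    """Linearization f(x0) + f'(x0)*(x - x0) <= 0, returned as (a, b)
    meaning the linear constraint a*x + b <= 0."""
    return grad_f(x0), f(x0) - grad_f(x0) * x0

def satisfies_all(cuts, x):
    return all(a * x + b <= 1e-9 for a, b in cuts)

cuts = [tangent_cut(x0) for x0 in (-2.0, 2.0, 0.5)]
# Every x with f(x) <= 0 satisfies all cuts (outer approximation),
# while points far outside the region are cut off.
```

Adding more cuts tightens the approximation; the generate phase would then combine such linear constraints through the Boolean/temporal structure of a full StTL formula.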
-
We extend the learning from demonstration paradigm by providing a method for learning unknown constraints shared across tasks, using demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, our method uses hit-and-run sampling to obtain lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a consistent representation of the unsafe set via solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. In addition, by leveraging a known parameterization of the constraint, we modify our method to learn parametric constraints in high dimensions. We also provide theoretical analysis on what subset of the constraint and safe set can be learnable from safe demonstrations. We demonstrate our method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and that it can also be used to learn constraints in a feature space.
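Hit-and-run sampling, the workhorse used above to draw lower-cost trajectories, walks inside a convex set by repeatedly picking a random direction, intersecting that line with the set, and sampling uniformly on the resulting chord. A minimal sketch for an axis-aligned box (a stand-in for the paper's set of trajectories cheaper than a demonstration):

```python
import random

# Sketch of hit-and-run sampling inside a convex set (here a unit box).

def inside(x, lo=0.0, hi=1.0):
    return all(lo <= xi <= hi for xi in x)

def hit_and_run(x, steps=100, lo=0.0, hi=1.0):
    random.seed(0)  # deterministic for illustration
    for _ in range(steps):
        # random direction on the sphere
        d = [random.gauss(0, 1) for _ in x]
        norm = sum(di * di for di in d) ** 0.5
        d = [di / norm for di in d]
        # chord of the box along direction d through x
        ts = []
        for xi, di in zip(x, d):
            if abs(di) > 1e-12:
                ts.append(((lo - xi) / di, (hi - xi) / di))
        t_lo = max(min(a, b) for a, b in ts)
        t_hi = min(max(a, b) for a, b in ts)
        # uniform point on the chord; clamp guards float round-off
        t = random.uniform(t_lo, t_hi)
        x = [min(hi, max(lo, xi + t * di)) for xi, di in zip(x, d)]
    return x

sample = hit_and_run([0.5, 0.5])  # stays inside the box
```

In the paper's use, each sampled trajectory is checked against the demonstration's cost; samples cheaper than an optimal safe demonstration must violate the unknown constraint, and feed the integer program as unsafe examples.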
-
Runtime verification is a lightweight method for monitoring the formal specification of a system during its execution. It has recently been shown that a given state predicate can be monitored consistently by a set of crash-prone asynchronous distributed monitors observing the system only if each monitor can emit verdicts taken from a large enough finite set. We revisit this impossibility result in the concrete context of linear-time logic (LTL) semantics for runtime verification, that is, when the correctness of the system is specified by an LTL formula on its execution traces. First, we show that monitors synthesized based on the 4-valued semantics of LTL (RV-LTL) may result in inconsistent distributed monitoring, even for some simple LTL formulas. More generally, given any LTL formula φ, we relate the number of different verdicts required by the monitors for consistently monitoring φ with a specific structural characteristic of φ called its alternation number. Specifically, we show that, for every k ≥ 0, there is an LTL formula φ with alternation number k that cannot be verified at runtime by distributed monitors emitting verdicts from a set of cardinality smaller than k + 1. On the positive side, we define a family of logics, called distributed LTL (abbreviated as DLTL), parameterized by k ≥ 0, which refines RV-LTL by incorporating 2k + 4 truth values. Our main contribution is to show that, for every k ≥ 0, every LTL formula φ with alternation number k can be consistently monitored by distributed monitors, each running an automaton based on a (2⌈k/2⌉ + 4)-valued logic taken from the DLTL family.
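The 4-valued RV-LTL semantics mentioned above can be sketched for two simple formulas: on a finite prefix, "eventually p" becomes conclusively true once p occurs but is otherwise only *presumably false*, while "always p" becomes conclusively false once p fails but is otherwise only *presumably true*. A minimal sketch of these verdicts (the function names are illustrative):

```python
# Sketch of 4-valued (RV-LTL-style) verdicts on a finite trace prefix.
# TRUE / FALSE are conclusive; the "presumably" verdicts may flip as
# the execution extends, which is what distributed monitors must agree on.

TRUE, FALSE, P_TRUE, P_FALSE = "T", "F", "presumably-T", "presumably-F"

def verdict_eventually(prefix, p):
    """F p: conclusively true once p has held; otherwise presumably false."""
    return TRUE if any(p(s) for s in prefix) else P_FALSE

def verdict_always(prefix, p):
    """G p: conclusively false once p has failed; otherwise presumably true."""
    return FALSE if any(not p(s) for s in prefix) else P_TRUE

trace = [0, 0, 3, 0]
pos = lambda s: s > 0
# F pos is settled (a positive value occurred); G pos is refuted.
```

The paper's point is that four verdicts like these suffice only for formulas of low alternation number; higher alternation numbers provably require the larger DLTL verdict sets.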
-
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas that are decidable within a finite horizon. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment, for LTL objectives that are not decidable within a finite horizon.
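The "decidable within a finite horizon" distinction can be made concrete: a bounded objective such as "p holds within the first k steps" is settled by any length-k prefix, whereas an objective like "infinitely often p" can never be settled by a finite prefix. A hedged sketch of the decidable case (the function name is illustrative, not from the paper):

```python
# Sketch of an LTL objective decidable within a finite horizon:
# "p holds within the first k steps" is fully determined by any
# prefix of length >= k. An objective like "infinitely often p"
# has no such k, which is where the PAC-MDP impossibility bites.

def bounded_eventually(prefix, p, k):
    """Decide 'p within k steps' from a prefix of length >= k."""
    assert len(prefix) >= k
    return any(p(s) for s in prefix[:k])

is_one = lambda s: s == 1
# settled after 3 steps, no matter how the trace continues:
settled_true = bounded_eventually([0, 0, 1, 0], is_one, 3)
settled_false = bounded_eventually([0, 0, 0], is_one, 3)
```

For such bounded formulas a learner can verify its policy's performance after finitely many interactions, which is exactly the property the PAC-MDP guarantee needs.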