
Title: Model, Data and Reward Repair: Trusted Machine Learning for Markov Decision Processes
When machine learning (ML) algorithms are used in mission-critical domains (e.g., self-driving cars, cyber security) or life-critical domains (e.g., surgical robotics), it is often important to ensure that the learned models satisfy high-level correctness requirements. These requirements can be instantiated in particular domains via constraints like safety (e.g., a robot arm should not come within five meters of any human operator during any phase of an autonomous operation) or liveness (e.g., a car should eventually cross a 4-way intersection). Such constraints can be formally described in propositional logic, first-order logic, or temporal logics such as Probabilistic Computation Tree Logic (PCTL) [31]. For example, in a lane-change controller we can enforce the following PCTL safety property on seeing a slow-moving truck ahead: Pr>0.99 [F (changedLane or reducedSpeed)], where F is the "eventually" operator of PCTL. This property states that, with high probability (greater than 0.99), the car should eventually change lanes or reduce speed. Trusted Machine Learning (TML) refers to a learning methodology that ensures that the specified properties are satisfied.
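A PCTL reachability property like the one above can be checked on a small Markov decision process by value iteration. The sketch below is a toy illustration, not the paper's repair machinery; the states, actions, and transition probabilities are invented for the lane-change example:

```python
# Minimal sketch: checking Pr>0.99 [ F (changedLane or reducedSpeed) ]
# on a toy MDP by value iteration. All probabilities are illustrative.

MDP = {
    # state: {action: [(prob, next_state), ...]}
    "seeTruck": {
        "changeLane": [(0.95, "changedLane"), (0.05, "seeTruck")],
        "brake":      [(0.90, "reducedSpeed"), (0.10, "seeTruck")],
    },
    "changedLane": {},    # absorbing goal state
    "reducedSpeed": {},   # absorbing goal state
}
GOAL = {"changedLane", "reducedSpeed"}

def max_reach_prob(mdp, goal, iters=1000):
    """Value iteration for the max probability of eventually reaching `goal`."""
    p = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(iters):
        for s in mdp:
            if s in goal or not mdp[s]:
                continue
            p[s] = max(sum(pr * p[t] for pr, t in succ)
                       for succ in mdp[s].values())
    return p

prob = max_reach_prob(MDP, GOAL)["seeTruck"]
print(prob > 0.99)   # → True: this toy controller satisfies the property
```

In this toy model both actions retry on failure, so the reachability probability converges to 1; a model checker such as PRISM performs an exact version of this computation.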
Award ID(s):
1740079
PAR ID:
10075838
Journal Name:
48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
Page Range / eLocation ID:
194 to 199
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. While Deep Reinforcement Learning (DRL) has achieved remarkable success across various domains, it remains vulnerable to occasional catastrophic failures without additional safeguards. An effective solution to prevent these failures is to use a shield that validates and adjusts the agent's actions to ensure compliance with a provided set of safety specifications. For real-world robotic domains, it is essential to define safety specifications over continuous state and action spaces to accurately account for system dynamics and to compute new actions that minimally deviate from the agent's original decision. In this paper, we present the first shielding approach specifically designed to ensure the satisfaction of safety requirements in continuous state and action spaces, making it suitable for practical robotic applications. Our method builds upon realizability, an essential property that confirms the shield will always be able to generate a safe action for any state in the environment. We formally prove that realizability can be verified for stateful shields, enabling the incorporation of non-Markovian safety requirements, such as loop avoidance. Finally, we demonstrate the effectiveness of our approach in ensuring safety without compromising the policy's success rate by applying it to a navigation problem and a multi-agent particle environment. Keywords: Shielding, Reinforcement Learning, Safety, Robotics
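The minimal-deviation idea behind such a shield can be sketched for a 1-D system (this is an illustrative toy, not the paper's method; the dynamics, bounds, and time step are assumptions):

```python
# Illustrative shield for a 1-D system x' = x + a*dt: the agent proposes a
# velocity `a`, and the shield projects it onto the closest value that keeps
# the next state inside the safe set [x_min, x_max].

def shield(x, a, dt=0.1, x_min=0.0, x_max=10.0, a_max=1.0):
    """Return the safe action closest to the proposed action `a`."""
    # Actions that keep x + a*dt in [x_min, x_max], intersected with
    # the actuator limits [-a_max, a_max].
    lo = max(-a_max, (x_min - x) / dt)
    hi = min(a_max, (x_max - x) / dt)
    # Realizability: for every reachable x, [lo, hi] must be non-empty.
    assert lo <= hi, "shield not realizable at this state"
    return min(max(a, lo), hi)   # minimal-deviation projection

print(shield(9.95, 1.0))   # proposed action would leave [0, 10]; shield caps it
```

The projection `min(max(a, lo), hi)` is the 1-D analogue of the paper's minimal-deviation correction; in higher dimensions it becomes a constrained optimization over the safe action set.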
  2. Any safety issue or cyber attack on an Industrial Control System (ICS) may have catastrophic consequences for human lives and the environment, so it is imperative to have resilient tools and mechanisms to protect ICS. First, to verify the safety and security of the control logic, complete and consistent specifications should be defined to guide the testing process. Second, it is vital to ensure that those requirements are met by the program control algorithm. In this paper, we propose an approach to formally define the system specifications and the safety and security requirements, building an ontology that is then used to verify the control logic of the PLC software. The use of an ontology allows us to reason about semantic concepts, check the consistency of concepts, and extract specifications by inference. As a proof of concept, we studied part of an industrial chemical process to implement the proposed approach. The experimental results in this work show that the proposed approach detects inconsistencies in the formally defined requirements and is capable of verifying the correctness and completeness of the control logic. The tools and algorithms designed and developed as part of this work will help technicians and engineers create safer and more secure control logic for ICS processes.
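The kind of consistency checking performed over formalized requirements can be illustrated, in a much-simplified propositional form (the propositions and rules below are hypothetical, not taken from the paper's ontology):

```python
# Toy consistency check over formalized requirements: a requirement set is
# consistent iff some truth assignment satisfies every rule. Propositions
# and rules are hypothetical PLC requirements, for illustration only.

from itertools import product

PROPS = ["valve_open", "pump_on", "high_pressure"]
REQUIREMENTS = [
    lambda v, p, h: (not h) or (not v),   # high pressure => valve closed
    lambda v, p, h: (not p) or v,         # pump on => valve open
    lambda v, p, h: h,                    # scenario under test: high pressure
    lambda v, p, h: p,                    # scenario under test: pump running
]

def consistent(reqs):
    """True iff some assignment over PROPS satisfies every requirement."""
    return any(all(r(*assign) for r in reqs)
               for assign in product([False, True], repeat=len(PROPS)))

print(consistent(REQUIREMENTS))  # → False: the four rules contradict each other
```

An ontology reasoner performs a far richer version of this check, over semantic concepts rather than bare propositions, but the detected contradiction (the pump must be on while the valve must be both open and closed) is the same kind of inconsistency.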
  3. This paper studies the evaluation of learning-based object detection models in conjunction with model checking of formal specifications defined on an abstract model of an autonomous system and its environment. In particular, we define two metrics, proposition-labeled and class-labeled confusion matrices, for evaluating object detection, and we incorporate these metrics to compute the satisfaction probability of system-level safety requirements. While confusion matrices have been effective for comparative evaluation of classification and object detection models, our framework fills two key gaps. First, we relate the performance of object detection to formal requirements defined over downstream high-level planning tasks. In particular, we provide empirical results showing that the choice of a good object detection algorithm, with respect to formal requirements on the overall system, depends significantly on the downstream planning and control design. Second, unlike the traditional confusion matrix, our metrics account for variations in performance with respect to the distance between the ego vehicle and the object being detected. We demonstrate this framework on a car-pedestrian example by computing the satisfaction probabilities for safety requirements formalized in Linear Temporal Logic (LTL).
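A distance-binned, proposition-labeled confusion matrix can be sketched as follows (the class-to-proposition mapping, bin size, and sample data are illustrative assumptions, not the paper's definitions):

```python
# Sketch of a proposition-labeled confusion matrix: detections are scored
# against the proposition they induce (e.g., "obstacle present") rather
# than the raw class label, and binned by distance to the ego vehicle.

from collections import Counter

TO_PROP = {"pedestrian": "obstacle", "cyclist": "obstacle",
           "sign": "clear", "none": "clear"}

def prop_confusion(samples, bin_size=10.0):
    """samples: (true_class, predicted_class, distance) triples."""
    cm = Counter()
    for true_cls, pred_cls, dist in samples:
        dbin = int(dist // bin_size)          # distance bin index
        cm[(dbin, TO_PROP[true_cls], TO_PROP[pred_cls])] += 1
    return cm

samples = [("pedestrian", "cyclist", 7.0),    # wrong class, same proposition
           ("pedestrian", "none", 23.0),      # missed obstacle at range
           ("sign", "sign", 12.0)]
cm = prop_confusion(samples)
print(cm[(0, "obstacle", "obstacle")])   # → 1
```

Note how the first sample counts as correct at the proposition level even though the class label is wrong: for a downstream planner that only needs "obstacle present", confusing a pedestrian with a cyclist is harmless, which is exactly the distinction the proposition-labeled metric captures.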
  4. Model uncertainties are considered in a learning-based control framework that combines a control-dependent barrier function (CDBF), a time-varying control barrier function (TCBF), and a control Lyapunov function (CLF). Tracking control is achieved by the CLF, while safety-critical constraints during tracking are guaranteed by the CDBF and TCBF. A reinforcement learning (RL) method is applied to jointly learn the model uncertainties related to the CDBF, TCBF, and CLF. The learning-based framework ultimately formulates a quadratic program (QP) whose CDBF, TCBF, and CLF constraints involve the model uncertainties. To our knowledge, this is the first application of such a learning-based framework to safety-guaranteed tracking control of automated vehicles with uncertainties. Control performance is validated for two different single-lane-change maneuvers via Simulink/CarSim® co-simulation and compared for the cases with and without learning. Moreover, the learning effects are discussed through explainable constraints in the QP formulation.
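The structure of a CLF-CBF quadratic program can be sketched in one dimension (this is a generic CLF-CBF filter, not the paper's CDBF/TCBF formulation; the dynamics, gains, and setpoints are assumptions):

```python
# Minimal 1-D CLF-CBF QP: dynamics x' = u, safety set h(x) = x_max - x >= 0,
# tracking CLF V(x) = (x - x_des)^2 with a slack variable d on the CLF
# constraint so that safety always takes priority over tracking.

from scipy.optimize import minimize

def clf_cbf_qp(x, u_ref, x_des=5.0, x_max=4.0, gamma=1.0, lam=1.0, w=10.0):
    V, dV = (x - x_des) ** 2, 2 * (x - x_des)   # CLF value and gradient
    h = x_max - x                               # barrier value
    cons = [
        # CBF:  h_dot + gamma*h >= 0   ->   -u + gamma*h >= 0
        {"type": "ineq", "fun": lambda z: -z[0] + gamma * h},
        # CLF:  V_dot <= -lam*V + d    ->   -dV*u - lam*V + d >= 0
        {"type": "ineq", "fun": lambda z: -dV * z[0] - lam * V + z[1]},
    ]
    # Stay close to the reference input; penalize use of the CLF slack.
    cost = lambda z: (z[0] - u_ref) ** 2 + w * z[1] ** 2
    res = minimize(cost, x0=[0.0, 0.0], constraints=cons, method="SLSQP")
    return res.x[0]   # filtered control input

u = clf_cbf_qp(x=3.9, u_ref=2.0)   # tracking wants u = 2; safety caps u at 0.1
print(u)
```

Near the boundary (x = 3.9, x_max = 4) the CBF constraint forces u <= gamma*h = 0.1, so the QP returns the capped input and absorbs the tracking shortfall in the slack, which is the same priority structure the paper's richer QP enforces.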
  5. There is a growing trend toward AI systems interacting with humans to revolutionize a range of application domains such as healthcare and transportation. However, unsafe human-machine interaction can lead to catastrophic failures. We propose a novel approach that predicts future states by accounting for the uncertainty of human interaction, monitors whether predictions satisfy or violate safety requirements, and adapts control actions based on the predictive monitoring results. Specifically, we develop a new quantitative predictive monitor based on Signal Temporal Logic with Uncertainty (STL-U) to compute a robustness degree interval, which indicates the extent to which a sequence of uncertain predictions satisfies or violates an STL-U requirement. We also develop a new loss function to guide the uncertainty calibration of Bayesian deep learning and a new adaptive control method, both of which leverage STL-U quantitative predictive monitoring results. We apply the proposed approach to two case studies: Type 1 Diabetes management and semi-autonomous driving. Experiments show that the proposed approach improves safety and effectiveness in both case studies. 
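The robustness degree interval described above can be sketched in a much-simplified form (the requirement shape, threshold, and prediction intervals are hypothetical; the paper's STL-U semantics is richer):

```python
# Simplified STL-U-style robustness interval for the requirement
# G (y <= c) over interval-valued predictions [lo_t, hi_t]: because the
# predictions are uncertain, robustness is an interval, not a number.

def robustness_interval(pred_intervals, c):
    """Return (worst-case, best-case) robustness of G (y <= c)."""
    worst = min(c - hi for lo, hi in pred_intervals)   # trajectory at upper bounds
    best = min(c - lo for lo, hi in pred_intervals)    # trajectory at lower bounds
    return worst, best

# Hypothetical glucose predictions (mg/dL) against a safety cap of 180.
preds = [(120, 150), (140, 175), (150, 190)]
lo, hi = robustness_interval(preds, c=180)
print(lo, hi)   # → -10 30
```

An interval straddling zero, as here, means the uncertain predictions neither definitely satisfy nor definitely violate the requirement, which is precisely the signal the paper's predictive monitor uses to trigger control adaptation.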