Trust, Resilience and Interpretability of AI Models

Jha, Susmit

doi:10.1007/978-3-030-28423-7_

Citation Details

In this tutorial, we present our recent work on building trusted, resilient and interpretable AI models by combining symbolic methods developed for automated reasoning with connectionist learning methods that use deep neural networks. The increasing adoption of artificial intelligence and machine learning in systems, including safety-critical systems, has created a pressing need for scalable techniques that can establish trust in their safe behavior, resilience to adversarial attacks, and interpretability to enable human audits. The tutorial has three components: a review of techniques for verifying neural networks, methods for using geometric invariants to defend against adversarial attacks, and techniques for extracting logical symbolic rules by reverse engineering machine learning models. These techniques form the core of TRINITY, the Trusted, Resilient and Interpretable AI framework being developed at SRI. We identify the key challenges in building the TRINITY framework and report recent results on each of these three fronts.

Award ID(s): 1740079, 1750009

PAR ID: 10119094

Author(s) / Creator(s): Jha, Susmit

Date Published: 2019-07-01

Journal Name: Numerical Software Verification, 2019

Page Range / eLocation ID: 3-25

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1007/978-3-030-28423-7_
