Title: Explaining AI Decisions Using Efficient Methods for Learning Sparse Boolean Formulae
In this paper, we consider the problem of learning Boolean formulae from examples obtained by actively querying an oracle that can label these examples as either positive or negative. This problem has received attention in both the machine learning and the formal methods communities, and it has been shown to have exponential worst-case complexity in the general case as well as for many restrictions. Here, we focus on learning sparse Boolean formulae, which depend on only a small (but unknown) subset of the overall vocabulary of atomic propositions. We propose two algorithms, the first based on binary search in the Hamming space and the second based on random walk on the Boolean hypercube, to learn these sparse Boolean formulae with a given confidence. This assumption of sparsity is motivated by the problem of mining explanations for decisions made by artificially intelligent (AI) algorithms, where the explanation of individual decisions may depend on a small but unknown subset of all the inputs to the algorithm. We demonstrate the use of these algorithms in automatically generating explanations of these decisions. These explanations will make intelligent systems more understandable and accountable to human users, facilitate easier audits, and provide diagnostic information in the case of failure. The proposed approach treats the AI algorithm as a black-box oracle; hence, it is broadly applicable and agnostic to the specific AI algorithm. We show that the number of examples needed for both proposed algorithms grows only logarithmically with the size of the vocabulary of atomic propositions. We illustrate the practical effectiveness of our approach on a diverse set of case studies.
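The binary-search idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, only a standard construction under stated assumptions: the oracle is a black-box Python callable on a list of Booleans, and two assignments with different labels are already known. The function name `find_relevant_variable` and its parameters are hypothetical. Halving the set of positions where the two assignments differ isolates one variable the oracle depends on, using a number of queries that is logarithmic in the Hamming distance between them.

```python
from typing import Callable, List, Sequence


def find_relevant_variable(
    oracle: Callable[[List[bool]], bool],
    a: Sequence[bool],
    b: Sequence[bool],
) -> int:
    """Return the index of one variable that the oracle's label depends on.

    Assumes oracle(a) != oracle(b). This is an illustrative sketch, not the
    method from the paper.
    """
    a, b = list(a), list(b)
    label_a = oracle(a)
    if label_a == oracle(b):
        raise ValueError("the two assignments must receive different labels")

    # Positions where the two assignments disagree.
    diff = [i for i in range(len(a)) if a[i] != b[i]]

    while len(diff) > 1:
        mid = len(diff) // 2
        first_half, second_half = diff[:mid], diff[mid:]

        # Move `a` halfway toward `b` by copying b's values on the first half.
        c = list(a)
        for i in first_half:
            c[i] = b[i]

        if oracle(c) != label_a:
            # The label already flips within the first half.
            b, diff = c, first_half
        else:
            # The label is unchanged, so the flip lies in the second half.
            a, diff = c, second_half

    # `a` and `b` now differ in exactly one position and have different labels.
    return diff[0]
```

Repeating this search from different starting pairs would recover further relevant variables; how such pairs are found, and how confidence is established, is left to the paper.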
Award ID(s):
1740079 1750009
Publication Date:
Journal Name:
Journal of Automated Reasoning
Sponsoring Org:
National Science Foundation
More Like this
  1. Pe'er, I. (Ed.)
    Combinatorial group testing and compressed sensing both focus on recovering a sparse vector of dimensionality n from a much smaller number m < n of measurements. In the first approach, the problem is defined over the Boolean field: the goal is to recover a Boolean vector, and measurements are Boolean. In the second approach, the unknown vector and the measurements are over the reals. Here, we focus on a real-valued group testing setting that more closely fits modern testing protocols relying on quantitative measurements, such as qPCR, where the goal is recovery of a sparse Boolean vector and the pooling matrix needs to be Boolean and sparse, but the unknown input signal vector and the measurement outcomes are nonnegative reals, and the matrix algebra implied in the test protocol is over the reals. With the recent renewed interest in group testing, the focus has been on quantitative measurements resulting from qPCR, but the methods proposed for sample pooling were based on matrices designed with Boolean measurements in mind. Here, we investigate constructing pooling matrices dedicated to real-valued group testing. We provide conditions for pooling matrices to guarantee unambiguous decoding of positives in this setting. We also show a deterministic algorithm for constructing matrices meeting the proposed condition, for small matrix sizes that can be implemented using a laboratory robot. Using simulated data, we show that the proposed approach leads to matrices that can be applied for higher positivity rates than combinatorial group testing matrices considered for viral testing previously. We also validate the approach through wet lab experiments involving SARS-CoV-2 nasopharyngeal swab samples.
  2. One of the classical approaches for estimating the frequencies and damping factors in a spectrally sparse signal is the MUltiple SIgnal Classification (MUSIC) algorithm, which exploits the low-rank structure of an autocorrelation matrix. Low-rank matrices have also received considerable attention recently in the context of optimization algorithms with partial observations, and nuclear norm minimization (NNM) has been widely used as a popular heuristic of rank minimization for low-rank matrix recovery problems. On the other hand, it has been shown that NNM can be viewed as a special case of atomic norm minimization (ANM), which has achieved great success in solving line spectrum estimation problems. However, as far as we know, the general ANM (not NNM) considered in many existing works can only handle frequency estimation in undamped sinusoids. In this work, we aim to fill this gap and deal with damped spectrally sparse signal recovery problems. In particular, inspired by the dual analysis used in ANM, we offer a novel optimization-based perspective on the classical MUSIC algorithm and propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to a certain NNM problem, and we show that this algorithm is in fact equivalent to MUSIC itself. Building on this connection, we also extend the classical MUSIC algorithm to the missing data case. We provide exact recovery guarantees for our proposed algorithms and quantify how the sample complexity depends on the true spectral parameters. In particular, we provide a parameter-specific recovery bound for low-rank matrix recovery of jointly sparse signals rather than use certain incoherence properties as in existing literature. Simulation results also indicate that the proposed algorithms significantly outperform some relevant existing methods (e.g., ANM) in frequency estimation of damped exponentials.
  3. We will present a new general framework for robust and adaptive control that allows for distributed and scalable learning and control of large systems of interconnected linear subsystems. The control method is demonstrated for a linear time-invariant system with bounded parameter uncertainties, disturbances and noise. The presented scheme continuously collects measurements to reduce the uncertainty about the system parameters and adapts dynamic robust controllers online in a stable and performance-improving way. A key enabler for our approach is choosing a time-varying dynamic controller implementation, inspired by recent work on System Level Synthesis [1]. We leverage a new robustness result for this implementation to propose a general robust adaptive control algorithm. In particular, the algorithm allows us to impose communication and delay constraints on the controller implementation and is formulated as a sequence of robust optimization problems that can be solved in a distributed manner. The proposed control methodology performs particularly well when the interconnection between systems is sparse and the dynamics of local regions of subsystems depend only on a small number of parameters. As we will show on a five-dimensional exemplary chain-system, the algorithm can utilize system structure to efficiently learn and control the entire system while respecting communication and implementation constraints. Moreover, although current theoretical results require the assumption of small initial uncertainties to guarantee robustness, we will present simulations that show good closed-loop performance even in the case of large uncertainties. This suggests that the assumption is not critical for the presented technique, and future work will focus on providing less conservative guarantees.
  4. We investigate the approximability of the following optimization problem. The input is an $n \times n$ matrix $A = (A_{ij})$ with real entries and an origin-symmetric convex body $K \subset \mathbb{R}^n$ that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form $\sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j = \langle x, Ax \rangle$ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has a type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows us to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators. Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that for convex sets with bounded type-2 constant, constant-factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant-factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization.
  5. In many applications, data is easy to acquire but expensive and time-consuming to label; prominent examples include medical imaging and NLP. This disparity has only grown in recent years as our ability to collect data improves. Under these constraints, it makes sense to select only the most informative instances from the unlabeled pool and request an oracle (e.g., a human expert) to provide labels for those samples. The goal of active learning is to infer the informativeness of unlabeled samples so as to minimize the number of requests to the oracle. Here, we formulate active learning as an open-set recognition problem. In this paradigm, only some of the inputs belong to known classes; the classifier must identify the rest as unknown. More specifically, we leverage variational neural networks (VNNs), which produce high-confidence (i.e., low-entropy) predictions only for inputs that closely resemble the training data. We use the inverse of this confidence measure to select the samples that the oracle should label. Intuitively, unlabeled samples that the VNN is uncertain about contain features that the network has not been exposed to; thus they are more informative for future training. We carried out an extensive evaluation of our novel, probabilistic formulation of active learning, achieving state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and FashionMNIST. Additionally, unlike current active learning methods, our algorithm can learn even in the presence of out-of-distribution outliers. As our experiments show, when the unlabeled pool consists of a mixture of samples from multiple datasets, our approach can automatically distinguish between samples from seen vs. unseen datasets. Overall, our results show that high-quality uncertainty measures are key for pool-based active learning.