

Search for: All records

Award ID contains: 1804648

Note: Clicking on a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While a myriad of methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD detector’s decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. We also show how to identify prominent concepts contributing to the detection results, and provide further reasoning about their decisions.
    Free, publicly-accessible full text available July 3, 2024
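    A rough sketch of the kind of quantity the abstract calls concept separability, i.e., the distributional separation between in-distribution and OOD data in a learned concept space. The Fisher-style ratio, the random features, and the concept directions below are illustrative stand-ins only; the paper's actual metrics and learning framework are not reproduced here.

      import numpy as np

      def concept_scores(features, concept_vectors):
          # Project feature embeddings onto learned concept directions
          # (rows of concept_vectors); a higher score means the concept is more present.
          return features @ concept_vectors.T

      def separability_proxy(id_scores, ood_scores):
          # Stand-in separability measure: per-concept Fisher-style ratio of
          # between-group distance to within-group spread.
          mu_id, mu_ood = id_scores.mean(0), ood_scores.mean(0)
          var_id, var_ood = id_scores.var(0), ood_scores.var(0)
          return (mu_id - mu_ood) ** 2 / (var_id + var_ood + 1e-8)

      # Toy usage with random stand-ins for DNN features and concept directions.
      rng = np.random.default_rng(0)
      concepts = rng.normal(size=(5, 64))            # 5 hypothetical concept directions
      id_feats = rng.normal(0.0, 1.0, (200, 64))     # "in-distribution" features
      ood_feats = rng.normal(0.5, 1.0, (200, 64))    # "OOD" features
      print(separability_proxy(concept_scores(id_feats, concepts),
                               concept_scores(ood_feats, concepts)))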
  2. Machine learning and logical reasoning have been the two foundational pillars of Artificial Intelligence (AI) since its inception, and yet, until recently the interactions between these two fields have been relatively limited. Despite their individual success and largely independent development, there are new problems on the horizon that seem solvable only via a combination of ideas from these two fields of AI. These problems can be broadly characterized as follows: how can learning be used to make logical reasoning and synthesis/verification engines more efficient and powerful, and in the reverse direction, how can we use reasoning to improve the accuracy, generalizability, and trustworthiness of learning. In this perspective paper, we address the above-mentioned questions with an emphasis on certain paradigmatic trends at the intersection of learning and reasoning. Our intent here is not to be a comprehensive survey of all the ways in which learning and reasoning have been combined in the past. Rather, we focus on certain recent paradigms where corrective feedback loops between learning and reasoning seem to play a particularly important role. Specifically, we observe the following three trends: first, the use of learning techniques (especially reinforcement learning) in sequencing, selecting, and initializing proof rules in solvers/provers; second, combinations of inductive learning and deductive reasoning in the context of program synthesis and verification; and third, the use of solver layers in providing corrective feedback to machine learning models in order to help improve their accuracy, generalizability, and robustness with respect to partial specifications or domain knowledge. We believe that these paradigms are likely to have a significant and dramatic impact on AI and its applications for a long time to come.
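    To make the first trend above (learning to select and sequence proof rules or heuristics inside a solver) concrete, here is a deliberately tiny, hypothetical sketch: an epsilon-greedy bandit chooses between two branching heuristics for a naive DPLL-style SAT search and is rewarded when the search finishes in fewer calls. All names and the setup are illustrative assumptions, not any system discussed in the paper.

      import random

      def dpll(clauses, assign, choose, stats):
          # Naive DPLL: simplify under the current partial assignment, then branch.
          stats["calls"] += 1
          clauses = [c for c in clauses
                     if not any(assign.get(abs(l)) == (l > 0) for l in c)]
          clauses = [[l for l in c if assign.get(abs(l)) is None] for c in clauses]
          if any(len(c) == 0 for c in clauses):
              return False                      # some clause is falsified
          if not clauses:
              return True                       # all clauses satisfied
          var = choose(clauses)                 # branching variable picked by the heuristic
          for val in (True, False):
              assign[var] = val
              if dpll(clauses, assign, choose, stats):
                  return True
              del assign[var]
          return False

      def first_var(clauses):                   # heuristic 1: first unassigned variable
          return abs(clauses[0][0])

      def most_frequent(clauses):               # heuristic 2: most frequently occurring variable
          counts = {}
          for c in clauses:
              for l in c:
                  counts[abs(l)] = counts.get(abs(l), 0) + 1
          return max(counts, key=counts.get)

      def random_3cnf(n_vars, n_clauses, rng):
          return [[rng.choice([-1, 1]) * rng.randint(1, n_vars) for _ in range(3)]
                  for _ in range(n_clauses)]

      # Epsilon-greedy bandit over the two heuristics; the reward is the negated
      # number of search calls, so the bandit learns which heuristic makes the
      # solver cheaper on this instance distribution.
      heuristics, value, counts = [first_var, most_frequent], [0.0, 0.0], [0, 0]
      rng = random.Random(0)
      for _ in range(100):
          arm = rng.randrange(2) if rng.random() < 0.1 else max(range(2), key=value.__getitem__)
          stats = {"calls": 0}
          dpll(random_3cnf(12, 50, rng), {}, heuristics[arm], stats)
          counts[arm] += 1
          value[arm] += (-stats["calls"] - value[arm]) / counts[arm]
      print({h.__name__: round(v, 1) for h, v in zip(heuristics, value)})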
  3. Machine learning is being increasingly used by individuals, research institutions, and corporations. This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model. However, such MLaaS systems raise privacy concerns such as model extraction. In model extraction attacks, adversaries maliciously exploit the query interface to steal the model. More precisely, in a model extraction attack, a good approximation of a sensitive or proprietary model held by the server is extracted (i.e., learned) by a dishonest user who interacts with the server only via the query interface. This attack was introduced by Tramèr et al. at the 2016 USENIX Security Symposium, where practical attacks for various models were shown. We believe that better understanding the efficacy of model extraction attacks is paramount to designing secure MLaaS systems. To that end, we take the first step by (a) formalizing model extraction and discussing possible defense strategies, and (b) drawing parallels between model extraction and the established area of active learning. In particular, we show that recent advancements in the active learning domain can be used to implement powerful model extraction attacks, and investigate possible defense strategies.
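    The parallel the abstract draws between model extraction and active learning can be sketched in a few lines: an attacker with only query access trains a surrogate, using uncertainty sampling to decide which inputs are worth querying. Everything below (the synthetic data, the logistic-regression victim, the query budget of roughly 250 queries) is an illustrative assumption, not the attack from the paper.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)

      # "Server-side" proprietary model, visible to the attacker only through
      # a prediction query interface.
      X_secret = rng.normal(size=(2000, 10))
      y_secret = (X_secret @ rng.normal(size=10) > 0).astype(int)
      victim = LogisticRegression().fit(X_secret, y_secret)
      query = lambda x: victim.predict(x)        # the only access the attacker has

      # Attacker: query a random seed set, then use uncertainty sampling (an
      # active-learning strategy) to spend the remaining budget on the points
      # the current surrogate is least sure about.
      pool = rng.normal(size=(5000, 10))
      labeled = list(range(50))
      labels = list(query(pool[labeled]))
      surrogate = LogisticRegression()
      for _ in range(20):
          surrogate.fit(pool[labeled], labels)
          margin = np.abs(surrogate.predict_proba(pool)[:, 1] - 0.5)
          margin[labeled] = np.inf               # never re-query known points
          new = list(np.argsort(margin)[:10])
          labeled += new
          labels += list(query(pool[new]))

      # Final fit on everything queried, then measure agreement with the victim.
      surrogate.fit(pool[labeled], labels)
      X_test = rng.normal(size=(2000, 10))
      print("surrogate/victim agreement:", (surrogate.predict(X_test) == query(X_test)).mean())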
  4. Differential privacy protects an individual's privacy by perturbing data on an aggregated level (DP) or individual level (LDP). We report four online human-subject experiments investigating the effects of using different approaches to communicate differential privacy techniques to laypersons in a health app data collection setting. Experiments 1 and 2 investigated participants' data disclosure decisions for low-sensitive and high-sensitive personal information when given different DP or LDP descriptions. Experiments 3 and 4 uncovered reasons behind participants' data sharing decisions, and examined participants' subjective and objective comprehension of these DP or LDP descriptions. When shown descriptions that explain the implications instead of the definition/processes of the DP or LDP technique, participants demonstrated better comprehension and showed more willingness to share information with LDP than with DP, indicating their understanding of LDP's stronger privacy guarantee compared with DP.
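    A minimal sketch of the two mechanisms being compared: central differential privacy, where a trusted aggregator perturbs an aggregate statistic, versus local differential privacy, where each user randomizes their own answer (randomized response) before it ever leaves their device. The epsilon value, the yes/no question, and the population size are hypothetical.

      import numpy as np

      rng = np.random.default_rng(0)
      eps = 1.0
      true_answers = rng.integers(0, 2, size=1000)   # 1 = "yes" to a sensitive question

      # Central DP: a trusted aggregator sees raw answers and releases the count
      # with Laplace noise calibrated to sensitivity 1.
      dp_count = true_answers.sum() + rng.laplace(scale=1.0 / eps)

      # Local DP: each user flips their answer with some probability before
      # sharing it; the aggregator then debiases the noisy sum.
      p = np.exp(eps) / (np.exp(eps) + 1)            # probability of reporting truthfully
      keep = rng.random(1000) < p
      reports = np.where(keep, true_answers, 1 - true_answers)
      ldp_count = (reports.sum() - (1 - p) * 1000) / (2 * p - 1)

      print("true:", true_answers.sum(),
            "central DP estimate:", round(dp_count, 1),
            "local DP estimate:", round(ldp_count, 1))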
  6. Over recent years, devising classification algorithms that are robust to adversarial perturbations has emerged as a challenging problem. In particular, deep neural nets (DNNs) seem to be susceptible to small imperceptible changes over test instances. However, the line of work in provable robustness, so far, has been focused on information-theoretic robustness, ruling out even the existence of any adversarial examples. In this work, we study whether there is hope to benefit from the algorithmic nature of an attacker that searches for adversarial examples, and ask whether there is any learning task for which it is possible to design classifiers that are only robust against polynomial-time adversaries. Indeed, numerous cryptographic tasks (e.g., encryption of long messages) can only be secure against computationally bounded adversaries, and are indeed impossible for computationally unbounded attackers. Thus, it is natural to ask if the same strategy could help robust learning. We show that computational limitation of attackers can indeed be useful in robust learning by demonstrating the possibility of a classifier for some learning task for which computational and information-theoretic adversaries of bounded perturbations have very different power. Namely, while computationally unbounded adversaries can attack successfully and find adversarial examples with small perturbation, polynomial-time adversaries are unable to do so unless they can break standard cryptographic hardness assumptions. Our results, therefore, indicate that perhaps a similar approach to cryptography (relying on computational hardness) holds promise for achieving computationally robust machine learning. In the reverse direction, we also show that the existence of such a learning task, in which computational robustness beats information-theoretic robustness, requires computational hardness by implying (average-case) hardness of NP.
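    The cryptographic intuition the abstract appeals to can be illustrated, very loosely, with a keyed MAC: a verifier that accepts an input only if its tag checks out is easy to fool for a computationally unbounded adversary (who can search the key space) but not for a polynomial-time one, assuming HMAC-SHA256 is secure. This toy is only an analogy; it is not the paper's construction or learning task.

      import hmac, hashlib, os

      KEY = os.urandom(32)   # secret key known to the labeler, not the attacker

      def tag(payload: bytes) -> bytes:
          return hmac.new(KEY, payload, hashlib.sha256).digest()[:16]

      def label(x: bytes) -> int:
          # Inputs are payload || tag; label 1 iff the tag verifies under the key.
          payload, t = x[:-16], x[-16:]
          return int(hmac.compare_digest(t, tag(payload)))

      # A "clean" positive example.
      payload = b"sensor reading 42"
      x = payload + tag(payload)

      # A polynomial-time adversary who perturbs the payload must also forge a
      # fresh valid tag to keep the label at 1, which requires breaking the MAC;
      # an unbounded adversary could simply search the key space.
      x_perturbed = b"sensor reading 43" + x[-16:]
      print(label(x), label(x_perturbed))   # prints: 1 0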
  7. An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our theory is grounded in the recent work, Integrated Gradients (IG) [STY17], in axiomatically attributing a neural network’s output change to its input change. We propose training objectives in classic robust optimization models to achieve robust IG attributions. Our objectives give principled generalizations of previous objectives designed for robust predictions, and they naturally degenerate to classic soft-margin training for one-layer neural networks. We also generalize previous theory and prove that the objectives for different robust optimization models are closely related. Experiments demonstrate the effectiveness of our method, and also point to intriguing problems which hint at the need for better optimization techniques or better neural network architectures for robust attribution training.
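    For reference, a small sketch of the Integrated Gradients attribution [STY17] that the robust-attribution objectives build on, using the standard straight-line path from a baseline and a Riemann-sum approximation of the path integral. The toy model, baseline, and step count are arbitrary, and the paper's robust training objectives themselves are not shown here.

      import torch

      def integrated_gradients(model, x, baseline, steps=64):
          # IG(x) = (x - x') * integral over a in [0, 1] of dF(x' + a(x - x'))/dx,
          # approximated by averaging gradients at evenly spaced points on the path.
          alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
          path = baseline + alphas * (x - baseline)
          path.requires_grad_(True)
          out = model(path).sum()
          grads = torch.autograd.grad(out, path)[0]
          return (x - baseline) * grads.mean(dim=0)

      # Toy model and input (illustrative only).
      torch.manual_seed(0)
      model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(),
                                  torch.nn.Linear(8, 1))
      x = torch.tensor([[1.0, -2.0, 0.5, 3.0]])
      baseline = torch.zeros_like(x)
      attr = integrated_gradients(model, x, baseline)
      # Completeness axiom: attributions sum (approximately) to F(x) - F(baseline).
      print(attr.sum().item(), (model(x) - model(baseline)).item())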
  8. Wide adoption of artificial neural networks in various domains has led to an increasing interest in defending against adversarial attacks on them. Preprocessing defense methods such as pixel discretization are particularly attractive in practice due to their simplicity, low computational overhead, and applicability to various systems. It is observed that such methods work well on simple datasets like MNIST, but break on more complicated ones like ImageNet under recently proposed strong white-box attacks. To understand the conditions for success and the potential for improvement, we study the pixel discretization defense method, including more sophisticated variants that take into account the properties of the dataset being discretized. Our results again show poor resistance against the strong attacks. We analyze our results in a theoretical framework and offer strong evidence that pixel discretization is unlikely to work on all but the simplest of datasets. Furthermore, our arguments offer insights into why some other preprocessing defenses may be insecure.
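    A minimal sketch of the basic codebook-style pixel discretization defense studied here: pixels are snapped to a small set of evenly spaced levels before classification, so small perturbations usually do not move a pixel across a bin boundary. The image, perturbation budget, and number of levels are placeholders, and the dataset-aware variants from the paper are not shown; the abstract's point is that this intuition breaks down under strong white-box attacks on complex datasets.

      import numpy as np

      def discretize(images, levels=8):
          # Map each pixel to the nearest of `levels` evenly spaced values in [0, 1].
          codebook = np.linspace(0.0, 1.0, levels)
          idx = np.abs(images[..., None] - codebook).argmin(axis=-1)
          return codebook[idx]

      # Perturbations much smaller than the bin width rarely change a pixel's
      # codeword, which is the intuition behind the preprocessing defense.
      rng = np.random.default_rng(0)
      x = rng.random((1, 28, 28))                              # stand-in for an MNIST image
      x_adv = np.clip(x + rng.uniform(-0.03, 0.03, x.shape), 0, 1)
      print(np.mean(discretize(x) == discretize(x_adv)))       # fraction of unchanged pixels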
  9. Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences. However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.
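    A toy illustration of the system-level view the paper argues for: a perception error only counts as an adversarial failure if it changes the controller's decision into one that is unsafe given the true state of the world. The braking controller, thresholds, and distances below are entirely hypothetical.

      # Toy semantic filter over adversarial perception errors in a CPS setting.
      def controller(perceived_distance_m: float) -> str:
          return "brake" if perceived_distance_m < 20.0 else "cruise"

      def semantically_harmful(true_distance_m, perceived_clean, perceived_adv):
          # Only count the perturbation as harmful if it flips the decision AND
          # the new decision is unsafe given the true state of the world
          # (here: failing to brake when an obstacle is actually close).
          clean_act, adv_act = controller(perceived_clean), controller(perceived_adv)
          return adv_act != clean_act and true_distance_m < 20.0 and adv_act == "cruise"

      # Perception fooled from 15 m to 35 m while the obstacle is really 15 m away:
      # harmful. Fooled from 80 m to 60 m with nothing nearby: not harmful.
      print(semantically_harmful(15.0, 15.0, 35.0))   # True
      print(semantically_harmful(80.0, 80.0, 60.0))   # False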