skip to main content


Title: Robust Collective Classification against Structural Attacks
Collective learning methods exploit relations among data points to enhance classification performance. However, such relations, represented as edges in the underlying graphical model, expose an extra attack surface to the adversaries. We study adversarial robustness of an important class of such graphical models, Associative Markov Networks (AMN), to structural attacks, where an attacker can modify the graph structure at test time. We formulate the task of learning a robust AMN classifier as a bi-level program, where the inner problem is a challenging non-linear integer program that computes optimal structural changes to the AMN. To address this technical challenge, we first relax the attacker problem, and then use duality to obtain a convex quadratic upper bound for the robust AMN problem. We then prove a bound on the quality of the resulting approximately optimal solutions, and experimentally demonstrate the efficacy of our approach. Finally, we apply our approach in a transductive learning setting, and show that robust AMN is much more robust than state-of-the-art deep learning methods, while sacrificing little in accuracy on nonadversarial data.  more » « less
Award ID(s):
1903207 1905558
NSF-PAR ID:
10173998
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Conference on Uncertainty in Artificial Intelligence (UAI)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies. 
    more » « less
  2. null (Ed.)
    A quickest change detection problem is considered in a sensor network with observations whose statistical dependency structure across the sensors before and after the change is described by a decomposable graphical model (DGM). Distributed computation methods for this problem are proposed that are capable of producing the optimum centralized test statistic. The DGM leads to the proper way to collect nodes into local groups equivalent to cliques in the graph, such that a clique statistic which summarizes all the clique sensor data can be computed within each clique. The clique statistics are transmitted to a decision maker to produce the optimum centralized test statistic. In order to further improve communication efficiency, an ordered transmission approach is proposed where transmissions of the clique statistics to the fusion center are ordered and then adaptively halted when sufficient information is accumulated. This procedure is always guaranteed to provide the optimal change detection performance, despite not transmitting all the statistics from all the cliques. A lower bound on the average number of transmissions saved by ordered transmissions is provided and for the case where the change seldom occurs the lower bound approaches approximately half the number of cliques provided a well behaved distance measure between the distributions of the sensor observations before and after the change is sufficiently large. We also extend the approach to the case when the graph structure is different under each hypothesis. Numerical results show significant savings using the ordered transmission approach and validate the theoretical findings. 
    more » « less
  3. Summary

    Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program for estimating a sparse graph plus a low-rank term that adjusts for latent variables; however, this approach poses challenges from both computational and statistical perspectives. We propose an alternative, simple solution: apply a hard-thresholding operator to existing graph selection methods. Conceptually simple and computationally attractive, the approach of thresholding the graphical lasso is shown to be graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. The results are extended to estimators for thresholded neighbourhood selection and constrained $\ell_{1}$-minimization for inverse matrix estimation as well. We show that our simple thresholded graph estimators yield stronger empirical results than existing methods for the latent variable graphical model problem, and we apply them to a neuroscience case study on estimating functional neural connections.

     
    more » « less
  4. Motivated by practical concerns in applying information design to markets and service systems, we consider a persuasion problem between a sender and a receiver where the receiver may not be an expected utility maximizer. In particular, the receiver’s utility may be non-linear in her belief; we deem such receivers as risk-conscious. Such utility models arise, for example, when the receiver exhibits sensitivity to the variability and the risk in the payoff on choosing an action (e.g., waiting time for a service). In the presence of such non-linearity, the standard approach of using revelation-principle style arguments fails to characterize the set of signals needed in the optimal signaling scheme. Our main contribution is to provide a theoretical framework, using results from convex analysis, to overcome this technical challenge. In particular, in general persuasion settings with risk-conscious agents, we prove that the sender’s problem can be reduced to a convex optimization program. Furthermore, using this characterization, we obtain a bound on the number of signals needed in the optimal signaling scheme. We apply our methods to study a specific setting, namely binary per-suasion, where the receiver has two possible actions (0 and 1), and the sender always prefers the receiver taking action 1. Under a mild convexity assumption on the receiver’s utility and using a geometric approach,we show that the convex program can be further reduced to a linear program. Furthermore, this linear program yields a canonical construction of the set of signals needed in an optimal signaling mechanism. In particular, this canonical set of signals only involves signals that fully reveal the state and signals that induce uncertainty between two states.We illustrate our results in the setting of signaling wait time information in an unobservable queue with customers whose utilities depend on the variance of their waiting times. 
    more » « less
  5. Abstract

    Optimal transport (OT) methods seek a transformation map (or plan) between two probability measures, such that the transformation has the minimum transportation cost. Such a minimum transport cost, with a certain power transform, is called the Wasserstein distance. Recently, OT methods have drawn great attention in statistics, machine learning, and computer science, especially in deep generative neural networks. Despite its broad applications, the estimation of high‐dimensional Wasserstein distances is a well‐known challenging problem owing to the curse‐of‐dimensionality. There are some cutting‐edge projection‐based techniques that tackle high‐dimensional OT problems. Three major approaches of such techniques are introduced, respectively, the slicing approach, the iterative projection approach, and the projection robust OT approach. Open challenges are discussed at the end of the review.

    This article is categorized under:

    Statistical and Graphical Methods of Data Analysis > Dimension Reduction

    Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning

     
    more » « less