Representation learning of graph-structured data is challenging because both graph structure and node features carry important information. Graph Neural Networks (GNNs) provide an expressive way to fuse information from network structure and node features. However, GNNs are prone to adversarial attacks. Here we introduce Graph Information Bottleneck (GIB), an information-theoretic principle that optimally balances expressiveness and robustness of the learned representation of graph-structured data. Inheriting from the general Information Bottleneck (IB), GIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual information between the representation and the target, while simultaneously constraining the mutual information between the representation and the input data. Unlike the general IB, GIB regularizes the structural as well as the feature information. We design two sampling algorithms for structural regularization, instantiate the GIB principle with two new models, GIB-Cat and GIB-Bern, and demonstrate their benefits by evaluating resilience to adversarial attacks. We show that our proposed models are more robust than state-of-the-art graph defense models. GIB-based models empirically achieve up to a 31% improvement under adversarial perturbation of the graph structure as well as node features.
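The abstract states no formulas, but the principle it describes has the standard IB form; in generic notation (ours, not necessarily the paper's), with graph data consisting of structure and features:

```latex
% Graph Information Bottleneck in generic IB notation (notation ours):
% learn a representation Z of the graph data D = (A, X) for a target Y via
\max_{\,p(Z \mid \mathcal{D})} \; I(Z; Y) \;-\; \beta\, I(Z; \mathcal{D}),
\qquad \mathcal{D} = (A, X), \quad \beta > 0,
% where I(Z; Y) rewards expressiveness and the -beta I(Z; D) term
% compresses both structural (A) and feature (X) information.
```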
Gaussian information bottleneck and the non-perturbative renormalization group
Abstract: The renormalization group (RG) is a class of theoretical techniques used to explain the collective physics of interacting, many-body systems. It has been suggested that the RG formalism may be useful in finding and interpreting emergent low-dimensional structure in complex systems outside of the traditional physics context, such as in biology or computer science. In such contexts, one common dimensionality-reduction framework already in use is the information bottleneck (IB), in which the goal is to compress an ‘input’ signal X while maximizing its mutual information with some stochastic ‘relevance’ variable Y. IB has been applied in the vertebrate and invertebrate processing systems to characterize optimal encoding of the future motion of the external world. Other recent work has shown that the RG scheme for the dimer model could be ‘discovered’ by a neural network attempting to solve an IB-like problem. This manuscript explores whether IB and any existing formulation of RG are formally equivalent. A class of soft-cutoff non-perturbative RG techniques is defined by families of non-deterministic coarsening maps, and hence can be formally mapped onto IB, and vice versa. For concreteness, this discussion is limited entirely to Gaussian statistics (GIB), for which IB has exact, closed-form solutions. Under this constraint, GIB has a semigroup structure, in which successive transformations remain IB-optimal. Further, the RG cutoff scheme associated with GIB can be identified. Our results suggest that IB can be used to impose a notion of ‘large scale’ structure, such as biological function, on an RG procedure.
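As a concrete anchor for the "exact, closed-form solutions" mentioned above, here is a minimal numpy sketch of the Gaussian IB spectrum following Chechik et al. (2005); the function name is ours, and the eigen-structure (left eigenvectors of Σ_{x|y} Σ_x⁻¹, each switched on past a critical β) comes from that reference rather than from this abstract:

```python
import numpy as np

def gaussian_ib_spectrum(cov_x, cov_y, cov_xy):
    """Spectrum governing the closed-form Gaussian IB solution.

    Per Chechik et al., the optimal encoder is linear, M = A X + noise,
    with rows of A drawn from the left eigenvectors of
    Sigma_{x|y} Sigma_x^{-1}; eigenvector i enters the solution once the
    trade-off parameter beta exceeds 1 / (1 - lambda_i).
    """
    cov_x_given_y = cov_x - cov_xy @ np.linalg.solve(cov_y, cov_xy.T)
    m = cov_x_given_y @ np.linalg.inv(cov_x)
    lam, v = np.linalg.eig(m.T)          # left eigenvectors of m
    lam, v = lam.real, v.real
    order = np.argsort(lam)
    lam, v = lam[order], v[:, order]
    with np.errstate(divide="ignore"):   # lambda_i -> 1 gives beta_crit -> inf
        beta_crit = 1.0 / (1.0 - lam)
    return lam, v, beta_crit
```

Sweeping β against `beta_crit` is what produces the staircase of successively activated components; it is this nested family of linear coarsenings that the semigroup remark above refers to.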
- Award ID(s): 1652617
- PAR ID: 10363794
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: New Journal of Physics
- Volume: 24
- Issue: 3
- ISSN: 1367-2630
- Page Range / eLocation ID: Article No. 033007
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Information bottleneck (IB) is a technique for extracting information in one random variable X that is relevant for predicting another random variable Y. IB works by encoding X in a compressed “bottleneck” random variable M from which Y can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete X and Y with small state spaces, and continuous X and Y with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
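The abstract does not reproduce the bound, but for a Gaussian encoder p(m | x_i) = N(μ_i, σ²I) a pairwise-distance construction in the style of Kolchinsky and Tracey yields a tractable upper bound on I(X; M); the sketch below is our reading of that estimator, not code from the paper:

```python
import numpy as np

def mi_upper_bound(mu, sigma):
    """Upper bound (nats) on I(X; M) for an encoder p(m | x_i) = N(mu_i, sigma^2 I).

    Uses the pairwise-KL mixture-entropy bound:
        I(X; M) <= -(1/N) sum_i log (1/N) sum_j exp(-KL_ij),
    where KL_ij = ||mu_i - mu_j||^2 / (2 sigma^2) for equal-covariance Gaussians.
    `mu` is an (N, d) array of encoder means, one row per sample.
    """
    mu = np.asarray(mu, dtype=float)
    sq = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # pairwise ||.||^2
    kl = sq / (2.0 * sigma ** 2)
    return float(-np.mean(np.log(np.mean(np.exp(-kl), axis=1))))
```

In the neural-network setting described above, `mu` would be the encoder outputs for a minibatch, and this quantity would serve as the differentiable compression penalty.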
The new 5G communications standard increases data rates and supports low-latency communication that places constraints on the computational complexity of channel decoders. 5G low-density parity-check (LDPC) codes have the so-called protograph-based raptor-like (PBRL) structure which offers inherent rate-compatibility and excellent performance. Practical LDPC decoder implementations use message-passing decoding with finite precision, which becomes coarse as complexity is more severely constrained. Performance degrades as the precision becomes more coarse. Recently, the information bottleneck (IB) method was used to design mutual-information-maximizing lookup tables that replace conventional finite-precision node computations. The IB approach exchanges messages represented by integers with very small bit width. This paper extends the IB principle to the flexible class of PBRL LDPC codes as standardized in 5G. The extensions include puncturing and rate-compatible IB decoder design. As an example of the new approach, a 4-bit information bottleneck decoder is evaluated for PBRL LDPC codes over a typical range of rates. Frame error rate simulations show that the proposed scheme outperforms offset min-sum decoding algorithms and operates very close to double-precision sum-product belief propagation decoding.
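The lookup tables themselves are code- and rate-specific, but the underlying design step, quantizing a message alphabet down to a few levels while losing as little mutual information as possible, can be sketched generically. The greedy adjacent-merge below illustrates that step under assumptions of ours (binary code bit, output levels pre-sorted by LLR); it is not the decoder construction from the paper:

```python
import numpy as np

def mutual_information(joint):
    """I(X; T) in nats for a joint pmf with rows = x values, columns = t values."""
    px = joint.sum(axis=1, keepdims=True)
    pt = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ pt)[nz])).sum())

def greedy_ib_quantizer(joint, n_levels):
    """Merge adjacent output columns (assumed sorted by LLR) down to n_levels,
    at each step keeping the merge that preserves the most I(X; T)."""
    cols = [joint[:, j].copy() for j in range(joint.shape[1])]
    while len(cols) > n_levels:
        trials = [cols[:j] + [cols[j] + cols[j + 1]] + cols[j + 2:]
                  for j in range(len(cols) - 1)]
        cols = max(trials, key=lambda c: mutual_information(np.stack(c, axis=1)))
    return np.stack(cols, axis=1)  # quantized joint pmf p(x, t)
```

A 4-bit decoder in the sense above corresponds to `n_levels = 16`; in an IB decoder this kind of quantizer design is applied per node type and iteration to build the lookup tables.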
The notion of timely status updating is investigated in the context of cloud computing. Measurements of a time-varying process of interest are acquired by a sensor node, and uploaded to a cloud server to undergo some required computations. These computations have random service times that are independent and identically distributed across different uploads. After the computations are done, the results are delivered to a monitor, constituting an update. The goal is to keep the monitor continuously fed with fresh updates over time, which is assessed by an age-of-information (AoI) metric. A scheduler is employed to optimize the measurement acquisition times. Following an update, an idle waiting period may be imposed by the scheduler before acquiring a new measurement. The scheduler also has the capability to preempt a measurement in progress if its service time grows above a certain cutoff time, and upload a fresher measurement in its place. Focusing on stationary deterministic policies, in which waiting times are deterministic functions of the instantaneous AoI and the cutoff time is fixed for all uploads, it is shown that the optimal waiting policy that minimizes the long term average AoI has a threshold structure, in which a new measurement is uploaded following an update only if the AoI grows above a certain threshold that is a function of the service time distribution and the cutoff time. The optimal cutoff is then found for standard and shifted exponential service times. While it has been previously reported that waiting before updating can be beneficial for AoI, it is shown in this work that preemption of late updates can be even more beneficial.
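A Monte-Carlo sketch of the policy class described above (threshold waiting plus a fixed preemption cutoff, with exponential service as one of the paper's cases) can make the AoI bookkeeping concrete; the cycle accounting below is our own reconstruction from the abstract, not the paper's analysis:

```python
import numpy as np

def average_aoi(threshold, cutoff, rate=1.0, n_cycles=100_000, seed=0):
    """Estimate long-term average AoI under a threshold waiting policy:
    wait until the AoI reaches `threshold`, then upload, and preempt
    (re-acquire a fresh measurement) whenever a service time exceeds `cutoff`."""
    rng = np.random.default_rng(seed)
    area = elapsed = 0.0
    age = rng.exponential(1.0 / rate)         # AoI right after the first delivery
    for _ in range(n_cycles):
        wait = max(0.0, threshold - age)      # idle until AoI crosses the threshold
        busy = 0.0
        s = rng.exponential(1.0 / rate)
        while s > cutoff:                     # late update: preempt, retry fresh
            busy += cutoff
            s = rng.exponential(1.0 / rate)
        cycle = wait + busy + s               # time between consecutive deliveries
        area += age * cycle + 0.5 * cycle**2  # AoI grows linearly within a cycle
        elapsed += cycle
        age = s                               # delivered update is s seconds old
    return area / elapsed
```

Sweeping `threshold` for a fixed `cutoff` numerically exposes the threshold structure the paper establishes analytically.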
Graph Neural Networks (GNNs) have received increasing attention due to their ability to learn from graph-structured data. To open the black-box of these deep learning models, post-hoc instance-level explanation methods have been proposed to understand GNN predictions. These methods seek to discover substructures that explain the prediction behavior of a trained GNN. In this paper, we show analytically that for a large class of explanation tasks, conventional approaches, which are based on the principle of graph information bottleneck (GIB), admit trivial solutions that do not align with the notion of explainability. Instead, we argue that a modified GIB principle may be used to avoid the aforementioned trivial solutions. We further introduce a novel factorized explanation model with theoretical performance guarantees. The modified GIB is used to analyze the structural properties of the proposed factorized explainer. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our proposed factorized explainer.
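In generic notation (ours, since the abstract gives no formulas), the conventional GIB-based explanation objective being critiqued can be written as:

```latex
% Conventional GIB-based explanation: select a subgraph G_S of the input
% graph G that is maximally informative of the label Y while compressing G:
\min_{G_S \subseteq G} \; -\,I(Y; G_S) \;+\; \beta\, I(G; G_S),
% with beta > 0 trading prediction (first term) against compression (second).
% The claim above is that, for a large class of tasks, this objective admits
% minimizers that score well without being genuinely explanatory, which is
% what the modified GIB principle is designed to rule out.
```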