skip to main content

Title: Bottleneck Problems: An Information and Estimation-Theoretic View
Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber’s Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound on the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed “bottleneck problems”, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto’s mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. While the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of lower convex or upper concave envelope of certain functions. By applying this technique more » to a binary case, we derive closed form expressions for several bottleneck problems. « less
Award ID(s):
1845852 1900750
Publication Date:
Journal Name:
Sponsoring Org:
National Science Foundation
More Like this
  1. We generalize the information bottleneck (IB) and privacy funnel (PF) problems by introducing the notion of a sensitive attribute, which arises in a growing number of applications. In this generalization, we seek to construct representations of observations that are maximally (or minimally) informative about a target variable, while also satisfying constraints with respect to a variable corresponding to the sensitive attribute. In the Gaussian and discrete settings, we show that by suitably approximating the Kullback-Liebler (KL) divergence defining traditional Shannon mutual information, the generalized IB and PF problems can be formulated as semi-definite programs (SDPs), and thus efficiently solved, which is important in applications of high-dimensional inference. We validate our algorithms on synthetic data and demonstrate their use in imposing fairness in machine learning on real data as an illustrative application.
  2. Information bottleneck (IB) is a technique for extracting information in one random variable X that is relevant for predicting another random variable Y. IB works by encoding X in a compressed “bottleneck” random variable M from which Y can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete X and Y with small state spaces, and continuous X and Y with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
  3. Abstract

    The renormalization group (RG) is a class of theoretical techniques used to explain the collective physics of interacting, many-body systems. It has been suggested that the RG formalism may be useful in finding and interpreting emergent low-dimensional structure in complex systems outside of the traditional physics context, such as in biology or computer science. In such contexts, one common dimensionality-reduction framework already in use is information bottleneck (IB), in which the goal is to compress an ‘input’ signalXwhile maximizing its mutual information with some stochastic ‘relevance’ variableY. IB has been applied in the vertebrate and invertebrate processing systems to characterize optimal encoding of the future motion of the external world. Other recent work has shown that the RG scheme for the dimer model could be ‘discovered’ by a neural network attempting to solve an IB-like problem. This manuscript explores whether IB and any existing formulation of RG are formally equivalent. A class of soft-cutoff non-perturbative RG techniques are defined by families of non-deterministic coarsening maps, and hence can be formally mapped onto IB, and vice versa. For concreteness, this discussion is limited entirely to Gaussian statistics (GIB), for which IB has exact, closed-form solutions. Under this constraint, GIB hasmore »a semigroup structure, in which successive transformations remain IB-optimal. Further, the RG cutoff scheme associated with GIB can be identified. Our results suggest that IB can be used toimposea notion of ‘large scale’ structure, such as biological function, on an RG procedure.

    « less
  4. A key aspect of the neural coding problem is understanding how representations of afferent stimuli are built through the dynamics of learning and adaptation within neural networks. The infomax paradigm is built on the premise that such learning attempts to maximize the mutual information between input stimuli and neural activities. In this letter, we tackle the problem of such information-based neural coding with an eye toward two conceptual hurdles. Specifically, we examine and then show how this form of coding can be achieved with online input processing. Our framework thus obviates the biological incompatibility of optimization methods that rely on global network awareness and batch processing of sensory signals. Central to our result is the use of variational bounds as a surrogate objective function, an established technique that has not previously been shown to yield online policies. We obtain learning dynamics for both linear-continuous and discrete spiking neural encoding models under the umbrella of linear gaussian decoders. This result is enabled by approximating certain information quantities in terms of neuronal activity via pairwise feedback mechanisms. Furthermore, we tackle the problem of how such learning dynamics can be realized with strict energetic constraints. We show that endowing networks with auxiliary variablesmore »that evolve on a slower timescale can allow for the realization of saddle-point optimization within the neural dynamics, leading to neural codes with favorable properties in terms of both information and energy.« less
  5. The information bottleneck (IB) approach to clustering takes a joint distribution [Formula: see text] and maps the data [Formula: see text] to cluster labels [Formula: see text], which retain maximal information about [Formula: see text] (Tishby, Pereira, & Bialek, 1999 ). This objective results in an algorithm that clusters data points based on the similarity of their conditional distributions [Formula: see text]. This is in contrast to classic geometric clustering algorithms such as [Formula: see text]-means and gaussian mixture models (GMMs), which take a set of observed data points [Formula: see text] and cluster them based on their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse & Schwab, 2017 ), a variant of IB, to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set. We also introduce a novel intuitive method to choose the number of clusters via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, in particular limits of our model parameters, clusteringmore »with DIB and IB is equivalent to [Formula: see text]-means and EM fitting of a GMM with hard and soft assignments, respectively. Thus, clustering with (D)IB generalizes and provides an information-theoretic perspective on these classic algorithms.« less