Data Efficiency, Dimensionality Reduction, and the Generalized Symmetric Information Bottleneck

Martini, K Michael; Nemenman, Ilya

doi:10.1162/neco_a_01667

The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
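As a rough illustration of the trade-off the abstract describes, the sketch below evaluates an SIB-style objective on a toy discrete distribution. It assumes the commonly used form I(X;Z_X) + I(Y;Z_Y) - beta * I(Z_X;Z_Y) with Shannon mutual information and deterministic encoders; this is not the paper's GSIB family of cost functions, and the function names (`mutual_information`, `push_forward`, `sib_loss`), the toy distribution, and the value of `beta` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Shannon mutual information (bits) of a joint distribution {(a, b): p}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def push_forward(joint, f, g):
    """Joint distribution of (f(a), g(b)) induced by deterministic maps f, g."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(f(a), g(b))] += p
    return dict(out)


def sib_loss(p_xy, enc_x, enc_y, beta):
    """Assumed SIB-style loss: I(X;Z_X) + I(Y;Z_Y) - beta * I(Z_X;Z_Y)."""
    # Marginals of X and Y.
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    # Joint distributions of each variable with its compressed version.
    p_x_zx = {(x, enc_x(x)): p for x, p in p_x.items()}
    p_y_zy = {(y, enc_y(y)): p for y, p in p_y.items()}
    # Joint distribution of the two compressed variables.
    p_zx_zy = push_forward(p_xy, enc_x, enc_y)
    return (mutual_information(p_x_zx) + mutual_information(p_y_zy)
            - beta * mutual_information(p_zx_zy))


# Toy data: X and Y take values 0..3 and share only their high bit.
p_xy = {(x, y): 1 / 8 for x in range(4) for y in range(4) if x // 2 == y // 2}

high_bit = lambda v: v // 2   # keeps only the shared bit
identity = lambda v: v        # no compression at all

loss_compressed = sib_loss(p_xy, high_bit, high_bit, beta=3.0)
loss_identity = sib_loss(p_xy, identity, identity, beta=3.0)
```

In this toy case both encoder pairs preserve the same one bit of shared information, but the high-bit encoders pay less compression cost, so their loss is lower than that of the identity encoders.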

Award ID(s): 2010524

PAR ID: 10545120

Author(s) / Creator(s): Martini, K Michael; Nemenman, Ilya

Publisher / Repository: Arxiv

Date Published: 2024-06-07

Journal Name: Neural Computation

Volume: 36

Issue: 7

ISSN: 0899-7667

Page Range / eLocation ID: 1353 to 1379

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1162/neco_a_01667