

Title: C-MI-GAN: Estimation of Conditional Mutual Information Using MinMax Formulation
Estimation of information-theoretic quantities such as mutual information and its conditional variant has drawn interest in recent times owing to their multifaceted applications. Newly proposed neural estimators for these quantities have overcome severe drawbacks of classical kNN-based estimators in high dimensions. In this work, we focus on conditional mutual information (CMI) estimation by utilizing its formulation as a minmax optimization problem. Such a formulation leads to a joint training procedure similar to that of generative adversarial networks. We find that our proposed estimator provides better estimates than the existing approaches on a variety of simulated datasets comprising linear and non-linear relations between variables. As an application of CMI estimation, we deploy our estimator for conditional independence (CI) testing on real data and obtain better results than state-of-the-art CI testers.
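The minmax objective behind such GAN-style CMI estimators bottoms out in a variational lower bound on a KL divergence. As a minimal sketch (not the paper's implementation), the snippet below evaluates the Donsker-Varadhan bound E_p[T] - log E_q[e^T] ≤ KL(p || q) on a toy bivariate Gaussian, where the optimal critic T (the true log-density ratio) is known in closed form; in C-MI-GAN a neural critic and generator are trained to reach this value, whereas here the analytic critic stands in for the trained network. All names (`dv_bound`, `log_ratio`) are illustrative.

```python
import numpy as np

# Donsker-Varadhan lower bound: E_p[T] - log E_q[e^T] <= KL(p || q),
# tight when T is the true log-density ratio log(p/q).
def dv_bound(critic, p_samples, q_samples):
    return critic(p_samples).mean() - np.log(np.exp(critic(q_samples)).mean())

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000
joint = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
# samples from the product of marginals, obtained by shuffling the Y column
prod = np.column_stack([joint[:, 0], rng.permutation(joint[:, 1])])

def log_ratio(s):  # closed-form optimal critic for this Gaussian toy problem
    x, y = s[:, 0], s[:, 1]
    return (-0.5 * np.log(1 - rho**2)
            - (x*x - 2*rho*x*y + y*y) / (2 * (1 - rho**2))
            + 0.5 * (x*x + y*y))

mi_est = dv_bound(log_ratio, joint, prod)
true_mi = -0.5 * np.log(1 - rho**2)   # ~0.1438 nats for rho = 0.5
```

With the optimal critic the bound is tight, so the estimate matches the analytic mutual information up to sampling error; the adversarial training in the paper is, in effect, searching for this critic.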
Award ID(s):
1908003 1703403 1651236
PAR ID:
10175768
Journal Name:
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL-Divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and outperform the state-of-the-art tester on both simulated and real datasets.
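The classifier-based likelihood-ratio trick described above can be sketched on a toy problem: train a probabilistic classifier to separate joint samples from product-of-marginals samples, then average its log-odds over the joint samples to estimate KL(joint || product) = I(X;Y). The sketch below (assumptions, not the paper's code) uses Newton-solved logistic regression with hand-picked quadratic features, which happen to be sufficient for a Gaussian toy case; the paper uses neural classifiers instead.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 50_000
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)   # joint samples
prod = np.column_stack([xy[:, 0], rng.permutation(xy[:, 1])])        # product samples

def feats(s):  # quadratic features suffice for the Gaussian log-odds
    x, y = s[:, 0], s[:, 1]
    return np.column_stack([x*x, y*y, x*y, np.ones_like(x)])

X = np.vstack([feats(xy), feats(prod)])
t = np.concatenate([np.ones(n), np.zeros(n)])   # label 1 = joint, 0 = product

w = np.zeros(4)
for _ in range(25):                             # Newton's method for logistic regression
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    g = X.T @ (t - p)
    H = (X * (p * (1 - p))[:, None]).T @ X + 1e-6 * np.eye(4)
    w += np.linalg.solve(H, g)

# with balanced classes, the fitted log-odds approximates log(p_joint / p_product),
# so its average over joint samples estimates KL(joint || product) = I(X;Y)
kl_est = (feats(xy) @ w).mean()
true_mi = -0.5 * np.log(1 - rho**2)   # ~0.2231 nats
```

The same divergence estimator, pointed at the distributions that appear in the CMI decomposition, is the building block the abstract refers to.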
  2. The conditional mutual information I(X; Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p(Y|X, Z) rather than of the joint distribution p(X, Y, Z). We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p(Y|X, Z) q(X, Z), where q(X, Z) is a potential distribution, fixed a priori. We develop k-nearest neighbor based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.
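The importance-sampling ingredient mentioned above can be shown in isolation: to evaluate an expectation under a chosen potential distribution q using samples drawn from the observed distribution p, reweight each sample by q(x)/p(x). The one-dimensional Gaussian example below is a toy assumption for illustration, not the paper's setup.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
xs = rng.normal(0.0, 1.0, size=100_000)        # draws from p = N(0, 1)
w = normal_pdf(xs, mu=1.0) / normal_pdf(xs)    # importance weights q(x)/p(x)
est = np.mean(w * xs)                          # estimates E_q[X] with q = N(1, 1)
```

The true value of E_q[X] here is 1; the same reweighting idea, combined with kNN statistics and the coupling trick, lets the paper's estimator target the potential distribution q(X, Z) while only ever sampling the observed data.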
  3. Estimation of mutual information from observed samples is a basic primitive in machine learning, useful in several learning tasks including correlation mining, information bottleneck, Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a quantity well-defined for general probability spaces, estimators have been developed only in the special case of discrete or continuous pairs of random variables. Most of these estimators operate using the 3H-principle, i.e., by calculating the three (differential) entropies of X, Y and the pair (X, Y). However, in general mixture spaces, such individual entropies are not well defined, even though mutual information is. In this paper, we develop a novel estimator for estimating mutual information in discrete-continuous mixtures. We prove the consistency of this estimator theoretically as well as demonstrate its excellent empirical performance. This problem is relevant in a wide array of applications, where some variables are discrete, some continuous, and others are a mixture between continuous and discrete components.
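The kNN machinery such mixture estimators build on can be sketched for the purely continuous case: this is the classical Kraskov-Stögbauer-Grassberger (KSG) algorithm 1, written as a brute-force O(N²) sketch for brevity (the mixture estimator in the paper adds handling of discrete atoms and distance ties on top of this; that extension is not reproduced here).

```python
import numpy as np

GAMMA = 0.5772156649015329
def psi(n):  # digamma at a positive integer: psi(n) = -gamma + H_{n-1}
    return -GAMMA + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k=5):
    """KSG algorithm-1 MI estimate (max-norm, brute-force O(N^2))."""
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dj = np.maximum(dx, dy)                    # joint max-norm distances
    np.fill_diagonal(dj, np.inf)
    rho = np.sort(dj, axis=1)[:, k - 1]        # distance to k-th neighbour
    nx = (dx < rho[:, None]).sum(axis=1) - 1   # strictly inside, excluding self
    ny = (dy < rho[:, None]).sum(axis=1) - 1
    return (psi(k) + psi(n)
            - np.mean([psi(v + 1) for v in nx])
            - np.mean([psi(v + 1) for v in ny]))

rng = np.random.default_rng(4)
rho_c, n = 0.6, 1000
s = rng.multivariate_normal([0, 0], [[1, rho_c], [rho_c, 1]], size=n)
mi = ksg_mi(s[:, 0], s[:, 1])
true_mi = -0.5 * np.log(1 - rho_c**2)   # ~0.2231 nats
```

For purely continuous data this recovers the familiar KSG estimate; the abstract's point is that in mixed spaces the k-th neighbour distance can be exactly zero, which is where the paper's modified estimator departs from this sketch.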
  4. We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not exploit the structure of the problem and may be hard to evaluate in practice, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation, stability of the optimization algorithm, and the geometry of the loss-landscape. The bounds apply both to the outputs of training algorithms and to their predictions. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning. In particular, our bounds are non-vacuous on large-scale image-classification tasks.
  5. Abstract Standard estimators of the global average treatment effect can be biased in the presence of interference. This paper proposes regression adjustment estimators for removing bias due to interference in Bernoulli randomized experiments. We use a fitted model to predict the counterfactual outcomes of global control and global treatment. Our work differs from standard regression adjustments in that the adjustment variables are constructed from functions of the treatment assignment vector, and that we allow the researcher to use a collection of any functions correlated with the response, turning the problem of detecting interference into a feature engineering problem. We characterize the distribution of the proposed estimator in a linear model setting and connect the results to the standard theory of regression adjustments under SUTVA. We then propose an estimator that allows for flexible machine learning estimators to be used for fitting a nonlinear interference functional form. We propose conducting statistical inference via bootstrap and resampling methods, which allow us to sidestep the complicated dependences implied by interference and instead rely on empirical covariance structures. Such variance estimation relies on an exogeneity assumption akin to the standard unconfoundedness assumption invoked in observational studies. In simulation experiments, our methods are better at debiasing estimates than existing inverse propensity weighted estimators based on neighborhood exposure modeling. We use our method to reanalyze an experiment concerning weather insurance adoption conducted on a collection of villages in rural China. 
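The core recipe above can be sketched end to end: engineer an interference feature from the assignment vector (here, the treated share of each unit's neighbourhood), fit a regression of the outcome on treatment plus that feature, and impute the two global counterfactuals. The graph, outcome model, and effect sizes below are invented for illustration; the paper's estimator additionally allows arbitrary engineered features, nonlinear fits, and bootstrap inference.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, k_nbrs = 5_000, 10
nbrs = rng.integers(0, n_units, size=(n_units, k_nbrs))  # hypothetical random neighbourhoods
z = rng.binomial(1, 0.5, size=n_units)                   # Bernoulli treatment assignment
frac = z[nbrs].mean(axis=1)          # engineered feature: treated share of neighbours

b_direct, c_spill = 2.0, 1.5         # invented direct and interference effects
y = 1.0 + b_direct * z + c_spill * frac + rng.normal(0.0, 1.0, size=n_units)

# naive difference in means ignores interference (frac is balanced across arms)
dim = y[z == 1].mean() - y[z == 0].mean()

# regression adjustment: fit y ~ [1, z, frac], then impute the global counterfactuals
X = np.column_stack([np.ones(n_units), z, frac])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
gate_adj = beta[1] + beta[2]         # (z=1, frac=1) minus (z=0, frac=0)

true_gate = b_direct + c_spill       # 3.5 under this outcome model
```

Under this toy linear model the difference in means recovers only the direct effect, while the adjusted estimator recovers the full global average treatment effect, which is the debiasing behaviour the abstract describes.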