Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020
National Science Foundation
Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and outperform the state-of-the-art tester on both simulated and real datasets.
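The classifier-based likelihood-ratio idea can be illustrated with a minimal sketch. It is not the estimator from the paper: it assumes a toy pair of one-dimensional Gaussians, p = N(1, 1) and q = N(0, 1), in place of the joint and product distributions, uses a hand-written logistic regression as the classifier, and takes a plain plug-in average of the learned log-ratio. The function names `sigmoid` and `fit_logistic` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, lr=0.5, steps=300):
    """Fit P(label=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(w * x + b) - t  # gradient of the log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

random.seed(0)
n = 1000
p_samples = [random.gauss(1.0, 1.0) for _ in range(n)]  # labeled 1
q_samples = [random.gauss(0.0, 1.0) for _ in range(n)]  # labeled 0

w, b = fit_logistic(p_samples + q_samples, [1] * n + [0] * n)

# With balanced classes, the classifier logit w*x + b approximates log p(x)/q(x).
# Averaging it over samples from p gives a plug-in estimate of KL(p || q).
kl_estimate = sum(w * x + b for x in p_samples) / n
print(f"estimated KL(p || q): {kl_estimate:.3f} (closed form for this toy pair: 0.5)")
```

In this toy case the true log-ratio is linear in x, so logistic regression is well specified. Taking p to be the joint distribution of (X, Y) and q the product of the marginals (for example, by shuffling the Y samples) would turn the same recipe into a mutual information estimate, though a more flexible classifier would generally be needed.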
The conditional mutual information I(X; Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p_{Y|X,Z} rather than of the joint distribution p_{X,Y,Z}. We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p_{Y|X,Z} q_{X,Z}, where q_{X,Z} is a potential distribution, fixed a priori. We develop k-nearest-neighbor-based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.
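Written out, the definition above amounts to evaluating the usual conditional mutual information under the modified joint distribution. The symbols \tilde{p} and I_q below are notation assumed here for illustration and may differ from the paper's; the integral becomes a sum when X is discrete.

```latex
\begin{align*}
\tilde{p}_{X,Y,Z}(x,y,z) &= q_{X,Z}(x,z)\, p_{Y|X,Z}(y \mid x,z), \\
\tilde{p}_{Y|Z}(y \mid z) &= \int q_{X|Z}(x \mid z)\, p_{Y|X,Z}(y \mid x,z)\, dx, \\
I_{q}(X;Y \mid Z) &= \mathbb{E}_{\tilde{p}}\!\left[ \log
  \frac{p_{Y|X,Z}(Y \mid X,Z)}{\tilde{p}_{Y|Z}(Y \mid Z)} \right].
\end{align*}
```

The last line uses the fact that, under \tilde{p}, the conditional distribution of Y given (X, Z) is still p_{Y|X,Z}, so the quantity depends on the data distribution only through that conditional.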
Estimation of mutual information from observed samples is a basic primitive in machine learning, useful in several learning tasks including correlation mining, information bottleneck, Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a quantity well-defined for general probability spaces, estimators have been developed only in the special case of discrete or continuous pairs of random variables. Most of these estimators operate using the 3H-principle, i.e., by calculating the three (differential) entropies of X, Y and the pair (X,Y). However, in general mixture spaces, such individual entropies are not well defined, even though mutual information is. In this paper, we develop a novel estimator of mutual information in discrete-continuous mixtures. We prove the consistency of this estimator theoretically and demonstrate its excellent empirical performance. This problem is relevant in a wide array of applications, where some variables are discrete, some continuous, and others are a mixture of continuous and discrete components.
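For reference, the 3H-principle mentioned above can be sketched for the purely discrete case, where all three entropies are well defined. This is a plain plug-in estimator and not the mixture estimator developed in the paper; the function names `entropy` and `mutual_information_3h` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in (empirical) Shannon entropy in bits of a list of discrete samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information_3h(xs, ys):
    """3H-principle: I(X;Y) = H(X) + H(Y) - H(X,Y), each term estimated from counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

xs = [0, 0, 1, 1]
mi_copy = mutual_information_3h(xs, xs)             # Y is a copy of X
mi_indep = mutual_information_3h(xs, [0, 1, 0, 1])  # empirically independent pair
print(mi_copy, mi_indep)
```

When X or Y mixes discrete and continuous components, the individual entropy terms in this decomposition are no longer well defined, which is the gap the abstract describes.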
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). In contrast to other CMI bounds, which are black-box bounds that do not exploit the structure of the problem and may be hard to evaluate in practice, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation, stability of the optimization algorithm, and the geometry of the loss landscape. The bounds apply both to the output of training algorithms and to their predictions. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning. In particular, our bounds are non-vacuous on large-scale image-classification tasks.
Functional connectivity analyses focused on frequency-domain relationships, i.e. frequency coupling, powerfully reveal neurophysiology. Coherence is commonly used but neural activity does not follow its Gaussian assumption. The recently introduced mutual information in frequency (MIF) technique makes no model assumptions and measures non-Gaussian and nonlinear relationships. We develop a powerful MIF estimator optimized for correlating frequency coupling with task performance and other relevant task phenomena. In light of variance reduction afforded by multitaper spectral estimation, which is critical to precisely measuring such correlations, we propose a multitaper approach for MIF and compare its performance with coherence in simulations. Additionally, multitaper MIF and coherence are computed between macaque visual cortical recordings and their correlation with task performance is analyzed. Our multitaper MIF estimator produces low variance and performs better than all other estimators in simulated correlation analyses. Simulations further suggest that multitaper MIF captures more information than coherence. For the macaque data set, coherence and our new MIF estimator largely agree. Overall, we provide a new way to precisely estimate frequency coupling that sheds light on task performance and helps neuroscientists accurately capture correlations between coupling and task phenomena in general. Additionally, we make an MIF toolbox available for the first time.