Title: A mutual information criterion with applications to canonical correlation analysis and graphical models.
This paper derives a criterion for deciding conditional independence that is consistent with small-sample corrections of Akaike's information criterion but is easier to apply to such problems as selecting variables in canonical correlation analysis and selecting graphical models. The criterion reduces to mutual information when the assumed distribution equals the true distribution; hence, it is called the mutual information criterion (MIC). Although small-sample Kullback–Leibler criteria for these selection problems have been proposed previously, some of which are not widely known, MIC is strikingly more direct to derive and apply.
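As a concrete anchor for the quantity MIC reduces to: for jointly Gaussian vectors, mutual information is a function of the canonical correlations, $$I(X;Y) = -\tfrac{1}{2}\sum_i \log(1-\rho_i^2)$$. The sketch below (hypothetical helper names; the paper's small-sample correction terms are not reproduced) estimates this Gaussian mutual information from data via CCA.

```python
import numpy as np

def _inv_sqrt(C, jitter=1e-10):
    # Inverse symmetric square root of a covariance matrix (assumes full
    # rank; the jitter is only for numerical safety).
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(np.clip(w, jitter, None))) @ V.T

def canonical_correlations(X, Y):
    # Canonical correlations = singular values of the whitened cross-covariance.
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    M = _inv_sqrt(Cxx) @ Cxy @ _inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)

def gaussian_mutual_information(X, Y):
    # For jointly Gaussian X, Y: I(X;Y) = -1/2 * sum_i log(1 - rho_i^2).
    rho = np.clip(canonical_correlations(X, Y), 0.0, 1.0 - 1e-12)
    return -0.5 * np.sum(np.log1p(-rho ** 2))

# Example: a shared latent factor induces dependence between X and Y.
# rng = np.random.default_rng(0)
# Z = rng.normal(size=(5000, 2))
# X = Z + rng.normal(size=(5000, 2)); Y = Z + rng.normal(size=(5000, 2))
# print(gaussian_mutual_information(X, Y))
```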
Award ID(s): 1822221
PAR ID: 10348325
Author(s) / Creator(s): ;
Date Published:
Journal Name: Stat
Volume: 10
Issue: 1
ISSN: 2049-1573
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like This
  1. The conditional mutual information $$I(X;Y\mid Z)$$ measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution $$p_{Y\mid X,Z}$$ rather than of the joint distribution $$p_{X,Y,Z}$$. We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution $$p_{Y\mid X,Z}\,q_{X,Z}$$, where $$q_{X,Z}$$ is a potential distribution, fixed a priori. We develop k-nearest-neighbor based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.
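A minimal sketch of the k-NN machinery such an estimator builds on, in the classical KSG/Frenzel–Pompe form for plain conditional mutual information; the paper's potential variant additionally reweights toward the potential distribution $$q_{X,Z}$$ via importance sampling and a coupling trick, which is not reproduced here (the function name is hypothetical).

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def knn_cmi(x, y, z, k=5):
    """KSG-style k-NN estimate of I(X; Y | Z); x, y, z are (n, d) arrays."""
    xyz = np.hstack([x, y, z])
    xz = np.hstack([x, z])
    yz = np.hstack([y, z])
    # Distance to the k-th neighbor in the joint space (max-norm throughout).
    d_k = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]
    radii = np.nextafter(d_k, 0)  # shrink slightly: strict inequality

    def count_within(pts):
        # Number of neighbors (excluding the point itself) inside each radius.
        tree = cKDTree(pts)
        return np.array([len(tree.query_ball_point(p, r, p=np.inf)) - 1
                         for p, r in zip(pts, radii)])

    n_xz, n_yz, n_z = count_within(xz), count_within(yz), count_within(z)
    return digamma(k) + np.mean(digamma(n_z + 1) - digamma(n_xz + 1)
                                - digamma(n_yz + 1))
```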
  2. In this work, we address the lossy quantum-classical (QC) source coding problem, where the task is to compress the classical information about a quantum source, obtained after performing a measurement, below the Shannon entropy of the measurement outcomes, while incurring a bounded reconstruction error. We propose a new formulation, namely "rate-channel theory", for the lossy QC source coding problem based on the notion of a backward (posterior) channel. We employ a single-letter posterior channel to capture the reconstruction error in place of the single-letter distortion observable. The formulation requires the reconstruction of the compressed quantum source to satisfy a block error constraint, as opposed to the average single-letter distortion criterion in the rate-distortion setting. We also develop an analogous formulation for the classical variant with respect to a corresponding posterior channel. Furthermore, we characterize the asymptotic performance limit of the lossy QC and classical source coding problems in terms of single-letter quantum mutual information and mutual information quantities of the given posterior channel, respectively. We provide examples for the above formulations.
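A schematic of the classical variant, under assumptions made here for illustration (the symbols W and P_X̂ are chosen for this sketch and are not taken from the paper): the reconstruction must be consistent with the source through a fixed backward channel, and the abstract's single-letter characterization is a mutual information of that channel.

```latex
% Schematic only. Fix a backward (posterior) channel $W = P_{X \mid \hat X}$
% and a reconstruction law $P_{\hat X}$ consistent with the source marginal:
\[
  P_X(x) = \sum_{\hat x} P_{\hat X}(\hat x)\, W(x \mid \hat x).
\]
% The single-letter performance limit stated in the abstract then takes the
% schematic form
\[
  R = I(X; \hat X), \qquad (X, \hat X) \sim P_{\hat X}\, W,
\]
% with the reconstruction required to meet a block (not per-letter) error
% constraint with respect to $W$.
```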
  3. This paper studies M-estimators with gradient-Lipschitz loss functions regularized with a convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $$\hat\beta(y,X)$$, where differentiation is taken with respect to both X and y; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residuals in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select the tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require knowledge of the noise distribution or of the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy for the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $$\hat\beta$$ and the effective degrees of freedom of the M-estimators, which are of independent interest.
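For concreteness, a minimal sketch of the running example estimator, a Huber-loss M-estimator with an Elastic-Net penalty; the paper's adaptive tuning criterion and derivative formulae are not reproduced, and the smoothing of the l1 term is an assumption made here so a quasi-Newton solver applies.

```python
import numpy as np
from scipy.optimize import minimize

def huber_enet_fit(X, y, lam=0.1, alpha=0.5, delta=1.35, eps=1e-8):
    """Illustrative Huber + Elastic-Net M-estimator.

    Minimizes  mean Huber(y - X b)
             + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2),
    with the l1 norm smoothed as sum sqrt(b_j^2 + eps) (an assumption of
    this sketch, not of the paper) so L-BFGS-B can be used.
    """
    n, p = X.shape

    def objective(b):
        r = y - X @ b
        a = np.abs(r)
        # Huber loss: quadratic near zero, linear in the tails.
        hub = np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))
        l1_smooth = np.sum(np.sqrt(b ** 2 + eps))
        return hub.mean() + lam * (alpha * l1_smooth
                                   + 0.5 * (1 - alpha) * b @ b)

    return minimize(objective, np.zeros(p), method="L-BFGS-B").x
```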
  4. Pre-trained language models derive substantial linguistic and factual knowledge from the massive corpora on which they are trained, and prompt engineering seeks to align these models to specific tasks. Unfortunately, existing prompt engineering methods require significant amounts of labeled data, access to model parameters, or both. We introduce a new method for selecting prompt templates without labeled examples and without direct access to the model. Specifically, over a set of candidate templates, we choose the template that maximizes the mutual information between the input and the corresponding model output. Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task. On the largest model, selecting prompts with our method gets 90% of the way from the average prompt accuracy to the best prompt accuracy and requires no ground truth labels. 
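A hedged sketch of the selection rule described above, assuming access to the model's probability distribution over a fixed set of answer options for each input (the paper's exact estimator may differ in details): the mutual information decomposes as the entropy of the averaged output distribution minus the average output entropy.

```python
import numpy as np

def template_mutual_information(answer_probs):
    """MI between inputs and model outputs for one prompt template.

    answer_probs: (n_inputs, n_options) array; row i is the model's
    distribution over answer options for input i under this template.
    Uses I(X;Y) = H(E_x[p(.|x)]) - E_x[H(p(.|x))].
    """
    p = np.asarray(answer_probs, dtype=float)
    marginal = p.mean(axis=0)
    h_y = -np.sum(marginal * np.log(marginal + 1e-12))
    h_y_given_x = -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))
    return h_y - h_y_given_x

# Select the template whose outputs carry the most information about inputs:
# best = max(templates, key=lambda t: template_mutual_information(probs[t]))
```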
  5. Inferring meaningful information from large datasets has become increasingly important. In particular, identifying relationships among variables in these datasets has far-reaching impacts. In this paper, we introduce the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations. Our proposed UIC is inspired by the maximal information coefficient (MIC; Reshef et al., 2011); however, the MIC was originally designed to measure dependence between two one-dimensional variables. Unlike the MIC calculation, whose cost depends on the type of association between the two variables, the UIC calculation is less computationally expensive and more robust to the association type. The UIC achieves this by replacing the dynamic programming step in the MIC calculation with a simpler technique based on uniform partitioning of the data grid. This computational efficiency comes at the cost of not maximizing the information coefficient as done by the MIC algorithm. We present theoretical guarantees for the performance of the UIC and a variety of experiments to demonstrate its quality in detecting associations.
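A one-dimensional illustration of the uniform-partitioning idea, with hypothetical names and equal-width grids standing in for the paper's construction (the actual UIC handles multidimensional variables and carries guarantees not reflected here): the MIC-style normalized information is computed on fixed uniform grids instead of being maximized over grids by dynamic programming.

```python
import numpy as np

def uniform_info_coefficient(x, y, max_bins=16):
    """Normalized mutual information over uniform (equal-width) grids.

    Unlike MIC, no per-grid optimization is performed; each grid is an
    equal-width partition, and the best normalized score is returned.
    """
    best = 0.0
    for bx in range(2, max_bins + 1):
        for by in range(2, max_bins + 1):
            counts, _, _ = np.histogram2d(x, y, bins=(bx, by))
            pxy = counts / counts.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
            # MIC-style normalization by log(min(#rows, #cols)).
            best = max(best, mi / np.log(min(bx, by)))
    return best
```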