Title: (f,Γ)-Divergences: Interpolating between f-Divergences and Integral Probability Metrics
Award ID(s):
2008970
NSF-PAR ID:
10356230
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Machine Learning Research
Volume:
23
ISSN:
1533-7928
Page Range / eLocation ID:
1-70
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The quantum relative entropy is a measure of the distinguishability of two quantum states, and it is a unifying concept in quantum information theory: many information measures, such as entropy, conditional entropy, mutual information, and entanglement measures, can be realized from it. As such, there has been broad interest in generalizing the notion to further understand its most basic properties, one of which is the data processing inequality. The quantum f-divergence of Petz is one generalization of the quantum relative entropy, and it also leads to other relative entropies, such as the Petz–Rényi relative entropies. In this contribution, I introduce the optimized quantum f-divergence as a related generalization of the quantum relative entropy. I prove that it satisfies the data processing inequality, and the method of proof relies upon the operator Jensen inequality, similar to Petz's original approach. Interestingly, the sandwiched Rényi relative entropies are particular examples of the optimized f-divergence. Thus, one benefit of this approach is that there is now a single, unified method for establishing the data processing inequality for both the Petz–Rényi and sandwiched Rényi relative entropies, for the full range of parameters for which it is known to hold.
    (Standard definitions of the relative entropies named in this record are recalled below, after this list.)
  2. Statistical distances (SDs), which quantify the dissimilarity between probability distributions, are central to machine learning and statistics. A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it. These estimators are abundantly used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there seems to be a fundamental tradeoff between the two sources of error involved: approximation and estimation. While the former requires the NN class to be rich and expressive, the latter relies on controlling its complexity. This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs: the Kullback–Leibler divergence, the chi-squared divergence, and the squared Hellinger distance. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. Numerical results validating the theory are also provided.
    (A minimal illustrative sketch of such a neural SD estimator is given below, after this list.)
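Background for record 1 above. These are standard textbook definitions, recalled here for convenience; they are not quoted from that abstract, and the optimized quantum f-divergence itself is defined in the cited work. For density operators rho and sigma and a quantum channel N, a minimal LaTeX sketch of the quantum relative entropy, the sandwiched Rényi relative entropy, and the data processing inequality reads:

    % Quantum relative entropy (finite when supp(rho) is contained in supp(sigma)):
    D(\rho \,\|\, \sigma) = \operatorname{Tr}\bigl[\rho\,(\log\rho - \log\sigma)\bigr]
    % Sandwiched Renyi relative entropy, \alpha \in (0,1) \cup (1,\infty):
    \widetilde{D}_\alpha(\rho \,\|\, \sigma)
      = \frac{1}{\alpha-1}\,\log\operatorname{Tr}\Bigl[\bigl(\sigma^{\frac{1-\alpha}{2\alpha}}\,\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\bigr)^{\alpha}\Bigr]
    % Data processing inequality under any quantum channel \mathcal{N}:
    D(\rho \,\|\, \sigma) \;\ge\; D(\mathcal{N}(\rho) \,\|\, \mathcal{N}(\sigma))

The sandwiched quantity recovers the quantum relative entropy in the limit as alpha tends to 1, which is why a single data-processing argument covering both families is of interest.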
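Illustration for record 2 above. The following is a minimal sketch, not code from that paper, of estimating the Kullback–Leibler divergence through its Donsker–Varadhan variational form, with the variational function parametrized by a small neural network. Python with PyTorch is assumed, and the toy Gaussian distributions, network width, learning rate, and iteration count are illustrative choices only.

    # Donsker-Varadhan lower bound: D_KL(P || Q) >= E_P[f] - log E_Q[exp(f)],
    # maximized over a neural-network critic f. Illustrative sketch only.
    import math
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy 1-D Gaussians: P = N(1, 1), Q = N(0, 1); the true KL divergence is 0.5.
    def sample_p(n): return 1.0 + torch.randn(n, 1)
    def sample_q(n): return torch.randn(n, 1)

    # Small critic network playing the role of the variational function f.
    critic = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def dv_bound(xp, xq):
        # Empirical E_P[f] - log E_Q[exp(f)], using logsumexp for stability.
        fp, fq = critic(xp).squeeze(-1), critic(xq).squeeze(-1)
        return fp.mean() - (torch.logsumexp(fq, dim=0) - math.log(fq.numel()))

    for step in range(2000):
        loss = -dv_bound(sample_p(512), sample_q(512))  # ascend the bound
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        estimate = dv_bound(sample_p(100_000), sample_q(100_000)).item()
    print(f"estimated KL(P || Q): {estimate:.3f}  (true value: 0.5)")

The same template applies to the other SDs mentioned in the abstract by swapping in the corresponding variational objective for that f-divergence.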