Title: Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution π over ℝ^d by a product measure π⋆. When π is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that π⋆ is close to the minimizer π⋆_⋄ of the KL divergence over a polyhedral set 𝒫_⋄, and (2) an algorithm for minimizing KL(⋅‖π) over 𝒫_⋄ with accelerated complexity O(√κ log(κd/ε²)), where κ is the condition number of π.
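As a sketch of the underlying optimization problem (written in standard mean-field notation, not necessarily the paper's exact symbols), the variational objective in the abstract is

    \pi^\star \;\in\; \operatorname*{argmin}_{\mu = \mu_1 \otimes \cdots \otimes \mu_d} \mathrm{KL}(\mu \,\|\, \pi),

and the paper's contribution is to replace the full product family by a finite-dimensional polyhedral subset 𝒫_⋄ of the Wasserstein space, so that KL(⋅‖π) can be minimized over 𝒫_⋄ with accelerated first-order methods.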
Impagliazzo, Russell; Mouli, Sasank; Pitassi, Toniann
(Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Ta-Shma, Amnon (Ed.)
For every prime p > 0, every n > 0, and κ = O(log n), we show the existence of an unsatisfiable system of polynomial equations over O(n log n) variables of degree O(log n) such that any Polynomial Calculus refutation over 𝔽_p with M extension variables, each depending on at most κ original variables, requires size exp(Ω(n²)/10^κ(M + n log n)).
Painsky, Amichai; Wornell, Gregory
(Proceedings of International Symposium on Information Theory)
A loss function measures the discrepancy between the true values (observations) and their estimated fits for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss, and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper, and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any choice of loss function from this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks, and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (again up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
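As a quick, hedged illustration of the kind of inequality described above (binary case, quadratic loss; the snippet and function names are ad hoc, not from the paper): the Brier divergence (p − q)² is dominated by KL(p‖q)/2 via the binary Pinsker inequality, which the following check confirms numerically.

    # Numerical sanity check (illustrative only): 2*(p - q)^2 <= KL(p || q) for Bernoulli
    # distributions, i.e., the quadratic-loss divergence is bounded by KL up to a constant.
    import numpy as np

    def kl_binary(p, q):
        # KL divergence between Bernoulli(p) and Bernoulli(q), in nats.
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    grid = np.linspace(0.01, 0.99, 99)
    worst = max(2 * (p - q) ** 2 - kl_binary(p, q) for p in grid for q in grid)
    print(f"max of 2*(p-q)^2 - KL(p||q) over the grid: {worst:.3e}")  # non-positive up to rounding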
Chattopadhyay, Eshan; Kanukurthi, Bhavana; Obbattu, Sai Lakshmi; Sekar, Sruthi
(Lecture Notes in Computer Science)
Non-malleable Codes give us the following property: their codewords cannot be tampered into codewords of related messages. Privacy Amplification allows parties to convert their weak shared secret into a fully hidden, uniformly distributed secret key, while communicating on a fully tamperable public channel. In this work, we show how to construct a constant round privacy amplification protocol from any augmented split-state non-malleable code. Existentially, this gives us another primitive (in addition to optimal non-malleable extractors) whose optimal construction would solve the long-standing open problem of building constant round privacy amplification with optimal entropy loss. Instantiating our code with the current best known NMC gives us an 8-round privacy amplification protocol with entropy loss O(log(n)+κlog(κ)) and min-entropy requirement Ω(log(n)+κlog(κ)), where κ is the security parameter and n is the length of the shared weak secret. In fact, for our result, even the weaker primitive of Non-malleable Randomness Encoders suffices. We view our result as an exciting connection between two of the most fascinating and well-studied information theoretic primitives, non-malleable codes and privacy amplification.
An, Dong; Lin, Lin
(ACM Transactions on Quantum Computing)
We demonstrate that with an optimally tuned scheduling function, adiabatic quantum computing (AQC) can readily solve a quantum linear system problem (QLSP) with O(κ · poly(log(κ/ε))) runtime, where κ is the condition number and ε is the target accuracy. This is near optimal with respect to both κ and ε, and is achieved without relying on complicated amplitude amplification procedures that are difficult to implement. Our method is applicable to general non-Hermitian matrices, and the cost as well as the number of qubits can be reduced when restricted to Hermitian matrices, and further to Hermitian positive definite matrices. The success of the time-optimal AQC implies that the quantum approximate optimization algorithm (QAOA) with an optimal control protocol can also achieve the same complexity in terms of the runtime. Numerical results indicate that QAOA can yield the lowest runtime compared to the time-optimal AQC, vanilla AQC, and the recently proposed randomization method.
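For context on the "scheduling function" mentioned above (a generic AQC sketch, not the paper's specific construction), the adiabatic evolution interpolates between an initial Hamiltonian H_0 and the problem Hamiltonian H_1 through a schedule f: [0,1] → [0,1]:

    H\bigl(f(s)\bigr) \;=\; \bigl(1 - f(s)\bigr)\, H_0 \;+\; f(s)\, H_1, \qquad s = t/T, \quad f(0) = 0,\; f(1) = 1.

The cited result is that choosing f well (rather than, say, linearly) yields the near-optimal O(κ · poly(log(κ/ε))) runtime quoted above.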
Tang, Rong; Yang, Yun
(Journal of Machine Learning Research)
In this paper, we examine the computational complexity of sampling from a Bayesian posterior (or pseudo-posterior) using the Metropolis-adjusted Langevin algorithm (MALA). MALA first employs a discrete-time Langevin SDE to propose a new state, and then adjusts the proposed state using Metropolis-Hastings rejection. Most existing theoretical analyses of MALA rely on the smoothness and strong log-concavity properties of the target distribution, which are often lacking in practical Bayesian problems. Our analysis hinges on statistical large sample theory, which constrains the deviation of the Bayesian posterior from being smooth and log-concave in a very specific way. In particular, we introduce a new technique for bounding the mixing time of a Markov chain with a continuous state space via the s-conductance profile, offering improvements over existing techniques in several aspects. By employing this new technique, we establish the optimal parameter dimension dependence of d^{1/3} and condition number dependence of κ in the non-asymptotic mixing time upper bound for MALA after the burn-in period, under a standard Bayesian setting where the target posterior distribution is close to a d-dimensional Gaussian distribution with a covariance matrix having a condition number κ. We also prove a matching mixing time lower bound for sampling from a multivariate Gaussian via MALA to complement the upper bound.
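To make the propose-then-accept structure described above concrete, here is a minimal MALA sketch (a generic implementation assuming access to log π and its gradient; it is not the paper's analysis code):

    # One MALA step: discretized Langevin proposal followed by Metropolis-Hastings correction.
    import numpy as np

    def mala_step(x, log_pi, grad_log_pi, h, rng):
        # Proposal: x' = x + h * grad log pi(x) + sqrt(2h) * standard Gaussian noise.
        x_prop = x + h * grad_log_pi(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)

        def log_q(y, x_from):
            # Log-density (up to constants) of the Gaussian proposal kernel q(y | x_from).
            diff = y - x_from - h * grad_log_pi(x_from)
            return -np.sum(diff ** 2) / (4 * h)

        # Metropolis-Hastings acceptance in log space.
        log_alpha = (log_pi(x_prop) + log_q(x, x_prop)) - (log_pi(x) + log_q(x_prop, x))
        return x_prop if np.log(rng.uniform()) < log_alpha else x

    # Example usage: sample from a standard Gaussian target in 2 dimensions.
    rng = np.random.default_rng(0)
    x = np.zeros(2)
    for _ in range(1000):
        x = mala_step(x, lambda z: -0.5 * np.sum(z ** 2), lambda z: -z, 0.1, rng)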
Jiang, Yiheng, Chewi, Sinho, and Pooladian, Aram-Alexandre. Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space. Retrieved from https://par.nsf.gov/biblio/10535875.
Jiang, Yiheng, Chewi, Sinho, and Pooladian, Aram-Alexandre.
"Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space". Country unknown/Code not available: Conference on Learning Theory (COLT) 2024. https://par.nsf.gov/biblio/10535875.
@article{osti_10535875,
place = {Country unknown/Code not available},
title = {Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space},
url = {https://par.nsf.gov/biblio/10535875},
abstractNote = {We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution π over ℝ^d by a product measure π⋆. When π is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that π⋆ is close to the minimizer π⋆_⋄ of the KL divergence over a \emph{polyhedral} set 𝒫_⋄, and (2) an algorithm for minimizing KL(⋅‖π) over 𝒫_⋄ with accelerated complexity O(√κ log(κd/ε²)), where κ is the condition number of π.},
publisher = {Conference on Learning Theory (COLT) 2024},
author = {Jiang, Yiheng and Chewi, Sinho and Pooladian, Aram-Alexandre},
}