Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions parameterized up to a constant of proportionality. The main idea is to design a classification problem for distinguishing training data from samples from an easytosample noise distribution q, in a manner that avoids having to calculate a partition function. It is wellknown that the choice of q can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for q is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss, even for very simple data distributions. As a consequence, both the statistical and algorithmic complexity for such a choice of q will be problematic in practice, suggesting that more complex and tailored noise distributions are essential to the success of NCE.more » « less

We consider Ising models on the hypercube with a general interaction matrix 𝐽, and give a polynomial time sampling algorithm when all but 𝑂(1) eigenvalues of 𝐽 lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when \emph{all} eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial time sampling algorithms for lowrank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with lowdimensional contexts, and greatly improves the polynomial time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation algorithm results based on the naive meanfield approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the HubbardStratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGDbased sampler to solve the full problem.more » « less

Recently, many reinforcement learning techniques have been shown to have provable guarantees in the simple case of linear dynamics, especially in problems like linear quadratic regulators. However, in practice many tasks require learning a policy from rich, highdimensional features such as images, which are unlikely to be linear. We consider a setting where there is a hidden linear subspace of the highdimensional feature space in which the dynamics are linear. We design natural objectives based on forward and inverse dynamics models. We prove that these objectives can be efficiently optimized and their local optimizers extract the hidden linear subspace. We empirically verify our theoretical results with synthetic data and explore the effectiveness of our approach (generalized to nonlinear settings) in simple control tasks with rich observations.more » « less

We give a algorithm for exact sampling from the Bingham distribution p(x)∝exp(x⊤Ax) on the sphere Sd−1 with expected runtime of poly(d,λmax(A)−λmin(A)). The algorithm is based on rejection sampling, where the proposal distribution is a polynomial approximation of the pdf, and can be sampled from by explicitly evaluating integrals of polynomials over the sphere. Our algorithm gives exact samples, assuming exact computation of an inverse function of a polynomial. This is in contrast with Markov Chain Monte Carlo algorithms, which are not known to enjoy rapid mixing on this problem, and only give approximate samples. As a direct application, we use this to sample from the posterior distribution of a rank1 matrix inference problem in polynomial time.more » « less

Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics. In this work, we consider the problem of estimating the normalizing constant to within a multiplication factor of 1 ± ε for a μstrongly convex and Lsmooth function f, given query access to f(x) and ∇f(x). We give both algorithms and lowerbounds for this problem. Using an annealing algorithm combined with a multilevel Monte Carlo method based on underdamped Langevin dynamics, we show that O(d^{4/3}/\eps^2) queries to ∇f are sufficient. Moreover, we provide an information theoretic lowerbound, showing that at least d^{1o(1)}/\eps^{2o(1)} queries are necessary. This provides a first nontrivial lowerbound for the problem.more » « less