We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples. Our goal is to give algorithms with polynomial time and sample complexity in a nonparametric setting. Tian and Pearl (AAAI '02) exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose 𝒫 is a causal model on a set V of n observable variables with respect to a given causal graph G, and let do(x) be an identifiable intervention on a variable X. We show that, assuming G has bounded in-degree and bounded-size c-components (at most k) and that the observational distribution satisfies a strong positivity condition: (i) [Evaluation] There is an algorithm that, with probability 2/3, outputs an evaluator for a distribution P̂ satisfying TV(P(V | do(x)), P̂(V)) < ε, using m = O(n/ε²) samples from P and O(mn) time. The evaluator returns the probability P̂(v) for any assignment v to V in O(n) time. (ii) [Sampling] There is an algorithm that, with probability 2/3, outputs a sampler for a distribution P̂ satisfying TV(P(V | do(x)), P̂(V)) < ε, using m = O(n/ε²) samples from P and O(mn) time. The sampler returns an i.i.d. sample from P̂ with probability 1 in O(n) time. We extend our techniques to estimate P(Y | do(x)) for a subset Y of variables of interest. We also show lower bounds on the sample complexity, demonstrating that our sample complexity has optimal dependence on the parameters n and ε, as well as, when k = 1, on the strong positivity parameter.
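For intuition, in the Markovian case (k = 1, no latent confounders) the interventional distribution is given by the standard truncated factorization P(v | do(x)) = ∏_{i : V_i ≠ X} P(v_i | pa_G(v_i)) for assignments v consistent with x. The sketch below is illustrative rather than the paper's algorithm: it estimates conditional probability tables for binary variables from observational samples and evaluates P̂(v | do(x)) in O(n) time per query. The function names, toy graph, and add-one smoothing are assumptions of the example.

```python
import numpy as np

def fit_cpts(samples, parents):
    """Estimate P(V_i = 1 | pa(V_i)) for each binary variable by counting,
    with add-one smoothing. samples: (m, n) 0/1 array; parents: dict i -> parent indices."""
    cpts = {}
    for i, pa in parents.items():
        table = {}
        for row in samples:
            key = tuple(row[pa])
            ones, total = table.get(key, (0, 0))
            table[key] = (ones + row[i], total + 1)
        cpts[i] = {k: (o + 1) / (t + 2) for k, (o, t) in table.items()}
    return cpts

def eval_do(v, x_idx, cpts, parents):
    """Truncated factorization: product of P(v_i | pa_i) over all i except
    the intervened variable (whose value is fixed by the intervention)."""
    p = 1.0
    for i, pa in parents.items():
        if i == x_idx:
            continue
        p1 = cpts[i].get(tuple(v[pa]), 0.5)  # back off to 1/2 for unseen parent configs
        p *= p1 if v[i] == 1 else 1.0 - p1
    return p

# Toy chain Z -> X -> Y: evaluate P̂(Z=0, Y=1 | do(X=1)).
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, 5000)
X = (rng.random(5000) < 0.2 + 0.6 * Z).astype(int)
Y = (rng.random(5000) < 0.3 + 0.5 * X).astype(int)
S = np.stack([Z, X, Y], axis=1)
parents = {0: [], 1: [0], 2: [1]}
cpts = fit_cpts(S, parents)
print(eval_do(np.array([0, 1, 1]), 1, cpts, parents))
```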
Efficient sampling from the Bingham distribution
We give an algorithm for exact sampling from the Bingham distribution p(x) ∝ exp(x⊤Ax) on the sphere S^{d−1} with expected runtime poly(d, λ_max(A) − λ_min(A)). The algorithm is based on rejection sampling, where the proposal distribution is a polynomial approximation of the pdf and can be sampled from by explicitly evaluating integrals of polynomials over the sphere. Our algorithm gives exact samples, assuming exact computation of an inverse function of a polynomial. This is in contrast with Markov chain Monte Carlo algorithms, which are not known to enjoy rapid mixing on this problem and give only approximate samples.
As a direct application, we use this to sample from the posterior distribution of a rank-1 matrix inference problem in polynomial time.
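To illustrate the rejection-sampling skeleton (with a naive uniform proposal, not the paper's polynomial proposal), note that x⊤Ax ≤ λ_max(A) on the sphere, so accepting a uniform draw x with probability exp(x⊤Ax − λ_max(A)) yields exact Bingham samples. The acceptance rate degrades exponentially in λ_max(A) − λ_min(A), which is precisely the inefficiency the polynomial proposal is designed to avoid; the function name and toy matrix below are assumptions.

```python
import numpy as np

def sample_bingham_uniform_proposal(A, rng):
    """Exact Bingham sampling by rejection from the uniform distribution on S^{d-1}.
    Valid because x^T A x - lambda_max(A) <= 0 on the sphere."""
    d = A.shape[0]
    lam_max = np.linalg.eigvalsh(A)[-1]
    while True:
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)  # normalized Gaussian is uniform on the sphere
        if rng.random() < np.exp(x @ A @ x - lam_max):
            return x

rng = np.random.default_rng(0)
A = np.diag([2.0, 0.5, 0.0])
samples = np.stack([sample_bingham_uniform_proposal(A, rng) for _ in range(1000)])
```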
NSF-PAR ID: 10231314
Journal Name: ALT
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


We consider Ising models on the hypercube with a general interaction matrix J, and give a polynomial-time sampling algorithm when all but O(1) eigenvalues of J lie in an interval of length one, a situation which occurs in many models of interest. This was previously known for the Glauber dynamics when all eigenvalues fit in an interval of length one; however, a single outlier can force the Glauber dynamics to mix torpidly. Our general result implies the first polynomial-time sampling algorithms for low-rank Ising models such as Hopfield networks with a fixed number of patterns and Bayesian clustering models with low-dimensional contexts, and greatly improves the polynomial-time sampling regime for the antiferromagnetic/ferromagnetic Ising model with inconsistent field on expander graphs. It also improves on previous approximation-algorithm results based on the naive mean-field approximation in variational methods and statistical physics. Our approach is based on a new fusion of ideas from the MCMC and variational-inference worlds. As part of our algorithm, we define a new nonconvex variational problem which allows us to sample from an exponential reweighting of a distribution by a negative definite quadratic form, and show how to make this procedure provably efficient using stochastic gradient descent. On top of this, we construct a new simulated tempering chain (on an extended state space arising from the Hubbard–Stratonovich transform) which overcomes the obstacle posed by large positive eigenvalues, and combine it with the SGD-based sampler to solve the full problem.
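As a point of reference for the Glauber dynamics mentioned above (a classical baseline, not the paper's algorithm), the sketch below runs single-site Glauber updates for the Ising measure μ(σ) ∝ exp(½σ⊤Jσ + h⊤σ) on {−1, +1}^n; the toy rank-one interaction, field, and iteration count are assumptions of the example.

```python
import numpy as np

def glauber_step(sigma, J, h, rng):
    """One single-site Glauber update: resample a uniformly random coordinate
    from its conditional law given the rest (J assumed symmetric)."""
    n = len(sigma)
    i = rng.integers(n)
    # Local field at site i, excluding the diagonal self-interaction term.
    field = J[i] @ sigma - J[i, i] * sigma[i] + h[i]
    # Conditional law: P(sigma_i = +1 | rest) = sigmoid(2 * field).
    sigma[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * field)) else -1
    return sigma

rng = np.random.default_rng(0)
n = 50
J = 0.1 * np.ones((n, n)) / n   # weak rank-one ferromagnetic interaction
h = np.zeros(n)
sigma = rng.choice([-1, 1], size=n)
for _ in range(10_000):
    glauber_step(sigma, J, h, rng)
```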

We consider the fundamental problem of ReLU regression, where the goal is to output the best-fitting ReLU with respect to square loss given access to draws from some unknown distribution. We give the first efficient, constant-factor approximation algorithm for this problem, assuming the underlying distribution satisfies some weak concentration and anti-concentration conditions (which hold, for example, for all log-concave distributions). This solves the main open problem of Goel et al., who proved hardness results for any exact algorithm for ReLU regression (up to an additive ε). Using more sophisticated techniques, we can improve our results and obtain a polynomial-time approximation scheme for any subgaussian distribution. Given the aforementioned hardness results, these guarantees cannot be substantially improved. Our main insight is a new characterization of surrogate losses for nonconvex activations. While prior work had established the existence of convex surrogates for monotone activations, we show that properties of the underlying distribution actually induce strong convexity for the loss, allowing us to relate the global minimum to the activation's Chow parameters.
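A classical surrogate-based approach for monotone activations such as the ReLU is the GLMtron update of Kakade et al.; the sketch below is a baseline in that spirit, not the paper's algorithm, iterating w ← w + η · mean((y − ReLU(w·x)) x) on synthetic data. The step size, data model, and iteration count are assumptions of the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def glmtron(X, y, steps=500, eta=0.1):
    """GLMtron-style iterations for ReLU regression: a surrogate-gradient update
    that is well behaved because the ReLU activation is monotone."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = y - relu(X @ w)
        w += eta * (X.T @ residual) / n
    return w

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))                        # (sub)gaussian covariates
w_true = rng.standard_normal(d)
y = relu(X @ w_true) + 0.1 * rng.standard_normal(n)    # noisy ReLU labels
w_hat = glmtron(X, y)
print(np.mean((relu(X @ w_hat) - y) ** 2))             # empirical square loss
```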

We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements [n] = {1, ..., n} with corresponding data values x1, ..., xn. We observe the values for a "sample" set A ⊂ [n] and wish to estimate some statistic of the values for a "target" set B ⊂ [n], where B could be the entire set. Crucially, we assume that the sets A and B are drawn according to some known distribution P over pairs of subsets of [n]. A given estimation algorithm is evaluated based on its "worst-case, expected error", where the expectation is with respect to the distribution P from which the sample set A and target set B are drawn, and the worst case is with respect to the data values x1, ..., xn. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values, where the weights are functions of the distribution P and the sample and target sets A, B, and show that the worst-case expected error achieved by this algorithm is at most a multiplicative π/2 factor worse than the optimum among such algorithms. The algorithm and proof leverage a surprising connection to the Grothendieck problem. We also extend these results to the linear regression setting, where each datapoint is not a scalar but a labeled vector (xi, yi). This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data-collection process via the distribution P, is a significant departure from the typical statistical estimation framework and introduces a uniform analysis for the many natural settings where membership in a sample may be correlated with data values, such as when individuals are recruited into a sample through their social networks, as in "snowball/chain" sampling, or when samples have chronological structure, as in "selective prediction".
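One simple instance of such a weighted estimator (a classical Horvitz–Thompson-style baseline, not the paper's near-optimal Grothendieck-based weighting) reweights each observed value by an inverse inclusion probability computed from the known distribution P. In the sketch below, the Monte Carlo estimation of inclusion probabilities, the toy choice of P, and the function names are assumptions of the example.

```python
import numpy as np

def inclusion_probs(sample_P, n, trials=10_000, rng=None):
    """Monte Carlo estimate of P(i in A) for each i, given a sampler for (A, B) pairs."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(n)
    for _ in range(trials):
        A, _ = sample_P(rng)
        counts[list(A)] += 1
    return counts / trials

def ht_mean_estimate(A, x, pi, n):
    """Horvitz-Thompson-style estimate of the population mean from the observed set A."""
    idx = list(A)
    return np.sum(x[idx] / pi[idx]) / n

# Toy P: element i joins A independently with probability p[i]; target B = [n].
n = 100
p = np.linspace(0.1, 0.9, n)
def sample_P(rng):
    return set(np.nonzero(rng.random(n) < p)[0]), set(range(n))

rng = np.random.default_rng(1)
x = rng.standard_normal(n) + 3.0        # arbitrary (worst-case) data values
pi = inclusion_probs(sample_P, n)
A, _ = sample_P(rng)
print(ht_mean_estimate(A, x, pi, n), x.mean())
```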

Given two (di)graphs G, H and a cost function c : V(G) × V(H) → ℚ≥0 ∪ {+∞}, the minimum cost homomorphism problem, MinHOM(H), asks for a homomorphism f : V(G) → V(H) (a.k.a. an H-coloring) that minimizes ∑_{v ∈ V(G)} c(v, f(v)). The complexity of exact minimization of this problem is well understood [Pavol Hell and Arash Rafiey, 2012], and the class of digraphs H for which MinHOM(H) is polynomial-time solvable is a small subset of all digraphs. In this paper, we consider the approximation of MinHOM within a constant factor. In terms of digraphs, MinHOM(H) is not approximable if H contains a digraph asteroidal triple (DAT). We take a major step toward a dichotomy classification of the approximable cases. We give a dichotomy classification for approximating MinHOM(H) when H is a graph (i.e., a symmetric digraph). For digraphs, we provide constant-factor approximation algorithms for two important classes of digraphs, namely bi-arc digraphs (digraphs with a conservative semilattice polymorphism, or min-ordering) and k-arc digraphs (digraphs with an extended min-ordering). Specifically, we show that: (i) Dichotomy for graphs: MinHOM(H) has a 2|V(H)|-approximation algorithm if the graph H admits a conservative majority polymorphism (i.e., H is a bi-arc graph), and otherwise it is inapproximable; (ii) MinHOM(H) has a |V(H)|²-approximation algorithm if H is a bi-arc digraph; (iii) MinHOM(H) has a |V(H)|²-approximation algorithm if H is a k-arc digraph. In conclusion, we show the importance of these results and provide insights toward a dichotomy classification of the approximable cases. Our constant factors depend on the size of H; however, the implementation of our algorithms provides a much better approximation ratio. It remains open to classify the digraphs H for which MinHOM(H) admits a constant-factor approximation algorithm that is independent of |V(H)|.
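To make the objective concrete, the sketch below is a brute-force solver (exponential in |V(G)| and for illustration only; it is not one of the paper's approximation algorithms): it enumerates all maps f : V(G) → V(H), keeps those that are homomorphisms (every arc of G maps to an arc of H), and returns one minimizing ∑ c(v, f(v)). The digraph encoding and toy instance are assumptions.

```python
from itertools import product

def min_hom(G_vertices, G_edges, H_vertices, H_edges, cost):
    """Brute-force MinHOM: find a homomorphism f: V(G) -> V(H) minimizing
    sum_v cost[v][f(v)]. f is a homomorphism iff (f(u), f(v)) is an arc of H
    for every arc (u, v) of G."""
    H_edges = set(H_edges)
    best, best_f = float("inf"), None
    for image in product(H_vertices, repeat=len(G_vertices)):
        f = dict(zip(G_vertices, image))
        if all((f[u], f[v]) in H_edges for (u, v) in G_edges):
            total = sum(cost[v][f[v]] for v in G_vertices)
            if total < best:
                best, best_f = total, f
    return best, best_f

# Toy instance: G is the directed path a->b->c, H is the directed triangle on {0,1,2}.
G_vertices, G_edges = ["a", "b", "c"], [("a", "b"), ("b", "c")]
H_vertices, H_edges = [0, 1, 2], [(0, 1), (1, 2), (2, 0)]
cost = {"a": {0: 1, 1: 5, 2: 5}, "b": {0: 5, 1: 1, 2: 5}, "c": {0: 5, 1: 5, 2: 1}}
print(min_hom(G_vertices, G_edges, H_vertices, H_edges, cost))  # (3, {'a': 0, 'b': 1, 'c': 2})
```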