
Title: Efficient sampling from the Bingham distribution
We give an algorithm for exact sampling from the Bingham distribution p(x) ∝ exp(x⊤Ax) on the sphere S^{d−1} with expected runtime poly(d, λmax(A) − λmin(A)). The algorithm is based on rejection sampling: the proposal distribution is a polynomial approximation of the pdf, which can be sampled from exactly by evaluating integrals of polynomials over the sphere. Our algorithm gives exact samples, assuming exact computation of an inverse function of a polynomial. This is in contrast with Markov chain Monte Carlo algorithms, which are not known to mix rapidly on this problem and only give approximate samples. As a direct application, we use this to sample from the posterior distribution of a rank-1 matrix inference problem in polynomial time.
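To make the rejection-sampling idea concrete, here is a minimal sketch that samples from p(x) ∝ exp(x⊤Ax) on the sphere using the uniform distribution on S^{d−1} as the proposal, with envelope constant exp(λmax(A)) (valid since x⊤Ax ≤ λmax(A) for unit x). This baseline is not the paper's polynomial proposal, and its acceptance rate degrades exponentially in λmax(A) − λmin(A), which is precisely the dependence the paper's construction avoids; all function and variable names below are our own.

```python
import numpy as np

def sample_bingham_rejection(A, n_samples=1, rng=None):
    """Naive rejection sampler for p(x) ∝ exp(x^T A x) on the unit sphere.

    Proposal: uniform on S^{d-1}. Acceptance uses the bound
    exp(x^T A x) <= exp(lambda_max(A)) for ||x|| = 1.
    Illustrative only; not the polynomial-proposal algorithm of the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    lam_max = np.linalg.eigvalsh(A)[-1]
    samples = []
    while len(samples) < n_samples:
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)                 # uniform draw on the sphere
        log_accept = x @ A @ x - lam_max       # log acceptance probability
        if np.log(rng.uniform()) < log_accept:
            samples.append(x)
    return np.array(samples)

# Example: a Bingham distribution concentrated along the first coordinate axis
A = np.diag([4.0, 1.0, 0.0])
xs = sample_bingham_rejection(A, n_samples=5)
```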
Authors:
Award ID(s):
1845171 1704656
Publication Date:
NSF-PAR ID:
10231314
Journal Name:
ALT
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements [n] = {1, ..., n} with corresponding data values x1, ..., xn. We observe the values for a "sample" set A ⊂ [n] and wish to estimate some statistic of the values for a "target" set B ⊂ [n], where B could be the entire set. Crucially, we assume that the sets A and B are drawn according to some known distribution P over pairs of subsets of [n]. A given estimation algorithm is evaluated based on its "worst-case, expected error", where the expectation is with respect to the distribution P from which the sample set A and target set B are drawn, and the worst case is with respect to the data values x1, ..., xn. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values, where the weights are functions of the distribution P and the sample and target sets A, B, and show that the worst-case expected error achieved by this algorithm is at most a multiplicative π/2 factor worse than that of the optimal such algorithm. The algorithm and proof leverage a surprising connection to the Grothendieck problem. We also extend these results to the linear regression setting, where each data point is not a scalar but a labeled vector (xi, yi). This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process via the distribution P, is a significant departure from the typical statistical estimation framework and gives a uniform analysis for the many natural settings where membership in a sample may be correlated with the data values, such as when individuals are recruited into a sample through their social networks, as in "snowball/chain" sampling, or when samples have chronological structure, as in "selective prediction".
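As a small illustration of estimating a mean from a non-uniformly collected sample via a weighted combination of the observed values, the sketch below reweights each observed element by an inverse inclusion probability (a Hajek/Horvitz-Thompson-style estimator) for the special case where the target B is the whole population. The paper's estimator instead derives the weights from the full distribution P over (A, B) pairs via the Grothendieck connection; the interface and names here are illustrative assumptions.

```python
import numpy as np

def weighted_mean_estimate(x, sample, inclusion_prob):
    """Estimate the mean of x_1, ..., x_n from values observed on the sample
    set A, taking the target B to be all of [n].

    Each observed value is weighted by 1 / P(i in A) and the weights are
    normalized (Hajek-style). Not the paper's Grothendieck-based estimator.
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample)
    w = 1.0 / np.asarray(inclusion_prob, dtype=float)[sample]
    w /= w.sum()                       # normalize to a convex combination
    return float(w @ x[sample])

# Toy usage: 6 data values, sample A = {0, 2, 5}, assumed known inclusion probabilities
x = [1.0, 4.0, 2.0, 8.0, 5.0, 3.0]
print(weighted_mean_estimate(x, [0, 2, 5], [0.5] * 6))
```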
  2. We consider the fundamental problem of ReLU regression, where the goal is to output the best-fitting ReLU with respect to square loss given access to draws from some unknown distribution. We give the first efficient, constant-factor approximation algorithm for this problem, assuming the underlying distribution satisfies some weak concentration and anti-concentration conditions (which include, for example, all log-concave distributions). This solves the main open problem of Goel et al., who proved hardness results for any exact algorithm for ReLU regression (up to an additive ϵ). Using more sophisticated techniques, we can improve our results and obtain a polynomial-time approximation scheme for any subgaussian distribution. Given the aforementioned hardness results, these guarantees cannot be substantially improved. Our main insight is a new characterization of surrogate losses for nonconvex activations. While prior work had established the existence of convex surrogates for monotone activations, we show that properties of the underlying distribution actually induce strong convexity for the loss, allowing us to relate the global minimum to the activation's Chow parameters.
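To illustrate the surrogate-loss idea in its simplest form, the sketch below fits a ReLU by descending a convex surrogate whose gradient replaces the nonconvex square-loss gradient (a GLMtron-style update for monotone activations). It is a stand-in under our own assumptions, not the paper's constant-factor approximation algorithm or its strong-convexity analysis.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def surrogate_gradient_fit(X, y, n_iters=500, lr=0.1):
    """Fit w so that relu(X @ w) approximates y, using the surrogate gradient
    (1/n) * X^T (relu(X w) - y), i.e. the gradient of a convex surrogate for
    the monotone activation. Illustrative sketch only."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (relu(X @ w) - y) / n     # surrogate-loss gradient
        w -= lr * grad
    return w

# Toy usage: data from a planted ReLU with a little label noise
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = relu(X @ w_true) + 0.1 * rng.standard_normal(1000)
w_hat = surrogate_gradient_fit(X, y)
```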
  3. Given two (di)graphs G, H and a cost function c : V(G) × V(H) → Q≥0 ∪ {+∞}, in the minimum cost homomorphism problem, MinHOM(H), we are interested in finding a homomorphism f : V(G) → V(H) (a.k.a. an H-coloring) that minimizes sum_{v in V(G)} c(v, f(v)). The complexity of exact minimization of this problem is well understood [Pavol Hell and Arash Rafiey, 2012], and the class of digraphs H for which MinHOM(H) is polynomial-time solvable is a small subset of all digraphs. In this paper, we consider the approximation of MinHOM within a constant factor. In terms of digraphs, MinHOM(H) is not approximable if H contains a digraph asteroidal triple (DAT). We take a major step toward a dichotomy classification of approximable cases. We give a dichotomy classification for approximating MinHOM(H) when H is a graph (i.e. a symmetric digraph). For digraphs, we provide constant-factor approximation algorithms for two important classes of digraphs, namely bi-arc digraphs (digraphs with a conservative semi-lattice polymorphism or min-ordering) and k-arc digraphs (digraphs with an extended min-ordering). Specifically, we show that:
     - Dichotomy for graphs: MinHOM(H) has a 2|V(H)|-approximation algorithm if the graph H admits a conservative majority polymorphism (i.e. H is a bi-arc graph); otherwise, it is inapproximable;
     - MinHOM(H) has a |V(H)|^2-approximation algorithm if H is a bi-arc digraph;
     - MinHOM(H) has a |V(H)|^2-approximation algorithm if H is a k-arc digraph.
     In conclusion, we show the importance of these results and provide insights toward a dichotomy classification of approximable cases. Our constant factors depend on the size of H; in practice, the implementation of our algorithms achieves a much better approximation ratio. It remains open to classify the digraphs H for which MinHOM(H) admits a constant-factor approximation algorithm whose factor is independent of |V(H)|.
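To make the objective concrete, the sketch below solves MinHOM(H) by exhaustive search over all maps V(G) → V(H), checking the homomorphism constraint and summing the costs c(v, f(v)). It is exponential in |V(G)| and serves only to illustrate the problem definition; it is not one of the polynomial-time approximation algorithms described above, and the instance shown is a made-up toy example.

```python
from itertools import product

def min_cost_homomorphism_bruteforce(G_vertices, G_edges, H_vertices, H_edges, cost):
    """Exact MinHOM(H) by brute force, for tiny instances only.

    A map f : V(G) -> V(H) is a homomorphism if every (directed) edge (u, v)
    of G maps to an edge (f(u), f(v)) of H; among homomorphisms we minimize
    sum_{v in V(G)} cost[(v, f(v))].
    """
    H_edge_set = set(H_edges)
    best, best_map = float("inf"), None
    for images in product(H_vertices, repeat=len(G_vertices)):
        f = dict(zip(G_vertices, images))
        if all((f[u], f[v]) in H_edge_set for (u, v) in G_edges):
            total = sum(cost[(v, f[v])] for v in G_vertices)
            if total < best:
                best, best_map = total, f
    return best, best_map

# Toy usage: map the directed path 0 -> 1 -> 2 into H with edges a -> b and a loop at b
G_vertices, G_edges = [0, 1, 2], [(0, 1), (1, 2)]
H_vertices, H_edges = ["a", "b"], [("a", "b"), ("b", "b")]
cost = {(v, h): (1.0 if h == "b" else 0.0) for v in G_vertices for h in H_vertices}
print(min_cost_homomorphism_bruteforce(G_vertices, G_edges, H_vertices, H_edges, cost))
```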
  4. Given a random n × n symmetric matrix 𝑾 drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form 𝒙^⊤ 𝑾 𝒙 over all vectors 𝒙 in a constraint set 𝒮 ⊂ ℝⁿ. For a certain class of normalized constraint sets we show that, conditional on a certain complexity-theoretic conjecture, no polynomial-time algorithm can certify a better upper bound than the largest eigenvalue of 𝑾. A notable special case included in our results is the hypercube 𝒮 = {±1/√n}ⁿ, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics. Our results suggest a striking gap between optimization and certification for this problem. Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for predicting computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is conjectured to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over 𝒙 ∈ 𝒮 is much larger than that of a GOE matrix.
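The spectral certificate discussed above is simple to state in code: for any constraint set of unit-norm vectors (such as the hypercube {±1/√n}ⁿ), 𝒙^⊤ 𝑾 𝒙 ≤ λmax(𝑾), so reporting the largest eigenvalue is a valid polynomial-time certificate; the result says that, conditionally, no polynomial-time certifier does substantially better. The sketch below draws a GOE matrix under one common normalization (an assumption on our part) and computes this certificate.

```python
import numpy as np

def goe_sample(n, rng=None):
    """Sample an n x n GOE matrix, normalized so off-diagonal entries have
    variance 1/n and diagonal entries variance 2/n (then lambda_max -> 2)."""
    rng = np.random.default_rng() if rng is None else rng
    G = rng.standard_normal((n, n)) / np.sqrt(n)
    return (G + G.T) / np.sqrt(2)

def spectral_certificate(W):
    """Certified upper bound on max_{x in S} x^T W x for any set S of
    unit-norm vectors, e.g. the hypercube {±1/sqrt(n)}^n: since ||x|| = 1,
    x^T W x <= lambda_max(W)."""
    return np.linalg.eigvalsh(W)[-1]

W = goe_sample(200)
print(spectral_certificate(W))   # close to 2 for large n (edge of the semicircle law)
```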